Aviation security screeners: Visual abilities & visual knowledge measurement

Abstract

The role of image-based and knowledge-based factors in X-ray screening was investigated using the Object Recognition Test (ORT) and the Prohibited Items Test (PIT). It was found that X-ray detection performance relies on visual abilities necessary for coping with image-based effects such as view, bag complexity, and superposition. Large differences were found in detection performance between screeners and novices for the PIT. The application of the ORT and PIT for screener certification and competency assessment was also described.
IEEE A&E SYSTEMS MAGAZINE, JUNE 2005 29
Aviation Security Screeners
Visual Abilities & Visual Knowledge Measurement
Adrian Schwaninger, Diana Hardmeier & Franziska Hofer
University of Zurich, Switzerland
ABSTRACT
A central aspect of airport security is reliable detection
of forbidden objects in passengers’ bags using X-ray
screening equipment. Human recognition involves visual
processing of the X-ray image and matching items with
object representations stored in visual memory. Thus,
without knowing which objects are forbidden and what
they look like, prohibited items are difficult to recognize
(aspect of visual knowledge). In order to measure whether
a screener has acquired the necessary visual knowledge, we
have applied the prohibited items test (PIT). This test
contains different forbidden items according to
international prohibited items lists. The items are placed in
X-ray images of passenger bags so that the object shapes
can be seen relatively well. Since all images can be
inspected for 10 seconds, failing to recognize a threat item
can be mainly attributed to a lack of visual knowledge.
The object recognition test (ORT) is more related to
visual processing and encoding. Three image-based factors
can be distinguished that challenge different visual
processing abilities. First, depending on the rotation within
a bag, an object can be more or less difficult to recognize
(effect of viewpoint). Second, prohibited items can be more
or less superimposed by other objects, which can impair
detection performance (effect of superposition). Third, the
number and type of other objects in a bag can challenge
visual search and processing capacity (effect of bag
complexity). The ORT has been developed to measure how
well screeners cope with these image-based factors. This
test contains only guns and knives, placed into bags in
different views with different superposition and complexity
levels. Detection performance is determined by the ability
of a screener to detect threat items despite rotation,
superposition and bag complexity. Since the shapes of guns
and knives are usually well-known even by novices, the
aspect of visual threat object knowledge is of minor
importance in this test.
A total of 134 aviation security screeners and 134
novices participated in this study. Detection performance
was measured using A’. The three image-based factors of
the ORT were validated. The effects of view, superposition,
and bag complexity were highly significant. The validity of
the PIT was examined by comparing the two participant
groups. Large differences were found in detection
performance between screeners and novices for the PIT.
This result is consistent with the assumption that the PIT
measures aspects related to visual knowledge. Although
screeners were also better than novices in the ORT, the
relative difference was much smaller. This result is
consistent with the assumption that the ORT measures
image-based factors that are related to visual processing
abilities; whereas the PIT is more related to visual
knowledge. For both tests, large inter-individual
differences were found. Reliability was high for both
participant groups and tests, indicating that they can be
used for measuring performance on an individual basis.
The application of the ORT and PIT for screener
certification and competency assessment is discussed.
INTRODUCTION
The importance of aviation security has changed
dramatically in recent years. As a consequence of the new
threat situation, large investments in technology have been
made. State-of-the-art X-ray machines provide high-resolution
images, many image enhancement features and even automatic
detection of explosive material. However, it has recently
become clear that the best technology is only as valuable as the
humans that operate it. Indeed, reliable recognition of threat
items in X-ray images of passenger bags is a demanding task.
Consider the images depicted in Figure 1. Each of the two bags
contains a threat item that could be used to severely harm people.
Fig. 1. Examples of prohibited items in X-ray images of passenger bags;
Fig. 1A. Gas spray in the center of the baggage below the eyeglasses;
Fig. 1B. Switchblade knife slightly above the center of the baggage next to the keys
This research was financially supported by Zurich Airport Unique, Switzerland.
Author’s Current Address:
University of Zurich, Switzerland, Psychologisches Institut, Visual Cognition
Research Group (VICOREG), Klosbachstr. 107, 8032 Zurich, Switzerland.
Based on a presentation at Carnahan 2004.
0885/8985/05/ $17.00 © 2005 IEEE
Even though most people would probably recognize
prohibited items like the gas spray in Figure 1A when depicted
in a photograph, this and other threat objects are relatively hard
to recognize for novices because the shape features look quite
different in an X-ray image than in reality. Other dangerous
items (e.g., the switchblade knife in Figure 1B) might be
missed by a novice because they look similar to harmless
objects (e.g. a pen). Several other threat objects are usually not
encountered in real life (e.g., improvised explosive devices,
IEDs), which stresses the importance of computer-based
training in order to achieve a high detection performance
within a few seconds of inspection time [1].
In short, the knowledge about which items are prohibited
and what they look like in an X-ray image is certainly an
important determinant for detection performance. The
Prohibited Items Test (PIT) has been developed to measure this
knowledge-based component and it therefore contains a large
number of different forbidden objects according to
international prohibited items lists [2].
As pointed out by [3], several image-based effects influence
how well threat items can be recognized in X-ray images
(Figure 2). Viewpoint can strongly affect recognition
performance, which has been shown previously in many object
recognition studies (for reviews see [4 - 7]). Since objects are
often superimposed on each other in X-ray images, the degree
of superposition can affect detection performance
substantially. Another image-based factor is bag complexity,
which is determined by the type and number of objects in a bag.
The Object Recognition Test (ORT) has been developed to
measure how well screeners can cope with such image-based
factors [8]. In order to reduce effects of visual knowledge, only
guns and knives are used in this test, i.e., object shapes that are
usually well known also by novices.
The purpose of this study is to investigate the role of
image-based and knowledge-based factors in X-ray screening
using these two different tests. To what extent screeners know
which items are prohibited and what they look like in
passenger bags is measured by the PIT. It includes prohibited
items of different categories in X-ray images of passenger bags
while keeping effects of view, superposition, and bag
complexity relatively constant. The objects are displayed in an
easy view with a moderate degree of superposition in bags of
limited complexity for 10 seconds per image. If a
participant fails to detect a threat item, this is therefore more
likely due to a lack of visual knowledge than to an attentional
failure or visual processing capacity limitations. Since many
different prohibited items with shapes that are often not known
from everyday experience are used in the PIT, a substantial
difference in detection performance between novices and
screeners could be expected. The ORT measures how well
someone can cope with image-based factors such as view,
superposition, and bag complexity. As mentioned above, only
guns and knives are used in this test, i.e., object shapes that are
well known by both screeners and novices. Therefore, smaller
differences between screeners and novices might be expected
for the ORT compared to the PIT. However, expertise might
increase visual abilities that are necessary in order to cope with
image difficulty resulting from effects of viewpoint,
superposition, and bag complexity. Therefore, the effect size
of the interaction between image-based effects and expertise is
an important measure in this study as well.
Fig. 2. Image-based factors according to [3];
Fig. 2A. Viewpoint of the threat item (canonical vs. non-canonical);
Fig. 2B. Bag complexity (low vs. high);
Fig. 2C. Superposition of the threat item (low vs. high)
METHOD
Participants
A total of 268 participants took part in this study. Half were
aviation security screeners, the other half were novices.
All participants were tested with the ORT and then the PIT.
The screener group consisted of 67 females and 67 males
between the ages of 24 and 57 years (M = 41.05 years, SD = 7.84
years). All had undergone initial classroom and on-the-job
training and they had at least two years of work experience in
airport security screening of carry-on bags.
The novice group consisted of 134 males between 21 and
26 years (M = 23.24 years, SD = 1.22 years).
Materials and Procedure
Prohibited Items Test (PIT)
This test contains a wide spectrum of prohibited items
which can be classified into seven categories according to
international prohibited items lists [2]. The PIT version used in
this study included a total of 19 guns, 27 sharp objects, 14 blunt
and hunting instruments, 5 highly inflammable substances, 17
explosives, 3 chemicals, and 13 other prohibited items (e.g.,
buckshot, ivory). All prohibited items were depicted from an
easy viewpoint and combined with a bag of medium
complexity and low superposition, so their shapes could be
seen relatively well and the influence of image-based factors
could be minimized. X-ray images were taken from Heimann
6040i machines and displayed in color. 68 bags contained one
threat item, 6 bags contained two threat items, and 6 bags
contained three threat items. Each bag was shown twice
resulting in a total of 160 trials. There were four blocks of 40
trials. Block order was counterbalanced across four groups of
participants using a Latin Square design. Trial order was
randomized within each block. Only responses to images
containing one threat item were used for statistical analyses.
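The item and trial counts given above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch; the variable names are illustrative and are not part of the PIT itself.

```python
# Threat items per category, as listed for the PIT version used in this study.
items_per_category = {
    "guns": 19,
    "sharp objects": 27,
    "blunt and hunting instruments": 14,
    "highly inflammable substances": 5,
    "explosives": 17,
    "chemicals": 3,
    "other prohibited items": 13,
}

# Number of bags that contained one, two, or three threat items.
bags_by_item_count = {1: 68, 2: 6, 3: 6}

total_items = sum(items_per_category.values())
items_in_bags = sum(n * bags for n, bags in bags_by_item_count.items())
total_bags = sum(bags_by_item_count.values())
total_trials = total_bags * 2         # each bag was shown twice
trials_per_block = total_trials // 4  # four blocks

print(total_items, items_in_bags, total_bags, total_trials, trials_per_block)
```

Under this reading, the number of items summed over the seven categories equals the number of items distributed over the bags, and the bag count yields the reported 160 trials in four blocks of 40.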
The PIT is fully computer-based and starts with a
self-explanatory instruction, followed by a brief training
session with eight examples to familiarize the participants with
the procedure. Feedback is provided after each trial only in the
introductory phase. Each X-ray image was displayed for a
maximum of 10 seconds in the introductory and test phases.
This duration is long enough to ensure that missing a threat
item can be mainly attributed to a lack of visual knowledge
rather than a failure of attention. For each image, participants
had to decide whether the bag was OK (no threat) or NOT OK
(threat) and indicate on a slider how sure they were in their
decision (confidence ratings on a 50 point scale). In addition,
participants had to indicate the threat category of the prohibited
item(s) by clicking the corresponding buttons on the screen
(for NOT OK decisions only). Pressing the space bar displayed
the next image. As the test was subdivided into four blocks,
participants were allowed to take a short break after a block
was completed.
Object Recognition Test (ORT)
As explained in the introduction, [3] pointed out that
image-based factors such as viewpoint, superposition, and bag
complexity can substantially affect detection performance in
X-ray images. The ORT has been designed to measure how
well people can cope with such image-based factors rather than
measuring knowledge-based determinants of threat detection
performance [8]. To this end, guns and knives with the blade
open are used in the ORT, i.e., object shapes that can be
assumed to be known by most people. In addition, all guns and
knives are shown for 10 seconds before the test starts, which
further reduces the role of knowledge-based factors in this test.
In reality, a threat object can be depicted from a difficult
viewpoint in a close-packed bag and be superimposed by other
objects. The X-ray images used in the ORT vary systematically
in image difficulty by varying the degree of view difficulty,
bag complexity, and superposition, both independently and in
combination. This makes it possible to investigate main effects
as well as interactions between the image-based factors. All
X-ray images of the ORT are in black-and-white, as color, per
se, is mainly diagnostic for the material of objects in the bag,
and thus, could be primarily helpful for experts.
Eight guns and eight knives with common shapes were
used. Each gun and each knife was displayed in an easy view
and a rotated view to measure the effect of viewpoint. In order
to equalize image difficulty resulting from viewpoint changes,
guns were more rotated than knives based on results of a pilot
study. Each view was combined with two bags of low
complexity: once with low superposition; and once with high
superposition. These combinations were also generated using
two close-packed bags with a higher degree of bag
complexity. In addition, each bag was presented once with and
once without the threat item. Thus, there were a total of 256
trials: 2 weapons (guns, knives) * 8 (exemplars) * 2 (views) * 2
(bag complexities) * 2 (superpositions) * 2 (harmless vs. threat
images). There were four blocks of 64 trials each. The order of
blocks was counterbalanced across four groups of participants
using a Latin Square. Within each block the order of trials was
random.
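The factorial structure of the ORT described above can be enumerated programmatically. The sketch below is an illustration under stated assumptions: the factor labels are hypothetical names, and the cyclic Latin square is only one possible choice, since the article does not say which Latin square was used or how trials were assigned to blocks.

```python
from itertools import product

# Design factors of the ORT as described in the text (labels are illustrative).
factors = {
    "weapon": ["gun", "knife"],
    "exemplar": list(range(1, 9)),
    "view": ["easy", "rotated"],
    "bag_complexity": ["low", "high"],
    "superposition": ["low", "high"],
    "threat_present": [True, False],
}

# One trial per combination of factor levels.
trials = [dict(zip(factors, combo)) for combo in product(*factors.values())]


def cyclic_latin_square(n):
    """Return an n x n Latin square; row g is the block order for group g."""
    return [[(g + i) % n + 1 for i in range(n)] for g in range(n)]


# Assumption: a cyclic square stands in for the unspecified Latin square.
block_orders = cyclic_latin_square(4)

print(len(trials))
for group, order in enumerate(block_orders, start=1):
    print(f"group {group}: blocks {order}")
```

In a Latin square each block appears exactly once in every serial position across the four participant groups, which is what makes it suitable for counterbalancing block order.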
The ORT is fully computer-based. After task instructions an
introductory session followed using 2 guns and 2 knives not
displayed in the test phase. Feedback was provided after each
trial but only in the introductory phase. Prior to the test phase,
the eight guns and eight knives used at test were each presented
for 10 seconds. Half of the guns and knives were
shown in an easy view and half were depicted in a rotated view.
At test, each object was presented in the easy and the rotated
view with low and high superposition and with low and high
bag complexity. Each image was displayed for 4 seconds. This
duration was chosen to match the demands of high passenger
flow where average X-ray image inspection time at
checkpoints is in the range of 3-5 seconds. For each X-ray
image, participants had to decide whether it
contained one of the guns or knives shown in the introductory
phase or not (NOT OK or OK response). Confidence ratings
had to be provided by changing the position of a slider (90
point scale). The next trial was started by pressing the space
bar. Short breaks were possible after completing one of the
four blocks.
RESULTS
It is important to take the hit rate as well as the false alarm
rate into account if threat and non-threat images are used in a
computer-based test requiring OK and NOT OK responses.
The reason is simple: A candidate could achieve a hit rate of
100% simply by judging all bags as being NOT OK. Whether a
high hit rate reflects good visual detection performance, or just
a lenient response bias, can only be determined if the false
alarm rate is considered, too. Psychophysics provides several
methods in order to derive more valid estimates based on hit
and false alarm rates. A well-known measure from signal
detection theory is d’ [9]. It equals z(H) – z(FA), where H
denotes the hit rate, FA the false alarm rate, and z represents the
transformation into z-scores (standard deviation units). An
often used “non-parametric” measure is A’ [10]. This measure
represents an estimate of the area under an ROC curve that is
specified by only one data point. More specifically, A’
corresponds to the average area for the two linear ROC curves
that maximize and minimize the hit rate. The term
“non-parametric” is a bit misleading because it only refers to
the fact that the computation of A’ does not require an a priori
assumption about the underlying distributions [11, 12]. This
has sometimes been regarded as an advantage over SDT
measures such as d’ and Δm (for a more detailed discussion of
this issue, see also [13]). Although only A’ data are reported in
this study, it should be stressed that similar results were
obtained for d’ data. Moreover, correlations between A’ and d’
were very high for both tests and participant groups (ORT: r =
.94 for screeners and r = .97 for novices, PIT: r = .95 for
screeners and r = .98 for novices, all p < .001).
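Both measures can be computed from a single pair of hit and false alarm rates. The following sketch is not the authors' analysis code: d’ follows the definition z(H) – z(FA) given above, while the A’ function uses the widely used computing formula for a single ROC point, which is an assumption here because the article does not print a formula. The function names are illustrative.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate):
    """d' = z(H) - z(FA); both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def a_prime(hit_rate, fa_rate):
    """A' for one (FA, H) point, using the common computing formula (assumed)."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # chance performance
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror image of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Hypothetical rates, for illustration only.
print(round(d_prime(0.8, 0.2), 3), round(a_prime(0.8, 0.2), 3))
```

Note that a candidate who judges every bag as NOT OK has H = FA = 1 and therefore obtains A’ = 0.5, which illustrates why the false alarm rate has to be taken into account.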
The results section is organized as follows: first, ANOVA
results of the ORT are presented. These analyses were
conducted to investigate whether detection performance of
aviation security screeners and novices is affected by
image-based factors. In addition, the effect of expertise on the
three image-based factors measured by the ORT was
examined. Second, overall detection performance in the ORT
is compared to overall detection performance in the PIT¹. More
specifically, the effect of expertise on image-based factors and
knowledge-based factors is analyzed, comparing detection
performance of aviation security screeners with that of
novices in the two tests. Finally, the results of reliability
analyses are presented which were conducted to evaluate
whether the ORT and PIT can be used for measuring detection
performance on an individual basis.
ORT and Abilities to Cope with Image-Based Factors
A’ scores calculated from hit and false alarm rates of the
ORT were subjected to three-way analyses of variance
(ANOVA) with the three within-participants factors view, bag
complexity, and superposition. Results of aviation security
screeners show that there were significant main effects of view
(easy vs. rotated) with an effect size of η² = .71, F(1, 133) =
318.59, MSE = 0.003, p < .001, bag complexity (low vs. high)
η² = .83, F(1, 133) = 652.96, MSE = 0.003, p < .001, and
superposition (low vs. high) η² = .61, F(1, 133) = 203.73, MSE
= 0.003, p < .001. The following two-way interactions were
significant: view * superposition, η² = .12, F(1, 133) = 17.91,
MSE = 0.002, p < .001, and bag complexity * superposition, η² =
.12, F(1, 133) = 18.22, MSE = 0.002, p < .001. Note, however,
that the effect sizes of these interactions are rather low when
compared to the effect sizes of the main effects. All other
interactions were not significant. In short, there were clear
main effects of view, bag complexity, and superposition with
very large effect sizes (see also conventions by [14]). Some
interactions reached statistical significance, but the effect sizes
were relatively small when compared to the effect sizes of the
main effects.
Similar results could be observed for novices. Again, there
were significant main effects of view (easy vs. rotated) η² =
.76, F(1, 133) = 428.33, MSE = 0.005, p < .001, bag complexity
(low vs. high) η² = .72, F(1, 133) = 333.14, MSE = 0.005, p <
.001, and superposition (low vs. high) η² = .63, F(1, 133) =
228.09, MSE = 0.004, p < .001. All two-way interactions were
significant: view * bag complexity η² = .06, F(1, 133) = 9.07,
MSE = 0.004, p < .01, view * superposition η² = .07, F(1, 133)
= 10.43, MSE = 0.004, p < .01, bag complexity * superposition
η² = .15, F(1, 133) = 23.15, MSE = 0.004, p < .001. The
three-way interaction between view, bag complexity and
superposition also reached statistical significance, η² = .03,
F(1, 133) = 4.14, MSE = 0.004, p < .05. As for screeners, very
large effect sizes were found for main effects whereas the
interactions showed much smaller effect sizes.
Figure 3 shows the main effects of each of the three
image-based factors, averaged across the other two factors. A
comparison of Figure 3A (aviation security screeners) and
Figure 3B (novices) reveals that screeners were slightly better
than novices, while both participant groups are substantially
affected by the image-based factors view, bag complexity, and
superposition. In order to examine whether expertise has a
differential effect on these image-based factors, a four-way
analysis of variance (ANOVA) with the within-participants
factors view, bag complexity, and superposition and the
between-participant factor expertise was computed. There
were again significant main effects of view (easy vs. rotated)
η² = .74, F(1, 266) = 744.57, MSE = 0.004, p < .001, bag
complexity (low vs. high) η² = .77, F(1, 266) = 884.75, MSE =
0.004, p < .001, and superposition (low vs. high) η² = .62, F(1,
266) = 428.20, MSE = 0.004, p < .001. Two-way interactions
between view and bag complexity η² = .04, F(1, 266) = 10.23,
MSE = 0.003, p < .01, view and superposition η² = .09, F(1,
266) = 26.17, MSE = 0.003, p < .001, view and expertise η² =
.10, F(1, 266) = 30.52, p < .001, and superposition and expertise
η² = .03, F(1, 266) = 9.39, p < .01 were significant, as well as
the three-way interactions between view, bag complexity, and
superposition η² = .02, F(1, 266) = 5.47, MSE = 0.003, p < .05,
and bag complexity, superposition and expertise η² = .13, F(1,
266) = 41.13, p < .001. Although these interactions were
significant, all have relatively low effect sizes when compared
to the main effects. All other interactions were not significant.
¹ A’ scores for the PIT were calculated using the responses to images of the
following categories: guns, sharp objects, hunt and blunt instruments.
In short, these results indicate that the effects of
image-based factors are apparent for novices as well as for
aviation security screeners, and that expertise only slightly
reduces the effects of view, bag complexity, and
superposition.
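The article does not state which variant of η² is reported. As a plausibility check, the reported values are numerically consistent with partial eta squared recovered from the F statistic and its degrees of freedom. The sketch below rests on that inference and is not a description of the authors' computation; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported effect size), taken from the text.
reported = [
    ("screeners: view", 318.59, 1, 133, 0.71),
    ("screeners: bag complexity", 652.96, 1, 133, 0.83),
    ("screeners: superposition", 203.73, 1, 133, 0.61),
    ("novices: view", 428.33, 1, 133, 0.76),
    ("both groups: view x expertise", 30.52, 1, 266, 0.10),
]

for label, f_value, df1, df2, eta in reported:
    recovered = partial_eta_squared(f_value, df1, df2)
    print(f"{label}: recovered {recovered:.3f}, reported {eta:.2f}")
```

The same relation can be used to compare the main effects with the much smaller interaction effects discussed in this section.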
PIT, Visual Knowledge and Expertise
In contrast to the ORT, the PIT has been developed to
measure whether screeners know which items are prohibited
and how they look in X-ray images of passenger bags [2].
Whereas in the ORT only guns and knives are used – object
shapes that are also familiar to novices – the PIT contains all
kinds of forbidden objects based on international prohibited
items lists. In this test, all target objects are shown in an easy
viewpoint with a moderate degree of superposition in bags of
moderate bag complexity. As mentioned above, each image is
shown for 10 seconds; missing a threat item in the PIT can
therefore be attributed to a lack of visual knowledge rather than
to an attentional failure or visual processing capacity
limitations. If detection performance in the PIT is indeed
mainly determined by visual experience and training with
X-ray images, large differences between novices and aviation
security screeners should be observed in this test. As reported
in the previous section, only moderate differences between
novices and screeners were found for the ORT.
In order to compare relative difference between experts and
novices for the PIT and ORT, overall hit and false alarm rates
were used to compute relative detection performance
difference separately for the ORT and PIT using the following
formula:
$$\frac{A'_{\mathrm{experts}} - A'_{\mathrm{novices}}}{A'_{\mathrm{novices}}}$$
Relative detection performance difference between experts
and novices was indeed much higher for the PIT than for the
ORT (15.89% vs. 6.05%). This is consistent with the view that
the PIT measures visual knowledge dependent on training and
expertise, whereas the ORT measures more stable visual
abilities used to cope with image-based factors such as effects
of view, bag complexity, and superposition.
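The relative detection performance difference is a simple ratio and can be sketched as follows. The A’ values in the example are hypothetical and chosen only for illustration, since this section reports the resulting percentages (15.89% for the PIT and 6.05% for the ORT) but not the underlying group A’ scores.

```python
def relative_difference(a_experts, a_novices):
    """Relative detection performance difference between experts and novices."""
    return (a_experts - a_novices) / a_novices


# Hypothetical overall A' scores, NOT taken from the article.
example = relative_difference(0.90, 0.75)
print(f"{example:.2%}")
```

Dividing by the novices' score expresses the experts' advantage as a proportion of novice performance, which makes the two tests comparable even though their overall difficulty differs.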
This main finding was further analyzed using a two-way
analysis of variance (ANOVA) with the within-participant
factor test type (ORT, PIT) and the between-participant factor
expertise using overall A’ scores from each test. There was a
significant effect of test type (ORT vs. PIT) η² = .80, F(1, 266)
= 1075.10, MSE = 0.002, p < .001, a significant effect of
expertise (experts vs. novices) η² = .44, F(1, 266) = 206.11,
MSE = 0.004, p < .001, and a significant interaction of test type
and expertise η² = .20, F(1, 266) = 65.30, p < .001. The
interaction between test type and expertise is consistent with
the hypothesis that the ORT measures mainly image-based
factors whereas the PIT measures mainly knowledge-based
factors.
It must also be noted, however, that correlation analyses
showed that the two tests are far from being orthogonal.
Overall detection performance A’ of the two tests correlates
with r = .51, p < .001 for experts, and r = .42, p < .001 for
novices. This could indicate that detection performance
in the PIT is not only determined by visual knowledge but also by
visual abilities used to cope with image-based factors as
measured by the ORT.
Fig. 3. Detection performance (A’) in the ORT with standard deviations;
Fig. 3A. For aviation security screeners;
Fig. 3B. For novices
[Bar charts: detection performance (A’, axis from 0.50 to 1.00) at low vs. high levels of view, bag complexity, and superposition]
Table 1. Reliability Analyses
NOTE: Cronbach Alpha values and split-half reliabilities (Guttman) for both tests in each group
(experts and novices separately) calculated for percentage correct (PC) and confidence ratings (CR)
separately for signal plus noise (SN) and noise trials (N).
One potential argument against the analyses of this section
could be that the expert group consisted of males and females,
whereas the novices group consisted only of males. However,
it is unlikely that gender effects can explain the differences
found between experts and novices since no significant
differences were found between male and female screeners,
either for the ORT (p = .70) or for the PIT (p = .78).
Reliability Analyses
Internal reliability was analyzed using Cronbach’s Alpha
and Guttman split-half coefficients separately for both
participant groups (aviation security screeners and novices).
Analyses were computed for signal plus noise trials (bags
including a threat item) and noise trials (harmless bags),
respectively. Reliability coefficients were computed on the
basis of the percentage correct measures (i.e., hit and correct
rejections), as well as on the basis of the screeners’ confidence
ratings (CR) for hits and correct rejections. As can be seen in
Table 1 high reliability coefficients were found for both tests
and participant groups.
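Both reliability coefficients can be computed from a participants-by-items score matrix. The sketch below is illustrative and not the authors' analysis code: the toy data are hypothetical, and the odd/even item split used for the split-half coefficient is an assumption, since the article does not describe how the test halves were formed.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's Alpha; scores is a list of participants, each with k item scores."""
    k = len(scores[0])
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def guttman_split_half(scores):
    """Guttman split-half coefficient, using an odd/even item split (assumed)."""
    half_a = [sum(row[0::2]) for row in scores]
    half_b = [sum(row[1::2]) for row in scores]
    total = [a + b for a, b in zip(half_a, half_b)]
    return 2 * (1 - (variance(half_a) + variance(half_b)) / variance(total))


# Hypothetical data: 5 participants x 4 threat images, 1 = hit, 0 = miss.
toy_hits = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy_hits), 3), round(guttman_split_half(toy_hits), 3))
```

The same functions apply to confidence ratings by replacing the binary entries with the rating given on each trial.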
The results section has clearly shown that item difficulty in
the ORT depends on the main effects and interactions between
view, bag complexity, and superposition. Therefore, the high
internal consistency also found for the ORT is a good example
of the fact that a test can be homogeneous and multifactorial
(see [15]).
DISCUSSION
The objective of this study was to examine the role of
image-based and knowledge-based factors for detecting threat
items in passenger bags. As pointed out by [3], image-based
factors such as effects of viewpoint, bag complexity, and
superposition can substantially affect detection performance.
The ORT has been developed to measure how well a
participant can cope with these image-based factors [8]. This
test contains guns and knives depicted in an easy and a difficult
view, shown in bags with low and high bag complexity and with
low or high superposition by other objects. Main
effects with large effect sizes were found for aviation security
screeners as well as novices. While screeners achieved a
moderately better detection performance in the ORT, they
were still significantly affected when threat items were rotated,
superimposed by other objects, or shown in complex bags.
This result is consistent with the view that the ORT does
measure visual abilities necessary to cope with image
difficulty resulting from effects of viewpoint, bag complexity
and superposition. Large inter-individual differences were
found for novices as well as experts. Internal reliability
was very high for both groups. Therefore, this test could be a
useful tool for competency assessment of screeners as
well as for pre-employment assessment purposes.
Table 1. Reliability coefficients
Test  Group      Coefficient      PC SN   PC N    CR SN   CR N
PIT   Screeners  Cronbach Alpha   .840    .878    .887    .924
PIT   Screeners  Split-half       .811    .915    .859    .948
PIT   Novices    Cronbach Alpha   .871    .877    .885    .914
PIT   Novices    Split-half       .882    .862    .883    .890
ORT   Screeners  Cronbach Alpha   .862    .934    .902    .962
ORT   Screeners  Split-half       .733    .813    .792    .887
ORT   Novices    Cronbach Alpha   .899    .910    .916    .959
ORT   Novices    Split-half       .778    .810    .759    .907
The PIT has been developed to measure whether a screener
knows which items are prohibited and what they look like in
X-ray images of passenger bags [2]. In this test, all objects are
depicted in an easy view. Bag complexity and superposition
are moderate so that the threat item shapes are visible. Images
are shown for 10 seconds, i.e., missing a threat item can be
attributed to a lack of visual knowledge rather than to an
attentional failure or a visual processing capacity limitation. If
the PIT is indeed related to visual knowledge based on
expertise and training, large differences between novices and
experts should be observed. Indeed, relative detection
performance difference between novices and experts was
roughly 2.6 times higher for the PIT than for the ORT (15.89%
vs. 6.05%). This
result is consistent with the view that the PIT measures
knowledge-based factors whereas the ORT measures visual
abilities used for coping with image-based factors. As for the
ORT, excellent reliability coefficients were found for the PIT.
This test could therefore provide a useful tool for certification,
competency, and risk assessment as well as for quality control
in general.
In summary, the results of this study confirm that X-ray
detection performance relies on visual abilities necessary for
coping with image-based effects such as view, bag complexity,
and superposition. Visual experience and training are
necessary to know which items are prohibited and what they
look like in X-ray images of passenger bags. Both aspects are
prerequisites for a good screener and can be evaluated using
the ORT and PIT.
ACKNOWLEDGMENT
This research was financially supported by Zurich Airport
Unique, Switzerland. We are thankful to Zurich State Police,
Airport Division for their help in creating the stimuli and the
good collaboration for conducting the study.
REFERENCES
[1] A. Schwaninger and F. Hofer,
Evaluation of CBT for increasing threat detection performance
in X-ray screening,
in The Internet Society 2004, Advances in Learning,
Commerce and Security, K. Morgan and M. J. Spector, Eds.
Wessex: WIT Press 2004, pp. 147-156.
[2] A. Schwaninger,
Prohibited items test (PIT): Test manual and user’s guide.
Zurich: APSS.
[3] A. Schwaninger,
Detection systems: Screener evaluation and selection,
AIRPORT, Vol. 2, pp. 14-15, 2003.
[4] M.J. Tarr and H.H. Bülthoff,
Is human object recognition better described by geon structural
descriptions or by multiple views?,
Journal of Experimental Psychology,
Human Perception and Performance,
Vol. 21(6), pp. 1494-1505, 1995.
[5] M.J. Tarr and H.H. Bülthoff,
Object recognition in man, monkey and machine.
Cambridge, Massachusetts: MIT Press, 1999.
[6] A. Schwaninger,
Object recognition and signal detection,
in Praxisfelder der Wahrnehmungspsychologie,
B. Kersten and M.T. Groner, Eds., Bern: Huber, in press.
[7] M. Graf, A. Schwaninger, C. Wallraven and H.H. Bülthoff,
Psychophysical results from experiments on recognition
& categorisation,
Information Society Technologies (IST) programme,
Cognitive Vision Systems – CogVis; (IST-2000-29375), 2002.
[8] A. Schwaninger,
Object recognition test (ORT): Test manual and user’s guide.
Zurich: APSS.
[9] D.M. Green and J.A. Swets,
Signal detection theory and psychophysics,
New York: Wiley, 1966.
[10] I. Pollack and D.A. Norman,
A non-parametric analysis of recognition experiments,
Psychonomic Science, Vol. 1, pp. 125-126, 1964.
[11] R.E. Pastore, E.J. Crawley, M.S. Berens and M.A. Skelly,
“Nonparametric” A′ and other modern misconceptions about
signal detection theory,
Psychonomic Bulletin & Review, Vol. 10, pp. 556-569, 2003.
[12] N.A. MacMillan and C.D. Creelman,
Detection theory: A user’s guide.
Cambridge: Cambridge University Press, 1991.
[13] F. Hofer and A. Schwaninger,
Reliable and valid measures of threat detection performance
in X-ray screening,
IEEE ICCST Proceedings, Vol. 38, pp. 303-308, 2004.
[14] J. Cohen,
Statistical power analysis for the behavioral sciences.
Hillsdale, NJ: Erlbaum, 1988.
[15] P. Kline,
The handbook of psychological testing.
London: Routledge, 2000.