How to Fend off Shoulder Surfing
Volker Roth (a), Kai Richter (b)
(a) OGM Laboratory LLC, 6825 Pine St, Omaha, NE 68106, USA
(b) Zentrum für Graphische Datenverarbeitung e.V., Fraunhoferstr. 5, 64283 Darmstadt, Germany
Abstract
Magnetic stripe cards are in common use for electronic payments and cash withdrawal. Reported incidents document that criminals easily pickpocket cards or skim them by swiping
them through additional card readers. Personal identification numbers (PINs) are obtained
by shoulder surfing, through the use of mirrors or concealed miniature cameras. Both elements, the PIN and the card, are generally sufficient to give the criminal full access to the
victim’s account. In this paper, we present alternative PIN entry methods to which we refer
as cognitive trapdoor games. These methods make it significantly harder for a criminal to
obtain PINs even if he fully observes the entire input and output of a PIN entry procedure.
We also introduce the idea of probabilistic cognitive trapdoor games, which offer resilience
to shoulder surfing even if the criminal records a PIN entry procedure with a camera. We
studied the security as well as the usability of our methods. The results support the hypothesis that our primary mechanism strikes a balance between security and usability that is
of practical value. In this article, we give a detailed account of our mechanisms and their
evaluation.
Key words: Security engineering, usability and security, secure PIN entry, human
computer interaction, shoulder surfing
1 Introduction
Personal Identification Numbers (“PIN”) are used as a means of authenticating oneself when withdrawing money from Automatic Teller Machines (“ATM”), authorizing Point of Sale (“POS”) transactions, unlocking cell phones and Personal Digital Assistants (“PDA”), gaining access to secure areas, or disarming anti-burglar
alarms, to name a few examples. Typically, a user proves himself to a machine by
entering a four digit PIN number using a PIN pad with three by four keys, and an
automatic process verifies whether the entered PIN is correct.
Email address: [email protected] (Volker Roth).
Preprint submitted to Elsevier Science, May 21, 2006
However, anyone who has the PIN pad in his field of view may observe the PIN
number that a prover enters and use that information to impersonate the legitimate
prover. This particular attack is widely known as shoulder surfing. 1
As an added security mechanism against such involuntary PIN disclosure, many
authentication systems require not only something that the legitimate prover knows
but also something that he has, such as a magnetic stripe card with certain information stored on it. However, fraudsters steal or skim valid cards with increasing
sophistication (Weinstock, 1987; Brader, 1998; Wood, 2003; Summers and Toyne,
2003; Colville, 2003) causing significant damage to customers and the banking
industry. The number of ways in which fraudsters obtain the corresponding PIN numbers
has also increased. In several recent cases, miniature camera devices were planted at
ATMs in a concealed fashion which radioed video images of PIN entry sequences
to nearby receivers (Wood, 2003; Summers and Toyne, 2003; Colville, 2003).
We investigated whether the method by which PINs are entered can be designed
in a way that is resilient to human shoulder surfers even if all input and output
is in plain sight, perhaps even if all input and output is recorded by a concealed
camera. At the same time, the method should be efficient and easily usable. Our
contribution is a novel design which consciously leverages the fact that certain
cognitive capabilities of humans are very limited, particularly humans’ ability to
store and retain information in their short term memory (Miller, 1956; Anderson,
2000; Vogel and Machizawa, 2004).
We refer to our principal design as an interactive cognitive trapdoor game between
a verifier and a prover where all input and output is in plain sight of an observer,
and authenticating oneself amounts to winning the game. The game is designed so
that winning the game is well within the bounds of humans’ cognitive capacity if
the correct PIN is known. If, however, the PIN is not known then winning the game
requires cognitive capacity beyond what is typically found in humans.
A simple example may serve as an illustration of the general idea. Assume that
the prover wishes to enter a single digit using a PIN pad with the typical fixed
layout of keys and digits. Further assume that the verifier has the ability to set
the background color of each individual key to either black or white. The verifier
randomly partitions the set {0, 1, · · · , 9} of possible PIN digits into two equally
sized sets A and B. The digits in set A are displayed on white background and the
digits in set B are displayed on black background. If the prover’s digit is in set A
then she enters white, and black otherwise. After playing the game for a few rounds,
the verifier can uniquely determine the digit by intersecting the sets indicated by
the prover. The observer, on the other hand, does not know the digit and in order
to calculate the set intersection she has to quickly memorize or record at least one
set, its color, and the prover’s response in each round. The game is repeated until
all digits are entered.
1 See e.g., “Word Watch column: Shoulder Surfing.” The Atlantic Monthly. February, 1992.
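The single-digit example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the partitions are drawn independently at random in every round (the construction of §3.3 is not used here), and the function names `random_partition` and `enter_digit` are hypothetical.

```python
import random

DIGITS = list(range(10))

def random_partition(rng):
    """Split the ten digits into two random halves: (white set A, black set B)."""
    shuffled = DIGITS[:]
    rng.shuffle(shuffled)
    return set(shuffled[:5]), set(shuffled[5:])

def enter_digit(secret_digit, rng):
    """Play rounds until the verifier's candidate set holds exactly one digit."""
    candidates = set(DIGITS)
    rounds = 0
    while len(candidates) > 1:
        white, black = random_partition(rng)
        # The prover reveals only the color of the secret digit.
        answer = "white" if secret_digit in white else "black"
        # The verifier intersects its candidates with the indicated set.
        candidates &= white if answer == "white" else black
        rounds += 1
    return candidates.pop(), rounds

rng = random.Random(2006)  # fixed seed, only to make the run repeatable
recovered, rounds = enter_digit(3, rng)
print(recovered, rounds)
```

With independent random partitions the number of rounds varies from run to run. An observer who wants to repeat the verifier's computation needs the full set and the answer from every round.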
In the remainder of the article, we elaborate on the design and its security. We describe multiple variations of it, some of which are especially suitable for people with
certain handicaps such as blindness. We also present and discuss the results of several user studies that we conducted to assess the security and the usability
of our most prominent variants. The outcome of the studies supports the hypothesis that our primary method offers resilience against shoulder surfing while still
being reasonably usable, and thus has considerable practical value where shoulder surfing is a concern. Certain modifications of our design that we describe (the
introduction of ambiguity into provers’ answers) provide limited resilience even
against a single recording by a concealed camera. However, that has been noted
before by Baker (1995) and thus does not constitute a novel result.
2 Background
In this section we summarize background material that we assume as known in the
remainder of our article, namely: a description of our threat model, the psychological foundation of our mechanism design, as well as mathematical tools that we
applied in our usability evaluation. Readers who are familiar with these fields may
safely skip the corresponding sections and continue reading the description of our
mechanism design in §3.
2.1 Threat Model and Terminology
We model entry of a PIN as an interactive game between three parties: a machine
verifier, a human prover, and a human observer. Each game consists of a number of
rounds in which the verifier, in an abstract sense, poses questions and the prover
inputs his answer. The objective of the prover is to authenticate himself to the
verifier by his PIN; the objective of the verifier is to decide whether the prover
knows the correct PIN. The observer observes all interactions between the prover
and the verifier; his objective is to impersonate the prover in subsequent games with
the same verifier.
The game also involves a setup phase with a trusted dealer who, using a confidential and authentic channel, distributes a token to the prover, and a master secret
to the verifier (alternatively, the verifier may query the dealer online during each
game over an authentic and confidential channel). The token, typically a magnetic
stripe (Count Zero, 1992) or chip card, contains information that uniquely identifies
the prover. It also contains information by which the verifier can verify whether the
prover’s input is correct and matches his identity.
We assume that the observer cannot verify the correctness of a given PIN unless he
also knows the master secret (and we assume he does not). Additionally, we expect
that the verifier keeps a record of how often a prover successively inputs a false
PIN. If the count reaches three, the verifier voids the prover’s authorization until
the prover receives a new PIN from the trusted dealer. Let the observer possess (a
perfect copy of) the prover’s token (obtained by theft or skimming). Most importantly, we assume that the resources of the observer are computationally and memory
bounded by the cognitive capacity of a human, particularly the short term memory
(“STM”).
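The false-attempt bookkeeping assumed in this model can be sketched as follows. The class name `AttemptCounter` is hypothetical, and resetting the count after a correct PIN is our reading of "successively"; an actual system may handle this differently.

```python
class AttemptCounter:
    """Tracks successive false PIN entries for one prover (illustrative only)."""

    MAX_FAILURES = 3  # the limit assumed in the threat model

    def __init__(self):
        self.failures = 0
        self.voided = False

    def record(self, pin_correct: bool) -> bool:
        """Register one attempt and return True if the prover is authenticated."""
        if self.voided:
            # Authorization stays void until the dealer issues a new PIN.
            return False
        if pin_correct:
            self.failures = 0  # assumption: a success resets the count
            return True
        self.failures += 1
        if self.failures >= self.MAX_FAILURES:
            self.voided = True
        return False
```

An observer who guesses at random therefore gets at most three tries per stolen or skimmed token.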
Compared to an actual implementation, we make idealized and simplified assumptions. For instance, we assume that the PINs are generated in a uniformly distributed
fashion. In practice, the PIN distribution of, e.g., the Eurocheque Card system was
shown to be skewed considerably (Möller, 1997; Kuhn, 1997), to a large degree
because recommendations in the applicable standards (ISO, 2002) were not fully
adhered to. Otherwise, our model corresponds closely to what is typically found in
the ATM world.
2.2 Background on Cognitive Psychology
The cognitive capabilities of a human have interesting limitations. Recently, Vogel
and Machizawa (2004) discovered a relationship between neural activity and memory capacity and found neurophysiological evidence that the human visual short
term memory (“STM”) is limited to three to four symbols. In their measurements,
they used a delay of one second between exposure and recall. Few subjects they
tested had the capacity to hold five symbols in their STM. This is even less than the
findings of Miller (1956) who suggested that the capacity of the STM is limited to
7 ± 2 symbols. However, retention of items in the STM, and transfer to long term
memory (“LTM”), appears to be critically dependent on the ability to rehearse the
information in the STM (Peterson and Peterson, 1959). A constant stream of new
information in short succession, as in the case of our mechanism design, is known
to impede rehearsal and thus later recall.
An effect that increases the capacity of the STM is referred to as chunking (Murdock, 1961). Multiple items are grouped together and represented as a single item
that occupies one “slot” in the STM. For instance, an American can probably remember the sequence 149217761941 easily because it can be grouped into three
chunks of four digits. Each of the chunks represents a year in which a historic event
occurred that is significant to Americans, and which can be represented as a single
item. It is not entirely clear to us at this point whether chunking may have an impact on our mechanisms. Since all 30240 black and white five digit numeric patterns are
equally likely, it appears that the opportunities for frequent chunking are marginal.
In summary, the limitations of humans’ STM are a promising starting point for
devising cognitive trapdoor games although there may be alternative approaches.
2.3 Background on Usability Testing
The concern of usability testing is to determine the probability that a change in user
performance is caused by a particular change of condition (e.g., the improvement
of a user interface) as opposed to random variance. Generally, usability testing is
based on empirical observations that are analyzed by means of statistical methods
(see e.g., Sachs (2002); Box et al. (1978) for an overview).
A basic tool is the Mann-Whitney U-test (Mann and Whitney, 1947), which tests
whether two independent samples are from the same population by comparing their
relative ranks. The individual observations must be comparable in the sense that
one can determine which observation is “greater.” For samples from the same population one would expect that their ranks (i.e., the overlap between samples) are
determined by random chance. The Mann-Whitney U-test checks this expectation under the null hypothesis that the two samples are drawn from the same population. A table of its
critical values for different significance levels is given by Milton (1964). The test
is suitable to analyze ordinally scaled data such as data collected by Likert (1932)
scales. Likert scales require subjects to rate a given statement based on an odd number of ordered alternatives (e.g., agreement or disagreement with the statement on
a five to seven point rating scale). Likert scales are an ingredient of the System
Usability Scale (“SUS”) described by Brooke (1996).
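To make the rank comparison concrete, the following sketch computes the U statistic for two small samples of Likert ratings. The ratings are made up for illustration and are not data from our studies; the function name `mann_whitney_u` is hypothetical. The sketch uses the pairwise-counting definition of U, in which a tie contributes one half.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts pairs (a, b) with a > b, ties count 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # Every pair is counted once, so the two statistics sum to len_a * len_b.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical five point Likert ratings from two independent groups.
ratings_a = [4, 5, 3, 4, 5]
ratings_b = [2, 3, 3, 1, 4]
u_a, u_b = mann_whitney_u(ratings_a, ratings_b)
print(u_a, u_b)
```

The smaller of the two values is then compared against the tabulated critical value for the given sample sizes and significance level.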
While the U-test is restricted to comparisons of two samples, difference hypotheses for more
than two independent samples can be tested with the Kruskal-Wallis H-test (Kruskal
and Wallis, 1952), which is an extension of the method of Mann and Whitney. If
ordinally scaled data is collected in repeated measurements such as “before” and
“after” comparisons then the test of Wilcoxon (1945) for matched pairs must be
used. Similar to other rank sum tests, Wilcoxon’s test is reasonably robust even if
the data is not equally or normally distributed.
The Student’s t-test and the analysis of variance (“ANOVA”) are methods which
analyze the summed squared deviation from the mean of a normal distribution. Put
simply, the Student’s t-test answers the question whether differences in the means
of two samples are due to chance. ANOVA serves as a basis for several well-known
statistical methods such as regression analysis and multivariate methods.
Which test is applied when depends on the design of the study and the type of
data that is acquired during the study. Typically, a combination of tools is used to
analyze measurements for statistically significant properties.
3 Cognitive Trapdoor Games
The general principle we apply is to consecutively display the set of PIN digits to
the prover as two partitions. The prover indicates the partition in which the current PIN digit is. After a few rounds, the verifier determines the correct PIN digit
by intersecting the indicated partitions. The algorithm may be repeated for as many
digits as the prover wishes to enter. The input and output methods determine the
difficulty of the cognitive task that must be accomplished by the prover and the
observer. In §3.1, we describe two designs of such a task: the immediate response
variant and the delayed response variant. Our hypothesis is that in both designs the
task is of limited cognitive complexity if the PIN is known, and of significant cognitive complexity otherwise. Hence, the term cognitive trapdoor game.
We discuss and compare the properties of our designs in §3.2. Both variants achieve
significantly better resilience against shoulder surfers without automatic recording
devices than contemporary PIN entry methods (see §5.1 for experimental evidence).
In §3.4, we describe a modification which additionally provides limited resilience
even if shoulder surfers record all inputs and outputs with a camera, and we analyze
its security.
The designs we present in §3.1 are based on visual perception and tactile input.
However, the principles of cognitive trapdoor games easily extend to other input
and output modalities. In §3.5, we describe alternative designs which are particularly suited for handicapped people.
3.1 PIN Entry Using Key Pads
The immediate response design and the delayed response design for visual output
and haptic input that we present in this section are conceptually similar. Both designs require a display on which a key pad can be displayed, or a key pad for which
some perceptible aspect of each key is controllable by the verifier e.g., the color
of the keys can be changed from black to white and vice versa. Virtually all ATMs
provide a display suitable for our purpose. Additionally, the designs require two
input keys one of which denotes black and the other white. The pound (’#’) and
asterisk (’∗’) keys typically found at the lower left and right edges of an ATM’s
key pad are suitable. Principally, only a software upgrade would be necessary to
implement our designs on such devices. Additionally, our designs can coexist with
the contemporary method of entering one’s PIN by typing its digits into the key
pad. In the immediate response design, each PIN is entered as follows:
(1) The verifier produces the display of a key pad with the familiar fixed layout
of keys where half of the keys are displayed with white digits on black background and the other half with black digits on white background. The distribution of black and white colors must have certain properties, we present an
algorithm to compute suitable distributions in §3.3.
(2) The verifier prompts the prover for input. The prover responds by pressing the
key denoting white (e.g., the pound key) if his PIN digit is shown on white
background, and presses the key that denotes black (e.g., the asterisk key)
otherwise. Assume that S is the set of five digits with the same color as the
one that the prover selected.
(3) The verifier repeats steps 1 and 2 four times.
(4) The verifier intersects the sets $S_1, \dots, S_4$ selected by the prover. The set intersection contains the candidates for the PIN digit that the prover entered.
Assume that D is the set intersection.
(5) The verifier repeats steps 1 to 4 four times, one time for each of the four digits
$D_1, \dots, D_4$ that constitute the prover’s PIN.
[Figure 1: four key pad color patterns shown in sequence, each followed by the prover’s input (w or b); after the fourth input the verifier moves on to the next digit or clears the display.]
Figure 1. This figure illustrates the immediate response design. Assume that the prover
wishes to enter digit ’3’. The verifier begins by presenting the leftmost color pattern.
Digit ’3’ is displayed on white background, therefore the prover enters white. The verifier then changes the color pattern; this time digit ’3’ is displayed on black background. Hence,
the prover enters black. The procedure continues for two more rounds after which the verifier clears the display and calculates digit ’3’ by intersecting the white digits in the first
color pattern with the black digits in the second pattern and so forth. The algorithm for
calculating the color patterns is given in §3.3.
Overall, 16 input/output rounds have to be completed, four rounds per digit and
four repetitions for the four digits of the prover’s PIN. If any of the set intersections
contains either no digit or more than one digit then an error occurred during input.
In that case, the verifier notifies the prover of the error, increases the overall count
of false attempts for the alleged prover, and offers to repeat the entire procedure
unless three false attempts were counted. Otherwise, the verifier verifies that the
digits $D_1, \dots, D_4$ constitute the correct PIN. Figure 1 illustrates steps 1 to 3.
Steps 1 and 2 must be repeated four times because four is the smallest number of
repetitions which guarantees that the set intersection always yields a unique solution to finding one digit out of ten possible digits. More generally, if the verifier must identify any one of N digits or characters then the prover must respond
$\lceil \log_2 N \rceil$ times with a binary answer. The principal observation here is that with
each binary decision, the set of candidates can be halved.
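The halving argument can be checked with a short sketch. The function names are hypothetical; `rounds_needed` uses integer arithmetic that equals the ceiling of the base-two logarithm for N ≥ 1, and `halving_trace` follows the worst case in which the larger half of the candidates survives each answer.

```python
def rounds_needed(n_symbols: int) -> int:
    """Smallest number of binary answers that singles out one of n symbols."""
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (n_symbols - 1).bit_length()

def halving_trace(n_symbols: int):
    """Candidate set sizes when the larger half survives every round."""
    trace = [n_symbols]
    while n_symbols > 1:
        n_symbols = (n_symbols + 1) // 2
        trace.append(n_symbols)
    return trace

print(rounds_needed(10), halving_trace(10))
print(rounds_needed(26), halving_trace(26))
```

For ten digits the candidate set shrinks from 10 to 5, 3, 2, and finally 1, which accounts for the four rounds per digit.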
The immediate response design owes its name to the fact that subsequent to each
output of the verifier (step 1), the prover has to input his response (step 2). In the
delayed response design, the verifier repeats step 1, the output, four times with a
delay of 0.5 seconds between consecutive outputs. Subsequent to the fourth output,
which is shown for 0.5 seconds as well, the verifier clears the display and prompts
the prover to enter the colors that his or her PIN digit had in the previous four outputs. The prover then enters the colors consecutively. Hence, the prover’s responses
are delayed until after all output that is required to determine a single PIN digit has
completed, which gives the delayed response design its name. The design rationale
for the delayed response design is to limit the time for which the output is exposed
to the prover and also to any observer. For comparison, in the immediate response
design the output is displayed until the prover inputs his response. Hence, a slow
prover is more vulnerable to shoulder surfing than a fast prover. Figure 2 illustrates
four input rounds (entering one digit) of the delayed response design.
[Figure 2: four key pad color patterns, each displayed for 0.5 sec; the display is then cleared and the prover enters w,b,w,b.]
Figure 2. This figure illustrates the delayed response design. The input is the same as in
Figure 1. This time, however, the verifier changes the color patterns every 0.5 seconds,
rather than after each response of the prover. Only after all four patterns were displayed the
prover is prompted to enter the color sequence of the digit he or she wishes to enter. The
algorithm for calculating the color patterns is given in §3.3.
Assuming that $O_y$ denotes the verifier’s output in round $y$, and $I_y^x$ denotes the
prover’s input of the color digit $x$ had in round $y$, and further assuming that the
prover’s PIN is 1234, the input/output sequences of the two designs can be summarized as given below
(entries before the bar belong to the first digit, entries after it to the second digit):
immediate response: $O_1, I_1^1, O_2, I_2^1, O_3, I_3^1, O_4, I_4^1 \mid O_5, I_5^2, O_6, I_6^2, O_7, I_7^2, O_8, I_8^2 \cdots$
delayed response: $O_1, O_2, O_3, O_4, I_1^1, I_2^1, I_3^1, I_4^1 \mid O_5, O_6, O_7, O_8, I_5^2, I_6^2, I_7^2, I_8^2 \cdots$
Obviously, more options exist to arrange inputs and output. The prover may respond
immediately to each output of the verifier as in the immediate response design, but
the verifier may run the first round for all PIN digits 1 to 4, followed by the second
round for all digits and so forth until all four rounds were completed for all four
PIN digits. We refer to that design as interleaved response; its round structure can
be illustrated in the same fashion as above
(entries before the bar belong to the first round for all digits, entries after it to the second round for all digits):
interleaved response: $O_1, I_1^1, O_2, I_2^2, O_3, I_3^3, O_4, I_4^4 \mid O_5, I_5^1, O_6, I_6^2, O_7, I_7^3, O_8, I_8^4 \cdots$
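The three round structures differ only in the order of outputs and inputs, which the following sketch makes explicit. It writes `O{y}` for the verifier's output in round y and `I{y}^{x}` for the prover's input in round y concerning PIN digit x; the function names and the string encoding are hypothetical.

```python
def immediate(num_digits=4, rounds=4):
    """Output and input alternate; all rounds of one digit come first."""
    seq, y = [], 0
    for x in range(1, num_digits + 1):
        for _ in range(rounds):
            y += 1
            seq += [f"O{y}", f"I{y}^{x}"]
    return seq

def delayed(num_digits=4, rounds=4):
    """All outputs for one digit are shown before its inputs are entered."""
    seq, y = [], 0
    for x in range(1, num_digits + 1):
        ys = range(y + 1, y + rounds + 1)
        seq += [f"O{i}" for i in ys]
        seq += [f"I{i}^{x}" for i in ys]
        y += rounds
    return seq

def interleaved(num_digits=4, rounds=4):
    """Round r is run for every digit before round r + 1 starts."""
    seq, y = [], 0
    for _ in range(rounds):
        for x in range(1, num_digits + 1):
            y += 1
            seq += [f"O{y}", f"I{y}^{x}"]
    return seq

for design in (immediate, delayed, interleaved):
    print(design.__name__, " ".join(design()[:16]))
```

Each schedule contains the same 16 outputs and 16 inputs; only the point at which an observer can start pruning candidates differs.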
We analyze and discuss the properties, the psychological rationale, as well as the
advantages and disadvantages of all these designs in §3.2 below. Furthermore, we
were interested in how our designs would perform in practice and implemented several versions of them for the purpose of conducting security and usability studies.
The results of these studies are described in §5.1 and §5.2.
3.2 Comparison and Analysis
For all designs of cognitive trapdoor games we have presented above, it holds that,
if the observer can perfectly record or memorize all input and output then he or she
will be able to deduce the prover’s PIN in the same fashion as the verifier does it.
We describe modifications of our designs that provide limited resilience against
automatic recordings in §3.4. In this section, we assume that the observer has no
automatic recording devices such as concealed cameras, although he may use e.g.,
manual tools such as pencil and paper. This means that the observer’s resources are
constrained by humans’ cognitive capabilities as we summarized them in §2.2.
In the immediate response design, the prover must retrieve his or her PIN from LTM
and must decide which color his or her current PIN digit has before responding. In
the delayed response design, the prover must remember a sequence of four colors
in his or her STM for a few seconds. In both cases, the prover can focus his or her gaze
on the fixed position of the current PIN digit on the PIN pad, which eliminates the
need to maintain awareness of the digit itself. The immediate response design is
well within the cognitive capacity of a healthy human, the delayed response design
is within practical bounds.
In the immediate response design, the observer must memorize at least six symbols
in each round: five symbols of equal color (the new information presented in each
round) and the response of the prover. Alternatively, the observer must perceive
and manually record that information at the same speed at which it is presented.
If, on the other hand, the observer does not memorize or record information but attempts to derive the PIN digits directly then he or she must additionally remember
his or her current hypothesis of what the set of probable PIN digits is, and must mentally intersect the hypothesis with the set indicated by the prover. In the delayed
response design, the observer has no means to prune the set of possible PIN digits
before the prover inputs his or her answers. This amounts to memorizing information worth at least 24 symbols which exceeds the capacity of humans’ STM (Vogel
and Machizawa, 2004; Miller, 1956) by a safe margin. Also note that the estimate
above is calculated for one PIN digit, the observer has to accomplish his attack four
times in rapid succession, once for each PIN digit.
Additionally, the six symbols that must be memorized or otherwise processed per
round are not available for rehearsal, the process by which information in the STM
is encoded in LTM (Peterson and Peterson, 1959). The symbols are rapidly replaced by new information that must be memorized or processed as well. A continuous stream of new information that must be processed, as generated by our
designs, is known to impede rehearsal (Anderson, 2000). Therefore, we are reasonably confident that the greater resources of the LTM cannot be brought to bear on
the observer’s task, particularly not in the delayed response design where the exposure time for each round is limited to 0.5 seconds.
In the interleaved response design, the observer would have to memorize information generated in five rounds before any pruning can take place, which amounts to
memorizing 30 symbols. Before the first PIN digit can be derived unambiguously,
information from 13 rounds or 91 symbols must be memorized. On the other hand,
the prover cannot focus his gaze on one PIN digit for four consecutive rounds, but
has to cycle through his or her PIN digits four times.
In order to verify our hypothesis that the asymmetry of the cognitive overhead of the
prover’s and the observer’s task fulfills our requirements for a cognitive trapdoor
game, we conducted two studies. First, we measured subjects’ ability to record and
guess PIN digits in recorded PIN entry procedures. Second, we studied subjects’
ability to enter PINs using our designs. Additionally, we measured how well subjects accepted our designs. The results of our studies are presented in §5.1 and §5.2
respectively.
3.3 Randomizing the Color Patterns
For our PIN entry methods to be secure, the color patterns must be random or at
least pseudo random in a fashion that cannot be predicted by observers. Additionally, in each round the number of white digits should be equal to the number of
black digits. This can be justified as follows: let $p_0$ be the probability that the digit
is white and let $p_1 = 1 - p_0$ be the probability that it is black. The average amount of
information that can be gathered in each round (in other words, the observer’s
uncertainty about the entered PIN digit) equals the entropy $H(p_0, p_1) = -\sum_{i=0}^{1} p_i \log_2 p_i$
per round. It is well known that the entropy is maximal for an equal distribution (Shannon, 1948), which is the case if the number of colors is two and both
colors are assigned the same number of digits.
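The effect of an uneven split can be tabulated directly from the entropy of a two-color distribution. The sketch below assumes ten equally likely digits, so that the probability of white is the number of white digits divided by ten; the function name `round_entropy` is hypothetical.

```python
import math

def round_entropy(num_white: int, total: int = 10) -> float:
    """Entropy in bits of the color answer when num_white of total digits are white."""
    p0 = num_white / total
    p1 = 1.0 - p0
    # Terms with probability zero contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in (p0, p1) if p > 0)

for white in range(0, 11):
    print(white, round(round_entropy(white), 3))
```

The table peaks at five white and five black digits, where each answer carries exactly one bit.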
The question then is how the display of color patterns can be computed so that the
aforementioned two criteria are met. The answer that is perhaps the simplest to give
and prove correct is based on symmetrical balanced trees as shown in Figure 3. The
height of a balanced tree (the maximum number of edges from its root to a leaf)
is $\lceil \log_2 n \rceil$ for a tree with n leaf nodes. Each leaf node is randomly associated
with a unique digit. For instance, a tree that represents the digits $[0, 1, \dots, 9]$ has
height $\lceil \log_2 10 \rceil = 4$. All nodes, internal as well as leaf nodes, are assigned color
labels such that a left sibling is black and a right sibling is white. Our algorithm for
coloring digits can now be formulated simply as follows: in round r, each digit is
assigned the color of its parent node at level r of the tree. Figure 3 illustrates the
algorithm for rounds one and two. The colors of digits which do not have a leaf
node at the last level are chosen randomly such that the number of black and white
digits is equal.
The symmetry of the tree ensures that the number of white and black digits is equal
even if the number of digits is not a power of two. We omit a proof for its simplicity.
The basic idea of the proof is to show that if a subtree with root $v$ does not have an equal
number of black and white leaf nodes then its symmetrical node $v'$ has as many
black leaf nodes as $v$ has white leaf nodes and vice versa. The security follows from
the condition that leaf nodes are randomly associated with a digit. Hence, each digit
can be represented by a unique path in the tree yet the association between paths
and digits (the sequence of colors that must be entered by the prover) is random.
[Figure 3: two copies of a balanced tree whose leaves carry the digits 0 to 9, colored for round one (upper tree) and round two (lower tree).]
Figure 3. This figure illustrates how the colors of digits are determined in each round. The
algorithm is based on a balanced tree which has a depth that is logarithmic in the number of
digits e.g., $\lceil \log_2 10 \rceil = 4$ for ten digits. As rounds progress from round one to round four,
the digits (the leaves of the tree) inherit the color of their parent nodes at the corresponding
tree level. The upper tree shows the color distribution at level one, and the lower tree shows
the color distribution at level two. Nodes at the respective levels are circled for clarity.
A less abstract algorithm, which we present without proof, can be described as
follows: each digit is randomly associated with a card in a deck of cards. In each
round, the verifier randomly colors the digits represented by cards in the upper half
of the deck black and otherwise white, or vice versa. After each round, the verifier
performs a perfect riffle in-shuffle. 2 The resulting patterns also fulfill the criteria
we required above.
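The card deck formulation lends itself to a compact sketch. The code below is an illustration under stated assumptions rather than a reference implementation: it takes "in-shuffle" to mean that the top card of the lower half ends up on top of the deck, it picks which half is black independently in every round, and all function names are hypothetical.

```python
import random

def in_shuffle(deck):
    """Perfect riffle in-shuffle: the top card of the lower half ends up on top."""
    half = len(deck) // 2
    upper, lower = deck[:half], deck[half:]
    shuffled = []
    for low_card, up_card in zip(lower, upper):
        shuffled += [low_card, up_card]
    return shuffled

def color_patterns(rng, rounds=4):
    """Return one dict per round that maps each digit to 'b' or 'w'."""
    deck = list(range(10))
    rng.shuffle(deck)  # random association of digits with cards
    patterns = []
    for _ in range(rounds):
        # Randomly decide whether the upper half is black or white.
        upper_color, lower_color = rng.choice([("b", "w"), ("w", "b")])
        half = len(deck) // 2
        pattern = {digit: upper_color for digit in deck[:half]}
        pattern.update({digit: lower_color for digit in deck[half:]})
        patterns.append(pattern)
        deck = in_shuffle(deck)
    return patterns

def recover_digit(patterns, answers):
    """Verifier side: intersect the sets of digits indicated in each round."""
    candidates = set(range(10))
    for pattern, answer in zip(patterns, answers):
        candidates &= {d for d, color in pattern.items() if color == answer}
    return candidates

rng = random.Random(7)  # fixed seed, only to make the run repeatable
patterns = color_patterns(rng)
answers = [pattern[3] for pattern in patterns]  # prover's inputs for digit 3
print(answers, recover_digit(patterns, answers))
```

In this sketch every pattern has five black and five white digits, and four rounds leave exactly one candidate for each of the ten digits.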
3.4 Resilience Against Camera Recording
Criminals increasingly employ concealed miniature cameras to observe and record
the PINs entered by victims (Wood, 2003; Summers and Toyne, 2003; Brader,
1998). The designs we have presented above are effective against human shoulder
surfers, as we found in our evaluation (see §5.1 for empirical evidence). However,
if an observer records all input and output then he can compute the prover’s PIN
number in the same fashion the verifier computes it, namely by intersecting the sets of
digits that the prover indicated.
2
Eric W. Weisstein. “Riffle Shuffle.” From MathWorld–A Wolfram Web Resource.
http://mathworld.wolfram.com/RiffleShuffle.html
[Figure 4: balanced tree of reduced depth whose leaves are shared among the ten digits.]
Figure 4. This figure shows a tree with a reduced number of levels which yields a reduced
number of rounds for entering PIN digits. The reduction results in some degree of uncertainty about the prover’s PIN even if the sequence of colors that the prover enters is
perfectly recorded by an observer. In the given example, ten digits are identified by four
possible color sequences which means that each input sequence leaves 2.5 candidate digits
on average.
It turns out, though, that a simple modification of our design can provide limited
resilience even against automatic recordings of all input and output (Baker (1995)
describes an earlier approach which benefits from the same effect). The key is to
input less information than is necessary to uniquely identify the entered PIN. In
other words, subsequent to the prover’s input the intersection of sets yields not only
one candidate PIN but multiple ones which are all equally likely. We refer to the set
of candidate PINs as the shadow set. The verifier can efficiently verify candidates
in the shadow set and authorize transactions if one of the candidates is the correct
PIN. The observer, on the other hand, lacks knowledge of the master secret that is
part of the verification process and thus cannot do better than to try PIN numbers
from the shadow set at random. Given a limited number of allowed false entries,
typically three, this yields a certain success probability which depends on the size
of the shadow set. At the same time, the probability of success when guessing
PIN numbers blindly (without knowing a shadow set) increases. Therefore, the size
of the shadow set becomes a tradeoff between false acceptance rate, efficiency of
verification, and the security of the design against camera recording.
Before we analyze and quantify that tradeoff in detail below, we illustrate the modified design by an example. Consider the balanced tree in Figure 4. We reduced the
depth of the tree from four to two levels. Consequently, each leaf node is associated with multiple digits. This reduces the number of rounds necessary to enter a
PIN digit and, as a positive side effect, improves the overall usability of our design.
However, only four different color input sequences are available to identify ten
different digits. This means that on average 2.5 candidate digits are identified per
two rounds of input, with a minimum of two and a maximum of three candidates
depending on the digit that the prover enters. Assuming that the PIN consists of
four digits, the size of the shadow set is therefore between $2^4 = 16$ and $3^4 = 81$
with an average of $2.5^4 \approx 39$.
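The shadow set sizes quoted above are the product of the number of candidate digits at each PIN position. The sketch below assumes a hypothetical assignment of the ten digits to the four leaves of the two-level tree; the real assignment in Figure 4 cannot be recovered here, and the names `LEAVES` and `shadow_set` are illustrative.

```python
from itertools import product

# Hypothetical assignment of digits to the four leaves (two color rounds).
# With two or three digits per leaf, ten digits over four leaves force
# two leaves of size three and two leaves of size two.
LEAVES = {
    ("black", "black"): (9, 4, 1),
    ("black", "white"): (5, 7, 3),
    ("white", "black"): (0, 2),
    ("white", "white"): (6, 8),
}
LEAF_OF = {d: leaf for leaf, digits in LEAVES.items() for d in digits}


def shadow_set(pin):
    """All PINs that produce the same color input as `pin`."""
    per_position = [LEAVES[LEAF_OF[d]] for d in pin]
    return set(product(*per_position))


print(len(shadow_set((0, 2, 6, 8))))  # every digit sits in a leaf of size 2
print(len(shadow_set((9, 4, 1, 5))))  # every digit sits in a leaf of size 3
print(2.5 ** 4)                       # average used in the text
```

The genuine PIN is always a member of its own shadow set, so the verifier only has to check the candidates in that set.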
In more mathematical terms, given a shadow set size of $s$ shadows the prover has
to perform $t = \log_2(N/s) = \log_2(N) - \log_2(s)$ rounds of input where $N$ is
the overall number of PIN numbers, which is 10,000 in a typical setting with four
decimal digits per PIN. Assume that the observer steals the prover’s magnetic stripe
card and attempts to authenticate himself or herself to the verifier by random input.
[Figure 5: decision tree. At each attempt k = 0, 1, ... the branches A_k ∩ B_k^c and A_k^c ∩ B_k lead to success, while the branch A_k^c ∩ B_k^c leads to the next attempt.]
Figure 5. The decision tree to compute the probability of successful impersonation of an
oracle by an adversary.
Given n attempts, his or her probability D of success would be:
\[
D = \sum_{k=1}^{n} \left( \frac{1}{2^t} \cdot \left( 1 - \frac{1}{2^t} \right)^{k-1} \right) \qquad (1)
\]
Formula (1) therefore provides a lower bound on the probability with which the
observer succeeds in impersonating the prover. We derive additional bounds below.
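Formula (1) is a finite geometric sum, so it equals $1 - (1 - 2^{-t})^n$. The sketch below evaluates the number of rounds and Formula (1) for the typical setting of 10,000 PINs and three attempts. The function names are illustrative, and the choice s = 100 is only an example value.

```python
import math


def rounds_needed(N: int, s: int) -> float:
    """t = log2(N / s): rounds of binary input for a shadow set of size s."""
    return math.log2(N / s)


def blind_input_success(N: int, s: int, n: int) -> float:
    """Formula (1): succeed within n attempts, each with chance 1 / 2**t."""
    p = 1.0 / 2 ** rounds_needed(N, s)  # equals s / N
    return sum(p * (1.0 - p) ** (k - 1) for k in range(1, n + 1))


N, s, n = 10_000, 100, 3
print(rounds_needed(N, s))           # about 6.64 rounds instead of about 13.29
print(blind_input_success(N, s, n))  # about 0.0297
```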
Assume the observer has one camera recording and derives the shadow set from
it. He or she then attempts to authenticate him-/herself by entering PINs randomly
chosen from the shadow set. The observer succeeds if he or she guesses:
A: the correct PIN from the shadow set
B: a wrong PIN but the correct PIN is a shadow of the wrong PIN
Event B is an unfortunate side effect of the shadow sets. Note that the observer cannot guess a correct and a wrong PIN simultaneously and therefore in all attempts
$A \cap B$ is the empty event $\emptyset$, the probability of which is zero. Let $A^c$ be the complementary event of $A$ and therefore $\Pr[A] = 1 - \Pr[A^c]$. The observer’s probability
$AB$ of success in the k’th attempt can now be computed as the probability of the
decision tree shown in Figure 5. The events at each node of the tree are mutually
independent conditional on their parent node and thus the observer’s probability to
succeed in n or fewer attempts (counted from k = 0, ..., n − 1) is the sum of the
probabilities of the leaves in the tree, or more precisely:
\[
AB = \sum_{k=0}^{n-1} \left( \left( \Pr[A_k \cap B_k^c] + \Pr[A_k^c \cap B_k] \right) \cdot \prod_{i=0}^{k-1} \Pr[A_i^c \cap B_i^c] \right) \qquad (2)
\]
Formula (2) provides another lower bound on the probability that the observer succeeds in impersonating the prover. The probabilities of individual events can be calculated based on the observation that, conditional on choosing a PIN $x_k$ from the
shadow set, $A_k$ and $B_k$ are independent experiments. Therefore it holds that:
\[
\begin{aligned}
\Pr[A_k \cap B_k^c] &= \Pr[A_k] \cdot \Pr[B_k^c], & \Pr[A_k] &= \frac{1}{s-k} \\
\Pr[A_k^c \cap B_k] &= \Pr[A_k^c] \cdot \Pr[B_k], & \Pr[B_k] &= \frac{1}{N-s+1}
\end{aligned}
\]
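Formula (2) can be evaluated numerically from the event probabilities given above. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the failure branch $\Pr[A_i^c \cap B_i^c]$ factors into $\Pr[A_i^c] \cdot \Pr[B_i^c]$ in the same way as the two success branches.

```python
def shadow_guess_success(N: int, s: int, n: int) -> float:
    """Formula (2) with Pr[A_k] = 1/(s-k) and Pr[B_k] = 1/(N-s+1).

    Requires s >= n so that 1/(s-k) is defined for every attempt.
    """
    pr_b = 1.0 / (N - s + 1)
    total = 0.0
    all_failed_so_far = 1.0  # product over i < k of Pr[A_i^c and B_i^c]
    for k in range(n):       # attempts are counted from k = 0
        pr_a = 1.0 / (s - k)
        success_k = pr_a * (1.0 - pr_b) + (1.0 - pr_a) * pr_b
        total += success_k * all_failed_so_far
        # Assumption: the failure branch factors like the success branches.
        all_failed_so_far *= (1.0 - pr_a) * (1.0 - pr_b)
    return total


print(shadow_guess_success(10_000, 100, 3))
```

For 10,000 PINs, 100 shadows, and three attempts this evaluates to about 0.03.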
[Figure 6: two plots of AB(x), CB(x), and D(x) against the number of shadows; left panel in linear scale, right panel in logarithmic scale.]
Figure 6. Both displays show the same graphs. The left display is plotted in linear scale,
the right display is plotted in logarithmic scale. Function AB(x) is the probability to guess
a PIN with a shadow set of size x with 10,000 PIN numbers and three attempts (see Formula (2)). Function CB(x) is the same for the probability to guess correctly without knowing the shadow set (see Formula (3)). Function D(x) gives the probability to guess correctly
by entering random sequences in our design with reduced rounds (see Formula (1)).
Unfortunately, the introduction of shadows also increases the probability that the
observer succeeds in impersonating the prover by randomly guessing a PIN without
knowing a shadow set. Let $C_k$ be the event that the adversary guesses the correct
PIN from the entire set of PINs in the k’th attempt. The probability of event $C_k$ is:
\[
\Pr[C_k] = \frac{1}{N-k}
\]
Even if the observer guesses a wrong PIN, the correct PIN may still be a shadow
of the wrong PIN. By similar considerations as summarized above, we can derive
the success probability $CB$ by substituting $C_k$ for $A_k$ in formula (2). This yields our
third and final lower bound:
\[
CB = \sum_{k=0}^{n-1} \left( \left( \Pr[C_k \cap B_k^c] + \Pr[C_k^c \cap B_k] \right) \cdot \prod_{i=0}^{k-1} \Pr[C_i^c \cap B_i^c] \right) \qquad (3)
\]
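Formula (3) can be evaluated in the same way as Formula (2), with $\Pr[C_k] = 1/(N-k)$ in place of $\Pr[A_k]$. The following sketch is illustrative: the function name is hypothetical, and the joint probabilities are assumed to factor into products of the individual event probabilities, as in the derivation of Formula (2).

```python
def blind_guess_success(N: int, s: int, n: int) -> float:
    """Formula (3) with Pr[C_k] = 1/(N-k) and Pr[B_k] = 1/(N-s+1)."""
    pr_b = 1.0 / (N - s + 1)
    total = 0.0
    all_failed_so_far = 1.0  # product over i < k of Pr[C_i^c and B_i^c]
    for k in range(n):       # attempts are counted from k = 0
        pr_c = 1.0 / (N - k)
        success_k = pr_c * (1.0 - pr_b) + (1.0 - pr_c) * pr_b
        total += success_k * all_failed_so_far
        # Assumption: the failure branch factors into independent parts.
        all_failed_so_far *= (1.0 - pr_c) * (1.0 - pr_b)
    return total


for shadows in (100, 1_000, 10_000):
    print(shadows, blind_guess_success(10_000, shadows, 3))
```

With 100 shadows the value is about 0.0006, and it approaches 1 as the shadow set grows to the size of the whole PIN space.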
In Figure 6 we show graphs of the lower bounds we derived (formulas (1), (2)
and (3)) for different numbers of shadows and a PIN space of size 10,000. Note
that the right half of the figure displays plots in logarithmic scale whereas the left
part shows an excerpt plotted in linear scale. The size of the shadow set is plotted
on the abscissa, probabilities are plotted on the ordinate. Where the size of the
shadow set is approximately 100, the graph of (1) breaks even with the graph of (2)
(approaching from above) at a probability of approximately 0.03.
We conclude that, unless better attacks become known, a shadow set of size 100
is the maximal size that is reasonable and yields approximately a 3% chance that
an observer impersonates a prover with or without knowledge of a shadow set. The
values of (3) remain significantly smaller than the values of (1) and (2) until the size
of the shadow set approaches the size of the PIN space, at which point the graphs
of (2) and (3) merge and approach 1.
At the same time, the overall number of rounds for which the prover has to play the
cognitive trapdoor game is theoretically about halved ($\log_2(10{,}000) = \log_2(100) +
\log_2(100)$), which considerably improves the usability of our PIN entry method. In
practice, our choice of the number of rounds is somewhat limited by the fact that
the number of decimal digits is not a power of two. Due to the probabilistic nature
of this recording resilience modification we refer to such designs as probabilistic
cognitive trapdoor games.
One caveat remains, though. In a typical scenario, the verifier resets its counter of
false attempts once a PIN is entered correctly. Hence, the observer may probe one
or two PINs taken from the shadow set. If these attempts fail then he waits until the
prover has again (correctly) entered his genuine PIN. At that point, the verifier resets its
false attempts counter and the observer can probe one or two more PINs from the
shadow set. This strategy may be continued until the observer has identified the genuine
PIN. In order to avoid the attack, the verifier must display the recorded number
of false attempts before the game, so that the prover is alerted. A consequential
denial of service condition due to intentional entry of false PINs can be avoided
by amending the identifying information stored on the prover’s token with a salt.
Hence, the token cannot be forged from obvious identifying information (such as
account numbers printed also on balance sheets) but must be stolen first (in which
case invalidation of access is in the best interest of the prover).
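The countermeasure against the counter-reset attack can be illustrated with a toy verifier that shows the number of failed attempts before each game. This is a simplified sketch under stated assumptions: the class `Verifier`, its method names, and the direct PIN comparison are hypothetical stand-ins for the actual cognitive trapdoor game and shadow set verification.

```python
class Verifier:
    """Toy verifier that shows the failed-attempt count before each game."""

    MAX_FAILURES = 3  # typical limit mentioned in the text

    def __init__(self, pin: str) -> None:
        self._pin = pin
        self.failed_attempts = 0

    def announce(self) -> str:
        # Displayed before the game so the genuine prover notices probing.
        return f"Failed attempts since last success: {self.failed_attempts}"

    def authenticate(self, entered: str) -> bool:
        if self.failed_attempts >= self.MAX_FAILURES:
            return False  # access is locked
        if entered == self._pin:
            self.failed_attempts = 0  # reset on success
            return True
        self.failed_attempts += 1
        return False


verifier = Verifier("2145")
verifier.authenticate("2144")  # observer probes a shadow
verifier.authenticate("2146")  # observer probes another shadow
print(verifier.announce())     # the prover sees two failed attempts
```

Because the count is shown before the prover plays, two unexplained failures alert the prover before the counter is reset by the next correct entry.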
3.5 Alternative I/O Modalities
In previous sections, we laid out in detail how the PIN pad metaphor can be applied
to design the PIN entry procedure in a fashion that fends off shoulder surfing. The
concepts we applied are not limited to that particular metaphor nor are they limited
e.g., to output that must be perceived visually.
[Figure 7: pin states under the fingers of both hands in three rounds (t = 1: left, t = 2: right, t = 3: left), with pins marked as changing from raised to lowered and from lowered to raised.]
Figure 7. This figure illustrates the input and output mechanisms based on palpable pins
which can be raised (black) or lowered (white) under the control of the verifier. Only the
thumbs are exceptions; thumbs denote keys which must be pressed to indicate whether a
particular pin is raised (left thumb, black) or lowered (right thumb, white). The prover’s
PIN number is represented by a sequence of fingers. In each round, the prover presses the
thumb key which denotes the state (raised or lowered) of the current PIN finger, e.g., the
middle finger. In the example above, the pin under the middle finger is first raised, then
lowered, then raised again, hence the prover would press the keys under his or her
thumbs in the following sequence: left, right, left.
Consider an output device which consists of a board with eight palpable pins arranged in two arcs so that the small, ring, middle, and index finger of each hand
can be conveniently placed on top of the pins, and the thumbs come to rest on two
keys which are used for input. Assume that the verifier can raise or lower the pins
in a palpable fashion. The device can loosely be compared to a simplified Braille
display (Braille is a tactile reading and writing system for the blind based on dots raised above
the surface, named after its inventor Louis Braille, 1809–1852). The prover’s PIN consists of a sequence of five fingers. For ease of description, we number the fingers of both hands excluding the thumbs from zero
to seven. Assume the prover’s PIN sequence consists of the following fingers: left
middle, left ring, right index, right middle, left index. Then we can represent the
PIN as the five digit base eight number $21453_8$. Each digit is entered as follows:
(1) The verifier raises four pins and lowers all others.
(2) If the pin that corresponds to the current PIN digit is raised then the prover
presses the key under his or her left thumb, and the key under his or her right
thumb otherwise. Assume that S is the set of digits represented by the pins in
the same state as indicated by the prover.
(3) Steps 1 and 2 are repeated three times.
(4) The verifier calculates the entered digit by intersecting the sets S1, S2, S3.
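Steps (1) to (4) can be simulated as follows. This is a sketch under stated assumptions: the finger order in `FINGERS` is inferred from the example PIN, and the rule that the verifier always raises half of the remaining candidate pins is an assumption that makes three rounds sufficient; the text does not specify how the four raised pins are chosen.

```python
import random

# Finger numbering from zero to seven, inferred from the example PIN.
FINGERS = ["left small", "left ring", "left middle", "left index",
           "right index", "right middle", "right ring", "right small"]


def enter_digit(secret: int, rng: random.Random) -> int:
    """Simulate steps (1) to (4) for one base-eight PIN digit."""
    candidates = set(range(8))
    for _ in range(3):
        others = set(range(8)) - candidates
        # Step 1: raise four pins. Raising half of the remaining candidates
        # (an assumption) halves the candidate set in every round.
        raised = set(rng.sample(sorted(candidates), len(candidates) // 2))
        raised |= set(rng.sample(sorted(others), 4 - len(raised)))
        # Step 2: left thumb if the secret pin is raised, right thumb otherwise.
        pressed_left = secret in raised
        indicated = raised if pressed_left else set(range(8)) - raised
        candidates &= indicated
    # Step 4: the intersection of S1, S2, S3 holds exactly one digit.
    (digit,) = candidates
    return digit


pin = [2, 1, 4, 5, 3]  # left middle, left ring, right index, right middle, left index
rng = random.Random(0)
print([FINGERS[enter_digit(d, rng)] for d in pin])
```

Each round halves the candidates (8, 4, 2, 1), so three rounds identify one of eight pins regardless of which pins are picked at random.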
It is easy to see that we can devise variants of that algorithm analogous to the
variants of the visual designs we described in §3.1. However, the tactile design has
an advantage over the visual design: output is implicitly hidden by the fingertips
of the prover which means that a human observer learns no information about the
PIN that is entered. We must be aware, though, that attack and defense is an arms
race: it is not inconceivable that sophisticated observers eventually develop other
concealed measurement devices which allow them to capture the state of the pins,
e.g., based on microphones.
The tactile design is particularly suited for blind provers who are unable to notice
active shoulder surfing. Other handicapped provers may profit from such a
scheme as well, e.g., people in wheelchairs who have difficulty effectively shielding
their input from the view of observers.
4 Related Work
The problem of how PIN numbers can be entered in the face of shoulder surfing
has inspired numerous related works. A common approach of which several variants were proposed is based on a key pad with randomized layout of keys (Hirsch,
1982, 1984; Cairns, 1990; Thrower, 1989; Rehm, 1985; Hoover, 2001; Collins,
1990; McIntyre et al., 2003; Baker, 1995). The prover must locate and press the
keys on the key pad that are labeled with his or her PIN digits. Of course, that
provides added security only if the observer cannot observe the labels on the keys
that the prover presses. It appears that the cognitive task of the prover is even more
difficult than that of the observer since the prover has to find the appropriate keys
whereas the observer may focus his or her attention simply on those keys that the
prover finally presses. These mechanisms bear little if any resemblance to our designs. However, Baker (1995) describes a password entry method whereby the
verifier randomly arranges alphanumerical characters in a grid. Provers enter the
characters of their password by selecting the row or column in which the password
character is. After each selection, the grid is randomized again. In his description
of the mechanism, Baker already noted that uncertainty about the entered character
provides some resilience against camera recording.
A second prominent category of mechanisms requires that the prover mentally calculates and enters the results of an arithmetic function which takes the secret PIN
and a verifier supplied challenge as its input. For instance, the function could be a
per digit multiplication modulo ten (Wilfong, 1998, 1999) or the calculation of vector products between two vectors one of which is a challenge of the verifier and the
other one resembles the secret PIN of the prover (Hopper and Blum, 2000, 2001;
Li and Teng, 1999; Matsumoto, 1996). Since there is a certain probability that an
observer guesses the result of the vector product, multiple rounds are executed to
diminish the chances that an observer successfully impersonates the prover. These
mechanisms are particularly interesting from a theoretical point of view since the
security assertions that can be made about them (e.g., that it is hard for the observer
to calculate the secret of the prover even if multiple sessions are recorded) are
well founded in mathematics and theoretical computer science. In the case of Hopper and Blum (2001) the authors concluded that the mechanisms are prohibitively
onerous in practice, as would be the case of the mechanisms described by Li and
Teng (1999) which require that provers memorize and operate on three keys with
different functions each of which has 20–40 bits worth of information.
However, Matsumoto (1996) developed groundbreaking, and in our view excellent, designs to cope with the difficulty provers have in performing the necessary calculations. He devised precomputation of the results of the vector products for several challenges, arranged in columns which are indexed by representatives of the
prover’s secret vectors (which yields a tabular display of numbers). The prover’s
task is thereby reduced to a lookup of the answer in the cross section of the line
that resembles the current challenge and the columns that represent the secrets.
Additional designs of his are based on a map of train stations or charts describing
Janken games (better known as scissors, paper, stone). Although the underlying
mechanisms in our and Matsumoto’s designs are different their graphical presentations exploit similar traits. An advantage that our designs have is that the input and
output is somewhat simpler and more strongly exploits humans’ cognitive limitations to the prover’s advantage. As part of future work, we would like to study the
usability of Matsumoto’s designs and to compare the findings with the usability of
our designs.
Another common approach is to apply alphanumerical character association and
substitution problems (Johnson and Weber, 1997; A. James Smith, 2001; Patarin
and Ugon, 1998; Anvekar, 2003; Matsumoto and Imai, 1991; Swi, 2004). For instance, the prover must substitute characters in a challenge (that the verifier provides) with associated characters in an answer alphabet (Matsumoto and Imai,
1991). The difficulty for the observer is reconstructing the mapping between the
challenge alphabet and the answer alphabet. In other cases e.g., described by Anvekar (2003), the association between PIN digits and a unique code is displayed
and the prover must enter the code in place of the PIN digit.
The method described by Collins (1990) appears to be a multi-dimensional variant
and application of the Playfair cipher invented by Sir Charles Wheatstone in the
19th century. The basis of the method is a multidimensional matrix of random elements which must be secretly shared by the prover and the verifier. The verifier
challenges the prover with two elements not found in the same row or column of
the matrix, and the prover answers by entering those elements that complete the
rectangle or parallelepiped whose opposite corners were defined by the challenge.
In the mechanism described by Johnson and Weber (1997), the prover must substitute values of an environment variable into variable expressions in his or her secret
password, and the prover’s time to enter his or her secret may be limited.
A more diverse category of related designs is based on interactions with random
matrices of alphanumeric characters or geometric arrangements of elements. For
instance, in the case of A. James Smith (2003) the secret consists of a sequence of
characters associated with a pattern of positions in a master matrix. The remainder of the matrix is filled with random characters. The verifier presents various
matrices to the prover one of which is the correct matrix and the others being decoys. The prover selects the correct matrix and directly recreates the pattern in it
to authenticate himself or herself. In the case of Cottrell (1995), the secret consists of a geometric arrangement of elements which may consist of e.g., colored alphanumeric characters. In order to authenticate himself or herself, the prover must
recreate the correct combination of geometric arrangement of elements. An earlier design which includes elements from Cottrell (1995) is described by Martino
et al. (1994). Said mechanism resembles a jigsaw puzzle. By operating two or more
controls, the prover manipulates several elements of the puzzle at once until a secret subset of elements is in predefined positions. The mechanism described by
Narayanaswami (2004) uses a combination of images and positions as the secret.
The verifier flashes images at distinct positions of a touch-sensitive display. The
prover authenticates himself or herself by tapping on the locations where the images
that make up his or her password flashed. Finally, Romanoff disclosed
a mechanism by which the prover is challenged with a randomized grid of digits
where each digit appears multiple times. The prover’s secret consists of a sequence
of positions in the grid, he or she authenticates him- or herself by entering the
digits which are located at these positions. Multiple occurrence of the same digit
provides some resilience against observers and perhaps camera recordings, as in
the case of Baker (1995).
A detailed comparison of our designs with the aforementioned related work is beyond the scope of this paper. However, it is probably fair to conclude that despite
occasional superficial similarities, our approach has significant unique traits to it.
It is difficult to judge the tradeoff between security and usability that said related
work can achieve since few of the authors provided a detailed study of their mechanisms. As future work, we consider filling that gap by conducting comparative
usability studies of our designs and those proposed in related work.
5 Security and Usability Study
We conducted three studies with the objective to assess the security and usability of our immediate response design (“IOC”) versus the delayed response design
(“DOC”) versus the regular PIN entry method (“REG”). The first study put subjects into the role of the shoulder surfer, the second study put them into the role
of the prover. We presented these studies in (Roth et al., 2004). For reference, we
give a brief summary of the results below. Based on the outcome of these studies,
we refined the user interface design of our implementations and conducted a third
study that focused on the usability and user acceptance of our PIN entry methods.
We describe the materials and methods we used (see §5.2) and the outcome of the
third study (see §5.3). In §5.4, we interpret the outcomes of our studies.
5.1 Summary of Earlier Studies
We implemented all three methods of PIN entry in software and deployed it on
a touch screen kiosk system. The software required four rounds of input per PIN
digit, and it logged all user input with a time stamp for subsequent analysis.
For each method, we filmed ten entry procedures of randomly chosen PINs with
a digital camera. The field of view was chosen so that the entire PIN pad was visible
as well as the fingers of the first author who entered the PINs. Care was taken
not to unnecessarily obstruct the display during PIN entry. With this approach, we
intended to provide optimal conditions for an attack. Additionally, we produced
three separate example films for the purpose of explaining all input methods to the
subjects with whom we conducted the study.
We recruited 8 students of the local university as subjects for our first study. We
first briefed the subjects with the example films and a written explanation of the
principles of the methods and informed them that their task would be to determine
the PINs being entered. Subsequent to the briefings, the three films were projected
to a screen in front of the group. Breaks were offered after each PIN entry sequence
in order to allow the subjects to write down their guesses and for reflection and
discussion of their strategies. In order to prevent fatigue, we also offered breaks
between the films which totaled a length of approximately 10 minutes.
Correct digits |   4 | 3 | 2 |    1
REG            | 100 | 0 | 0 |    0
IOC            |   0 | 0 | 5 | 8.75
DOC            |   0 | 0 | 5 |  7.5
Figure 8. The guessing rate in percent of total number of digits.
All eight participants were able to complete the study and there was no visual or
understanding problem in following the contents of the films. No participant was
able to guess even one of the PIN numbers entered with one of our methods while
all participants correctly guessed all PIN numbers in the REG condition. However,
some participants succeeded in guessing one or two digits of some of the PIN numbers in the IOC and DOC condition (see Figure 8 for a summary of results).
The isolated successes appear to be a result of the strategies employed by the participants. In four cases, the subject focused on a randomly chosen digit and compared
the input to the pattern of that digit. Another strategy of subjects was to capture
the distribution of black and white buttons as a pattern that they sketched on paper. Some participants used prepared stencils as an aid to mark black and white
buttons. However, no strategy was particularly successful.
For our second (usability) study, we recruited 34 participants with academic education aged between 20 and 30 years. We chose a demographically homogenous
group of participants in order to limit the required number of subjects, and to maximize the impact of the condition factor on variance. Each participant was randomly
assigned one input method of which he had to complete 10 input cycles. All participants had to perform their input on the same kiosk system that we used in our first
study. As dependent variables, we measured user condition (pre- and post test), user
acceptance, time used for entry, and error rate. User condition was assessed using
the short scale of the BMS (Plath and Richter, 1984) which indicates the work load
in the sub-scales physiological fatigue, concentration, motivation, and emotional
state. The usability was measured with a subset of questions taken from the SUS
questionnaire (Brooke, 1996).
In summary, we found that subjects learned the new methods in three to four trials.
In the last three trials we found no significant difference in the error rate of all three
conditions. The new methods were rated significantly less usable (which in itself is
not a surprise) but were perceived to be significantly more secure than the regular
method. The user acceptance was high but failed to reach significance.
5.2 Usability Study
Figure 9. The improved user interface.
The results of our first two studies encouraged us to conduct a third study, for
which we improved the user interface of all methods based on user feedback that
we had received (figure 9 shows the improved system). Most notably, we reduced
the number of rounds per PIN digit from four rounds to three rounds. We added
a progress display that indicates the current PIN digit position (the green dot in
figure 9) and the number of rounds that are completed for the current digit (in
figure 9, two out of three rounds are completed). By pressing the “delete” button,
users can step back and correct input. We also introduced an intermediate state in
between changing the color patterns by briefly coloring the silhouette of the key pad
layout grey. The intermediate state facilitates detection of changing color patterns
for digits that retain their color in subsequent rounds. Lastly, we changed the input
device from touch screen to an external key pad which more closely resembles the
predominant input device used in ATMs. More precisely, our new test environment
consisted of a 12” Apple G4 iBook equipped with an external Trustmaster USB key
pad, the layout of which we modified to that of a typical local ATM.
We recruited nine female and 13 male subjects aged 22 years to 61 years for our
third study. All subjects were briefed about the purpose of the study and the functioning of all three input methods (REG, IOC, and DOC). We asked each subject to
choose a four digit PIN number he or she could remember easily as the PIN to be
used in subsequent trials (most subjects chose month and year of their birth date).
For each method, the subject had to enter his or her PIN six consecutive times. Our
software randomized the order in which the REG, IOC, and DOC methods were
tested. Subsequent to completion of each method, subjects had to fill out an electronic version of the SUS questionnaire (Brooke, 1996). The SUS questionnaire
generally consists of ten questions that must be rated on a five point Likert (1932)
scale. We adapted (and translated into German) eight relevant questions (see table 1), and increased the Likert scale to seven points in order to produce greater
variance. That yields a summed SUS score from 0 to 8 · 7 = 56 with 56 being the
best result. In order to measure user attitude, we asked three additional questions in
conjunction with the SUS questionnaire (see table 2).
Question (and translation) | REG∗IOC | REG∗DOC | IOC∗DOC
Ich würde das System gerne häufiger verwenden. (I would like to use this system more frequently.) | t = 1.4, p = .17 | t = 3.1, p < .01 | t = 1.8, p = .08
Ich finde das System unnötig komplex. (I think, the system is unnecessarily complex.) | t = 3.5, p < .01 | t = 5.2, p < .01 | t = 1.7, p = .09
Ich finde das System war leicht zu bedienen. (I think the system was easy to use.) | t = 2.1, p = .04 | t = 5.4, p < .01 | t = 3.4, p < .01
Ich hatte das Gefühl, bei der Bedienung die Kontrolle über das System zu haben. (I had control over the system at all times.) | t = 1.1, p = .3 | t = 4.3, p < .01 | t = 3.4, p < .01
Ich finde das System war umständlich zu bedienen. (I think, the system was complicated to use.) | t = 2.5, p = .02 | t = 6.3, p < .01 | t = 4, p < .01
Mir hat die Darstellung des Systems sehr gut gefallen. (I like the design of the system.) | t = 0.3, p = .8 | t = 2.7, p < .01 | t = 2.5, p = .01
Ich hatte das Gefühl, dass das System viel zu schnell ablief. (The system was too fast for me.) | t = 0.4, p = .7 | t = 5.5, p < .01 | t = 5.3, p < .01
Die Benutzung hat mir Spass gemacht. (I had fun using the system.) | t = −1.5, p = .1 | t = −0.6, p = .5 | t = 0.9, p = .4
Table 1
Results of pairwise comparison of SUS ratings using a two-sided t-test (DF=59). Significant
differences between conditions are typeset in boldface.
5.3 Results
Error rate False input (e.g., pressing the “white” button when the current digit
is not white) and pressing the “delete” button one or multiple times in direct succession were each counted as one error. All conditions showed a slight learning effect.
The error probability varied significantly depending on condition (Kruskal-Wallis:
χ²(2) = 338.48, p < .01). Pairwise comparison revealed a significantly higher
error rate for the DOC condition (ē = 0.2) while there was no significant difference between the REG (ē = 0.025) and IOC condition (ē = 0.023; Wilcoxon:
REG∗IOC: Z = 0.22, p = .83; REG∗DOC: Z = −9.52, p < 0.01; DOC∗IOC:
Z = −16.42, p < 0.01). The error rate in the IOC and DOC condition was also
influenced by fatigue, as can be seen in the left graph of figure 10. The error probability was also influenced by the digit position. Errors were particularly frequent
for the second digit in the REG and IOC condition (see figure 10, right graph).
Question | Mean | Mean std err
Um meine Sicherheit zu erhöhen, nehme ich auch Mehraufwand in Kauf. (In order to increase my security I am willing to accept additional effort.) | 4.66 | 0.25
Aktuelle PIN-Eingabe-Verfahren sind ausreichend sicher. (Current PIN entry methods are sufficiently secure.) | 1.40 | 0.24
An manchen Orten finde ich mich beobachtet, wenn ich meine PIN eingebe. (At some places I feel observed while entering my PIN.) | 4.47 | 0.29
Table 2
Mean ratings for three acceptance questions (1: disagree, 7: agree).
[Figure 10: two line plots of error rate for REG, IOC, and DOC; left panel by repetition (1 to 6), right panel by value number (1 to 4).]
Figure 10. Error rates depending on repetition (left) and value (right).
Duration We found that subjects entered their PINs in the REG condition
about 15 times faster than in the IOC condition, and about 22 times faster than
in the DOC condition. Below, we summarize the duration by condition in milliseconds:
Condition | Mean   | Mean std err | t-test
REG       | 1,130  | 106          | t_{REG∗IOC} = −19.58, p < 0.01
IOC       | 17,626 | 640          | t_{IOC∗DOC} = −8.48, p < 0.01
DOC       | 24,734 | 788          | t_{DOC∗REG} = −28.13, p < 0.01
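The speed factors quoted above can be checked directly from the mean durations in the table. The snippet below is a small worked calculation; the variable names are illustrative.

```python
# Mean PIN entry durations in milliseconds, taken from the table above.
mean_ms = {"REG": 1_130, "IOC": 17_626, "DOC": 24_734}

ratios = {
    condition: mean_ms[condition] / mean_ms["REG"]
    for condition in ("IOC", "DOC")
}

for condition, ratio in ratios.items():
    seconds = mean_ms[condition] / 1000
    print(f"{condition}: {ratio:.1f} times the REG duration, {seconds:.1f} s per PIN")
```

The ratios come out at roughly 15.6 for IOC and 21.9 for DOC, consistent with the rounded factors of 15 and 22 in the text.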
We found a learning effect that completed after three repetitions; figure 11 shows
the average duration per PIN entry for each repetition. In other words, subjects
quickly acquired the skills necessary to operate the new methods. We also found
significant effects for age (REG: r = 0.15, p < 0.01; IOC: r = 0.12, p < 0.01;
DOC: r = 0.05, p = 0.03) and for gender (REG: r = 0.11, p = 0.04; IOC:
r = 0.08, p = 0.02; DOC: r = 0.06, p = 0.01) based on pairwise correlation
analysis in all conditions.
[Figure 11: line plot of PIN entry duration (axis from 0 to 40000) for REG, IOC, and DOC over repetitions 1 to 6.]
Figure 11. Duration of PIN entry by repetition.
Usability A Kruskal-Wallis rank-sum test of the SUS ranks revealed a significant
effect of condition on usability rating (χ²(2) = 21.29, p < .01). By pairwise comparison of the conditions we found that the DOC method was rated significantly
less usable than the other conditions (ā = 25.9) while there was no significant difference in ratings between REG (ā = 42.7) and IOC (ā = 37.7; Wilcoxon rank
sums for REG∗IOC: Z = 1.47, p = 0.14; for IOC∗DOC: Z = −3.25, p < 0.01;
for DOC∗REG: Z = 4.35, p < 0.01). We found neither age nor gender effects in
the SUS ratings. Table 1 gives the results by condition and SUS question.
Attitude Subjects did not consider current PIN entry methods as secure. They
also concurred with the statements “in order to increase my security I am willing to accept additional effort” and “at some places I feel observed while entering
my PIN” (see table 2). We found no effect of condition or demographics on user
attitude.
5.4 Interpretation
Our initial study of shoulder surfing attempts indicated a clear security advantage
of our PIN entry methods when compared to the regular method. Subjects with
no particular training in shoulder surfing observed all PINs in the REG condition
without errors whereas in the IOC and DOC condition subjects guessed only one or
two digits correctly in a few cases. Of course, one cannot generalize that result—
determined adversaries would perhaps invest a certain amount of training to improve their shoulder surfing skills when faced with our methods. It remains to be
investigated to what degree training may improve guessing probability. However,
it is probably fair to say that our mechanisms raise the bar for shoulder surfers
substantially.
The security benefits do come at the price of a longer duration for PIN entry
paired with a higher level of required attention, particularly in the DOC condition.
This was to be expected; the question was by how much the usability of the
IOC and DOC methods differs from that of the REG method. We were content to
find that in our current study, subjects’ usability rating of the IOC method was
comparable to the rating of the regular method. That is an improvement over our
earlier study with the previous version of the implementation. The REG and IOC
methods also exhibited similar characteristics with regard to age, gender, and error.
Unfortunately, the DOC method did not profit to the same degree from the revisions
we made to the test environment.
All conditions showed a learning effect. Subjects acquired the skills necessary to
operate our mechanisms within three repetitions. In summary, we conclude that
the IOC method may indeed be of high practical value, whereas the DOC method
appears to be too demanding for an actual application.
6 Conclusions
Towards a PIN entry method that is robust against shoulder surfing, we proposed
two variants of an interactive challenge-response protocol (the immediate and delayed choice variants) to which we refer as cognitive trapdoor games. The essential
feature of such a game is that it is easily won if the PIN is known, and hard to
win otherwise. The cognitive capabilities of a human are generally not sufficient to
derive the genuine PIN through observation of the entire game’s input and output.
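The challenge-response idea can be sketched for a single PIN digit as follows. This is a minimal Python illustration under stated assumptions, not the partitioning scheme specified in this article: it assumes that each round colors the ten digits black or white, that the user answers only with the color of the secret digit, and that the terminal splits its remaining candidates as evenly as possible. All function names are hypothetical.

```python
import random

DIGITS = list(range(10))


def color_round(candidates, rng):
    """Color all ten digits so the remaining candidates split as evenly as possible.

    Assumption for illustration: digits already ruled out get a random color.
    """
    cands = sorted(candidates)
    rng.shuffle(cands)
    half = (len(cands) + 1) // 2
    first, second = rng.sample(["black", "white"], 2)  # random color per half
    coloring = {d: first for d in cands[:half]}
    coloring.update({d: second for d in cands[half:]})
    for d in DIGITS:
        if d not in coloring:
            coloring[d] = rng.choice(["black", "white"])
    return coloring


def enter_digit(secret_digit, rng):
    """Play rounds until the terminal has isolated the digit; return (digit, rounds)."""
    candidates = set(DIGITS)
    rounds = 0
    while len(candidates) > 1:
        coloring = color_round(candidates, rng)
        answer = coloring[secret_digit]  # the user only reports a color
        candidates = {d for d in candidates if coloring[d] == answer}
        rounds += 1
    return candidates.pop(), rounds


if __name__ == "__main__":
    rng = random.Random(2004)
    pin = [3, 1, 4, 1]  # hypothetical PIN
    entered = [enter_digit(d, rng)[0] for d in pin]
    print(entered == pin)
```

A user who knows the digit answers each round at a glance, and under the even-split assumption the terminal needs at most four rounds per digit (10, 5, 3, 2, 1 candidates). An observer would have to memorize every coloring and intersect the chosen sets in real time, which is the cognitive burden the game relies on; a recording device removes that burden in this deterministic sketch.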
As a defense against automatic recording, for instance by miniature cameras, we
proposed a modification which maintains a certain level of uncertainty about the
genuine PIN even if automatic recording devices are deployed. Due to its probabilistic nature, we refer to this variant as a probabilistic cognitive trapdoor game.
Additionally, we presented a tactile variant based on Braille-type displays which
can be operated, for instance, by blind people with perfect secrecy against shoulder
surfers.
In order to assess the security and usability of our visual PIN entry methods, we
conducted three user studies. We already reported the results of the first two studies in (Roth et al., 2004). In this article, we report the results of our third study, which
focused on the usability of a revised version of our software and its user interface. The results of these studies support the hypothesis that our immediate choice
method provides resilience against shoulder surfing while still being reasonably
usable, which is of significant value when entering PINs in a public environment.
Among the variants, the immediate choice method has shown considerable advantages over the delayed choice method with regard to usability, acceptance, entry
times, and error rates. Although the time required to enter a PIN with the immediate choice method is longer than the time required to enter a PIN with the regular
method, the usability rating of the immediate choice method was not significantly
different from the rating of the regular method. It appears that the additional effort,
when compared to the regular PIN entry method, is offset by the subjective and
objective security advantages that users gain from that method, which supports Sasse’s notion of users’ cost versus benefit calculation (Sasse, 2003). We conclude that the
immediate choice method is of practical value where shoulder surfing is a concern.
Our next objective is to conduct usability studies of our methods on a larger scale,
ideally within the scope of a field test. Any guidance on that subject is greatly
appreciated.
Acknowledgments
This article is a significantly revised and extended version of (Roth et al., 2004)
which we presented at the 11th ACM Conference on Computer and Communications Security. The described methods are patent pending. We would like to thank
Abraham Bernstein and other (anonymous) reviewers very much for their detailed
and supportive comments which helped and guided us in improving our original
manuscript. We would also like to thank everyone who participated in our usability
studies for their time and support.
References
http://www.swiveltechnologies.com, July 2004.
A. James Smith, Jr. Method and apparatus for securing passwords and personal
identification numbers. US Patent #6,253,328, United States Patent and Trademark Office, 4901 Gulf Shore Boulevard Dr. North, Apt. 1903, Naples, FL 34103,
June 2001.
A. James Smith, Jr. Method and apparatus for securing a list of passwords and
personal identification numbers. US Patent #6,571,336, United States Patent and
Trademark Office, 4901 Gulf Shore Boulevard Dr. North, Apt. 1903, Naples, FL
34103, May 2003.
John R. Anderson. Cognitive Psychology and its Implications. Worth Publishers,
5th edition, 2000. ISBN 0-7167-3678-0.
Dinesh Kashinath Anvekar. Method for non-disclosing password entry. US Patent
#6,658,574, United States Patent and Trademark Office, December 2003. Assignee: International Business Machines Corporation.
Daniel G. Baker. Nondisclosing password entry system. US Patent #5,428,349,
United States Patent and Trademark Office, 6982 SW 184th, Aloha, OR 97007,
June 1995.
George E. P. Box, William G. Hunter, and J. Stuart Hunter. Statistics for experimenters. Wiley-Interscience, 1st edition, 1978.
Mark Brader. Shoulder-surfing automated. Risks Digest 19.70, April 1998.
J. Brooke. SUS: A quick and dirty usability scale. In P. Jordan, B. Thomas,
B. Weerdmeester, and I. McClelland, editors, Usability evaluation in industry,
pages 189–194. Taylor and Francis, London, UK, 1996.
John P. Cairns. System for cryptographing and identification. US Patent
#4,962,530, United States Patent and Trademark Office, Wilmington, DE, October 1990.
Earl R. Collins. Computer access security code system. US Patent #4,926,481,
United States Patent and Trademark Office, La Canada, CA, May 1990.
John Colville. ATM scam netted $620,000 Australian. Risks Digest 22.85, August
2003.
Stephen R. Cottrell. Method to provide security for a computer and a device therefor. US Patent #5,465,084, United States Patent and Trademark Office, November 1995.
Count Zero. Card-o-rama: Magnetic stripe technology and beyond. Phrack, (37),
1992.
Steven B. Hirsch. Secure keyboard input terminal. US Patent #4,333,090, United
States Patent and Trademark Office, 305 Peck Dr., Beverly Hills, CA 90212, June
1982.
Steven B. Hirsch. Secure input system. US Patent #4,479,112, United States Patent
and Trademark Office, 305 Peck Dr., Beverly Hills, CA 90212, October 1984.
Douglas Hoover. Method and apparatus for secure entry of access codes in a computer environment. US Patent #6,209,102, United States Patent and Trademark
Office, March 2001. Assignee: Arcot Systems, Inc.
Nicholas J. Hopper and Manuel Blum. A secure human-computer authentication scheme. Technical Report CMU-CS-00-139, School of Computer Science,
Carnegie Mellon University, Pittsburgh, PA, May 2000.
Nicholas J. Hopper and Manuel Blum. Secure human identification protocols. In
C. Boyd, editor, ASIACRYPT, volume 2249 of Lecture Notes in Computer Science, pages 52–66. Springer Verlag, 2001.
ISO. Banking – Personal Identification Number (PIN) management and security –
Part 1: Basic principles and requirements for online PIN handling in ATM and
POS systems. International Organization for Standardization, May 2002. TC
68/SC 6.
William J. Johnson and Owen W. Weber. Method and system for variable password access. US Patent #5,682,475, United States Patent and Trademark Office,
October 1997. Assignee: International Business Machines Corporation.
W. H. Kruskal and W. A. Wallis. Use of ranks in one-criterion variance analysis. J.
Amer. Statist. Ass., (48):907–911, 1952.
Markus Kuhn. Probability theory for pickpockets – ec-PIN guessing. Available at
http://www.cl.cam.ac.uk/~mgk25/, 1997.
Xiang-Yang Li and Shang-Hua Teng. Practical human-machine identification over
insecure channels. Journal of Combinatorial Optimization, 3(4), 1999.
Rensis Likert. A technique for the measurement of attitudes. McGraw-Hill, New
York, USA, 1932.
H. B. Mann and D. R. Whitney. On a test of whether one of two random variables
is stochastically larger than the other. Ann. Math. Statist., (18):50–60, 1947.
Michael J. Martino, Geoffrey L. Meissner, and Robert C. Paulsen Jr. Identity verification system resistant to compromise by observation of its use. US Patent
#5,276,314, United States Patent and Trademark Office, January 1994. Assignee:
International Business Machines Corporation.
T. Matsumoto and H. Imai. Human identification through insecure channel. In
D. W. Davies, editor, EUROCRYPT, volume 547 of Lecture Notes in Computer
Science, pages 409–421. Springer Verlag, 1991.
Tsutomu Matsumoto. Human-computer cryptography: an attempt. In Proceedings
of the 3rd ACM conference on Computer and communications security, pages
68–75. ACM Press, 1996. ISBN 0-89791-829-0. doi: http://doi.acm.org/10.1145/238168.238190.
Keith Eric McIntyre, John Foxe Sheets, Dominique Andre Jean Gougeon, Curtis W.
Watson, Keven Paul Morlang, and Dave Faoro. Method for secure pin entry on
touch screen display. US Patent #6,549,194, United States Patent and Trademark
Office, April 2003.
G. A. Miller. The magical number seven, plus or minus two: Some limits on our
capacity for processing information. Psychological Review, 63:81–97, 1956.
R. C. Milton. An extended table of critical values for the Mann-Whitney
(Wilcoxon) two-sample statistic. J. Amer. Statist. Ass., pages 925–934, 1964.
Bodo Möller. Schwächen des ec-PIN-Verfahrens. Available at
http://www.informatik.tu-darmstadt.de/TI/Mitarbeiter/moeller,
February 1997. Manuscript.
B. B. Murdock. The retention of individual items. Journal of Experimental
Psychology, 62:618–625, 1961.
Chandrasekhar Narayanaswami. Password protection using spatial and temporal
variation in a high-resolution touch sensitive display. US Patent #6,720,860,
United States Patent and Trademark Office, April 2004. Assignee: International
Business Machines Corporation (Armonk, NY).
Jacques Patarin and Michel Ugon. Process for entry of a confidential piece of
information and associated terminal. US Patent #5,815,083, United States Patent
and Trademark Office, September 1998.
L. R. Peterson and M. J. Peterson. Short-term retention of individual verbal items.
Journal of Experimental Psychology, (58):193–198, 1959.
Hans-Eberhard Plath and Peter Richter. Ermüdungs-Monotonie-Sättigung-Stress
(BMS). Technical report, Psychodiagnostisches Zentrum, Dresden, Germany,
1984.
Werner J. Rehm. Security means. US Patent #4,502,048, United States Patent and
Trademark Office, 22 Lomatta St., The Gap, Queensland, 4061, AU, February
1985.
Volker Roth, Kai Richter, and Rene Freidinger. A PIN entry method robust against
shoulder surfing. In Proc. 11th ACM Conference on Computer and Communications
Security, Washington, DC, USA, October 2004.
Lothar Sachs. Angewandte Statistik. Springer-Verlag, Berlin, Germany, 10th edition,
2002.
M. A. Sasse. Computer security: Anatomy of a usability disaster, and a plan for recovery.
Ft. Lauderdale, USA, April 2003.
Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, 1948.
Chris Summers and Sarah Toyne. Gangs preying on cash machines. BBC News
Online, October 2003.
Keith R. Thrower. Access control apparatus. US Patent #4,857,914, United States
Patent and Trademark Office, Old Cedar, 12 Wychcotes, Caversham, Reading,
RG4 7DA, GB2, August 1989.
Edward K. Vogel and Maro G. Machizawa. Neural activity predicts individual
differences in visual working memory capacity. Nature, 428:748–751, April
2004.
Chuck Weinstock. ATM fraud. Risks Digest 4.86, May 1987.
F. Wilcoxon. Individual comparisons by ranking methods. Biometrics, (1):80–83,
1945.
Gordon Thomas Wilfong. Method and apparatus for secure PIN entry. US Patent
#5,754,652, United States Patent and Trademark Office, May 1998. Assignee:
Lucent Technologies, Inc. (Murray Hill, NJ).
Gordon Thomas Wilfong. Method and apparatus for secure PIN entry. US Patent
#5,940,511, United States Patent and Trademark Office, May 1999. Assignee:
Lucent Technologies, Inc. (Murray Hill, NJ).
Danny Wood. Spain uncovers hi-tech cashpoint fraud. BBC News Online, January
2003.