Mobile 3D quality of experience evaluation:
A hybrid data collection and analysis approach
Timo Utriainen^a, Jyrki Häyrynen^b, Satu Jumisko-Pyykkö^a, Atanas Boev^b, Atanas Gotchev^b, Miska M. Hannuksela^c
^a Human-Centered Technology / ^b Dept. of Signal Processing, Tampere University of Technology, Korkeakoulunkatu 10, P.O. Box 527, FI-33101 Tampere, Finland
^c Nokia Research Center, P.O. Box 1000, FI-33721 Tampere, Finland
ABSTRACT
The paper presents a hybrid approach to study the user's experienced quality of 3D visual content on mobile autostereoscopic displays. It combines extensive subjective tests with collection and objective analysis of eye-tracked data.
3D cues that are significant for mobile viewing are simulated in the generated 3D test content. The methodology for
conducting subjective quality evaluation includes hybrid data-collection of quantitative quality preferences, qualitative
impressions, and binocular eye-tracking. We present early results of the subjective tests along with eye movement
reaction times, areas of interest and heatmaps obtained from raw eye-tracked data after statistical analysis. The study
contributes to the questions of what is important to visualize on portable auto-stereoscopic displays and how to maintain
and visually enhance the quality of 3D content for such displays.
Keywords: experienced quality, visual quality, 3D, autostereoscopic display, binocular eye-tracking
1. INTRODUCTION
Subjective quality evaluation is used to identify critical system factors for development and objective modeling purposes21. Conventionally, quantitative preference evaluation methods have been conducted following the guidelines of the International Telecommunication Union36,37. To complement these preference ratings and to gain a deeper understanding of varied factors under heterogeneous stimuli and novel or high perceptual qualities, both qualitative descriptive methods42,86 and objective eye-tracking102 have been proposed. The value of these complementary methods is that they provide deeper understanding beyond quantitative excellence evaluations. With high-quality or heterogeneous stimuli, preferences can be hard to identify, while descriptive data makes it possible to differentiate between the studied factors5. The hybrid methodological approach provides a rich description of subjectively experienced quality, but it also requires systematic integration and interpretation of the results to yield appropriate insight into the phenomenon under study and to benefit further technical development.
This paper examines eye-tracking as an objective method for studying visual 3D video quality on a small display. Eye-tracking reveals information about human visual attention. While several studies17,27,62,70,72,102 have used eye-tracking in data collection, it is difficult to make meaningful interpretations of the results from the viewpoint of human attention and to compare results between studies. Attention always contains a stimulus-driven bottom-up component and a task-driven top-down component, and both play a role when measuring quality and defining the tasks for the experiments47. There are also numerous eye-movement parameters83,19,89 which reveal different aspects of human information processing (e.g. fixations for coding information and saccades for searching, whereas blink rate and pupil size indicate fatigue, emotion and cognitive effort83,10,11). For the meaningful use of eye-movement data to complement the understanding of mobile 3D visual quality, it is important to identify the relevant set of eye-tracking parameters. The goal of this paper is three-fold. Firstly, we present a review of meaningfully interpretable eye-tracking parameters to be used in quality evaluation research. Secondly, we present an overview of the use of eye-tracking in 3D quality research on small screens. Finally, we present examples from two subjective quality evaluation studies in which a set of the identified interpretable parameters is analyzed for synthetic and natural video contents.
2. OVERVIEW OF METHODS FOR QOE EVALUATION BASED ON EYE TRACKING
Eye-tracking is based on the eye-mind hypothesis, which assumes that the viewer's attention is directed at the object the viewer is looking at13. Even though this assumption is usually valid, there are exceptions. During higher-level cognitive activity, such as intensive thinking, the location of the gaze does not provide accurate information on the focus of attention100. Additionally, the point on which the eyes are focused can deviate up to 1° away from the target of attention, and the human visual system is able to recognize targets up to 1.5-2.6° away from the actual target of fixation68,65.
A problem for the validity of eye-tracking experiments is the sheer number of eye-tracking parameters one can extract from the vast data provided by eye-trackers. To further complicate matters, there are no established guidelines as to what each parameter reveals about human visual information processing, and the literature uses the parameters in contradictory ways.
2.1 Eye-tracking parameters
Eye-tracking parameters are usually divided into three main categories: fixation-based, saccade-based and scanpath-based measures88,83. A fixation is a longer period of time during which the eye is relatively stationary and takes in information from the scene19. Saccades are rapid eye movements between fixations, during which no information processing takes place24. A scanpath is a complete saccade-fixation-saccade sequence83. The thresholds used to classify eye movements into saccades and fixations are not commonly agreed upon or standardized: the reported thresholds in terms of acceleration, velocity and duration vary from study to study, which makes comparisons between studies difficult. Even a small change in the thresholds that define a fixation can change the results dramatically48,90.
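To make the threshold dependence concrete, the following minimal sketch classifies raw gaze samples with a simple velocity-threshold (I-VT) rule; the 30 deg/s threshold mirrors the saccade velocity threshold used in our experiments (Section 4.1.2), but the function and its synthetic trace are illustrative assumptions, not a standardized classifier.

```python
import numpy as np

def classify_ivt(x_deg, y_deg, t_s, velocity_threshold=30.0):
    """Label each inter-sample interval as 'fixation' or 'saccade' with a
    velocity-threshold (I-VT) rule. x_deg/y_deg are gaze positions in
    degrees of visual angle; t_s are sample timestamps in seconds.
    Real pipelines usually smooth the velocity signal first."""
    dt = np.diff(t_s)
    # Angular velocity between consecutive samples (deg/s)
    v = np.hypot(np.diff(x_deg), np.diff(y_deg)) / dt
    return np.where(v < velocity_threshold, "fixation", "saccade")

# Synthetic 1000 Hz trace: a fixation, a 5-degree jump, another fixation
t = np.arange(0.0, 0.6, 0.001)
x = np.where(t < 0.3, 0.0, 5.0) + np.random.normal(0, 0.002, t.size)
y = np.random.normal(0, 0.002, t.size)
labels = classify_ivt(x, y, t)
print((labels == "saccade").sum(), "intervals classified as saccadic")
```

Raising or lowering the threshold directly changes how many intervals end up in each class, which is exactly why unreported or non-standard thresholds make studies hard to compare.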
The mechanisms governing eye movements are divided into two competing classes: bottom-up and top-down. Their relative importance is debated within the research community34,6. The bottom-up mechanism is controlled by low-level primitives of the perceived scene – such as contrast, luminance or edge density – termed visual salience75,4,35. The top-down mechanism uses the semantic informativeness of areas in relation to more complex, higher-level cognitive functions (the viewer's task, goals, and familiarity with similar scenes) – such as faces and people, and other task-dependent important objects13,107,91. It is believed that bottom-up processes are more influential early after stimulus onset76,35, while the influence of top-down mechanisms increases with time96.
2.2 Recommendations of eye-tracking parameters
The review targeted published work in the fields of psychology, psychophysiology, human-computer interaction and
traffic research. A total of 88 publications were reviewed. The publications ranged from basic research to applied
research. The goal of the review of eye-tracking parameters was two-fold (Table 1): to identify and measure the locations the participants regarded as important in the presented video clips, and to measure the participants' emotional response to the viewed stimuli.
2.2.1 Measures of important locations
Fixation position is the fundamental basis of all eye-tracking studies, as they are based on the eye-mind hypothesis13,107. This hypothesis states that the viewer's visual attention is focused at the position on which the gaze is focused. Higher cognitive functions (such as intensive thinking, speaking or memory recollection) can, however, override the link between gaze and attention100.
Fixation duration is used to indicate important objects13,69,58,16,57,88,70,67. Ninassi et al. used fixation durations to calculate regions of interest in still images while varying impairments and tasks67. Nyström and Holmqvist used fixation durations to identify important areas to be used as an input for off-line foveation of video content70.
Fixation frequency on an area is used to indicate important locations23,57,58,83. Fitts et al. used it to measure the relative
importance of cockpit indicators in aviation research23. Poole and Ball noted that increased fixation frequency on a
particular area can indicate greater interest in the target83. The number of re-fixations has also been used to measure region importance with still images and user interfaces88,29,39,83.
First fixation position and latency have been used to indicate important and informative objects in a scene58,20,14,28. Loftus
and Mackworth found that first fixation density was greater for semantically informative than uninformative regions and
the viewers fixated earlier on informative objects58. Ellis et al. used the time of first fixation to identify the most
important objects of web search user interfaces20. Byrne et al. used the location of first fixation in a study of menu items
in pull-down menus14. Häkkinen et al. used the locations and times for first fixations between areas to compare gaze
distributions for 2D and 3D versions of the same video content28. Later studies have, however, questioned the Loftus and Mackworth conclusion that first fixation placement is affected by region informativeness or consistency18,61,30,29. These studies suggest that the visual features of the scene, rather than its semantic features, determine initial fixation positions.
Percentage of participants fixating on an area can be used as a between-subjects indicator of important objects39,83,6,28.
Birmingham et al. used the proportions of fixations as an indicator of important locations with video scenes of social
content6. Häkkinen et al. used the percentage of fixations on an area to explore how the locations change when
comparing 2D and 3D versions of the same video content28.
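As an illustration of how the three location measures above can be computed from the same underlying data, the sketch below derives them from per-participant fixation lists over a rectangular AOI; the data layout and the AOI representation are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# One fixation: (start_ms, duration_ms, x_px, y_px)
Fixation = Tuple[float, float, float, float]

def aoi_measures(data: Dict[str, List[Fixation]],
                 aoi: Tuple[int, int, int, int]):
    """Percentage of participants fixating an AOI, mean latency of the
    first fixation to it, and mean total dwell time on it.
    aoi is a rectangle (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = aoi
    inside = lambda f: x0 <= f[2] <= x1 and y0 <= f[3] <= y1
    latencies, dwells, n_fixating = [], [], 0
    for fixations in data.values():          # one entry per participant
        hits = [f for f in fixations if inside(f)]
        if hits:
            n_fixating += 1
            latencies.append(min(f[0] for f in hits))  # first fixation latency
            dwells.append(sum(f[1] for f in hits))     # total dwell time
    mean = lambda xs: sum(xs) / len(xs) if xs else float("nan")
    return 100.0 * n_fixating / len(data), mean(latencies), mean(dwells)
```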
Table 1 Identified eye-tracking parameters

| Goal | Parameter | Interpreted meaning | Example study | Limitations | References |
| --- | --- | --- | --- | --- | --- |
| Identifying important locations | Fixation position | The location of the viewer's visual attention (eye-mind hypothesis) | How does gaze behaviour change when changing the viewer's task107 | Higher cognitive functions can override the link between gaze and attention | 13, 107, 19 |
| | Fixation duration | Important objects are fixated on longer than unimportant locations | Where do people look while examining works of art13 | Definition of saccades and fixations, lack of engagement, fatigue | 13, 69, 58, 88, 70, 67, 16, 57 |
| | Fixation frequency | Important objects are fixated often | Relative importance of cockpit controls23 | Applicability with dynamic content | 23, 58, 57, 83 |
| | First fixation position | Important objects are fixated first | Where do human observers look in videos with social scenes6 | Temporal masking, definition of first fixation | 58, 20, 14, 28, 6 |
| | Latency to first fixation | Important objects are fixated faster | Where do human observers look in synthetic images with out-of-place objects58 | Temporal masking, definition of first fixation | 58, 20, 14, 28, 6 |
| | Percentage of participants fixating on an area | The more participants fixate on an object, the more important it is | The importance of eyes, faces, bodies and foreground objects in social scenes6 | Gender, age, experience differences | 58, 39, 83, 28, 6 |
| Measuring emotional response | Pupil size | Pupil size changes according to emotional response | Pupil responses while listening to negative, positive and neutral sounds77 | Lighting level, fatigue, mental workload | 56, 32, 41, 11, 78, 95, 77 |
| | Blink rate | Blink rate changes according to emotional response | Human reactions to images showing threatening situations74 | Fatigue, mental workload, visual discomfort, eye dryness | 11, 64, 84, 74 |
| | Fixation length (on an object) | Objects with emotional attachment are fixated on longer | Where do people look in images with criminal or neutral content57 | Applicability with dynamic content | 46, 16, 57 |
2.2.2 Measures of emotional response
Pupil size has a tendency to grow with positive emotions and shrink with negative ones56,32,41,11,78,95. Pupil size can, however, also decrease with fatigue32,59 and increase with mental workload95,82. Pupil size naturally also adjusts to the amount of light entering the eye, which can be a problem with changes in screen brightness if the room background illumination is not sufficient95,66.
Blink rate has been shown to increase with negative emotions11,64,84 and decrease with attentional engagement74. However, a higher blink rate can also indicate fatigue11,10 and mental workload95,11,10,94. Blink rate can also change due to external variables, such as visual discomfort, dryness of the eyes, and masked emotions associated with deception83,84.
Fixation length on a single object has also been used as a measure of emotional attachment to that object, with a longer
fixation denoting stronger attachment46,16,57.
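Because of these confounds, pupil analyses commonly report relative change against a pre-stimulus baseline rather than raw diameter. The sketch below shows this common normalization; the 2-second baseline window matches the pre-stimulus marker duration used in our experiments, but the function itself is an illustrative convention, not part of the study pipeline.

```python
import numpy as np

def baseline_corrected_pupil(pupil, t_s, stimulus_onset_s, baseline_s=2.0):
    """Express pupil samples as percentage change from the mean pupil
    size in the window immediately preceding stimulus onset."""
    pupil, t_s = np.asarray(pupil, float), np.asarray(t_s, float)
    base = (t_s >= stimulus_onset_s - baseline_s) & (t_s < stimulus_onset_s)
    baseline = np.nanmean(pupil[base])  # NaNs mark blink-related data loss
    return 100.0 * (pupil - baseline) / baseline
```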
2.2.3 Test design considerations
The typical proportion of incomplete experiments with eye-trackers is between 10 and 20%39,25, while lower yields of acceptable data for individual participants have also been reported92. Eyewear, contact lenses, small or large pupils and eyelids, and head movements can cause tracking difficulties. According to Peli et al., a typical yield of acceptable eye sample data per participant in a successful testing session is between 91 and 98%80. The remaining data is lost due to blinks, temporary loss of pupil tracking, head movements of the participant, and random errors introduced by the eye-tracker.
Large individual differences exist between participants in visual strategies and gaze behavior107,15,3,9,26,25,2. For this reason, a within-subject (i.e. repeated-measures) design is encouraged25,19. Age, gender and experience level also affect gaze distribution80,26,23,15,101.
Brasel and Gips note that content has an impact on gaze dispersion9. In their experiment with a nature documentary that included advertisement breaks, the adverts had larger gaze dispersion than the documentary. Gaze dispersion is more uniformly distributed between participants when using moving video than still images, movement is more important than color in drawing viewers' gaze, and written text (such as subtitles or other textual information) has a tendency to draw the attention of the viewers97.
Repeated viewings of the same stimuli change viewing behavior70,9,60,55. Mannan et al. found that fixations are not the
same when viewing the same scene for the second time60. In a study by Nyström and Holmqvist, viewers described their
viewing patterns as natural during the first viewing and after multiple viewings they began “to look more around the
video” in search of quality impairments70. Brasel and Gips found that previously seen advertisements exhibit larger gaze
dispersion than during their first exposure9. Le Meur et al. found that repeated viewings changed visual attention
deployment, even though introducing quality impairments did not 55.
The task of the viewer affects how eye movements are distributed when looking at a scene107,46,72,102,67. Yarbus noted that
participants looked at faces when the task was to evaluate age, and looked at other areas when instructed to evaluate the
prosperity of the family in the same image107. Similarly, a quality evaluation task made participants look at areas they did
not look at during a free-viewing task72. Ninassi et al. noted that fixation durations increased and the visual strategy
changed when evaluating quality instead of free-viewing67.
3. EYE TRACKING OF 3D CONTENT
3.1 Overview of 3D cues
The human visual system uses several methods to distinguish the distances of objects in a scene. These separate subsystems work together to enable 3D vision98. The cues used in the perception of depth depend on the distance of the observed target, and the visual system combines "layers" of these subsystems to obtain an accurate depth estimate104. These layers of 3D vision are presented in Figure 1.
Figure 1 Separate "layers" of 3D cues – accommodation, binocular disparity, pictorial cues and motion parallax – and the approximate distance ranges over which they operate (from ~10^-1 m to infinity).
Accommodation is the ability of the eye to change the optical power of its lens. This is needed to focus targets at different distances onto the retina. The ciliary muscles of the eye control the curvature of the lens, and this control is based on retinal blur98. As shown in Figure 1, accommodation is used only at short viewing distances.
Binocular disparity arises from the fact that the human eyes are separated by a small distance and share a partially overlapping visual field. This enables two binocular depth cues: vergence and stereopsis. In vergence, both eyes are rotated so that the target falls on the fovea. The oculomotor system signals the angle of rotation to the visual system, and this information is interpreted as depth98. Because of the separation of the eyes, they capture slightly different views of the target object, and stereopsis uses this disparity for depth estimation. The usability of binocular depth cues is limited by the fact that beyond a certain distance the eyes point straight ahead, and the oculomotor signal and disparity no longer change.
For longer distances, the human visual system uses pictorial cues and motion parallax for depth assessment. Both pictorial cues and motion parallax can be perceived with one eye only. Pictorial cues include, for example, linear perspective, shadows and scale. Occlusion is a strong pictorial cue that gives reliable information about the depth relationship between objects104. Scale, on the other hand, relies on familiarity with the real sizes of objects such as people, cars and buildings. As illustrated in Figure 1, pictorial depth cues affect depth perception at shorter distances as well.
In motion parallax, the observer moves in relation to the surrounding scene, and a depth cue is created when objects in the scene move relative to each other. Like pictorial cues, motion parallax can be utilized over a wide range of distances; its effect can be observed even at short distances.
3.2 Optical characteristics of portable auto-stereoscopic displays
Stereoscopic displays create an illusion of depth by projecting separate images to the eyes of the observer. Displays that can create the 3D illusion without requiring the observer to wear special 3D glasses are known as auto-stereoscopic displays. Wearing 3D glasses is considered impractical for mobile applications, and thus most portable 3D displays are auto-stereoscopic. The most common design of such displays involves a TFT-LCD panel with an additional optical layer mounted on top. The layer makes the visibility of each TFT color element (also known as a sub-pixel) a function of the observation angle. As a result, a different group of sub-pixels is visible from each observation angle. The image formed by the visible sub-pixels is called a view. Since portable 3D displays are meant for a single observer, they typically have two views, as shown in Figure 2a99,106,1. When the observer is in the proper position (called the sweet-spot), each eye is supposed to see only half of the sub-pixels. However, due to less-than-perfect optics or a wrong observation angle, it is possible that part of the image intended for one eye is visible to the other. This process is modeled as inter-channel crosstalk51. In order to be visualized properly, the sub-pixels of a stereo-pair need to be reordered so that each view contains the proper image. This process is called interleaving51. The binary map which describes the mapping of TFT sub-pixels to one or the other view is known as the interleaving map. Usually, the resolution of one view is lower than the resolution of the TFT matrix. Also, the resolution of one view is typically two times lower in one direction (most often horizontal) than in the other106,1. Thus, the interleaving process involves downsampling and requires an anti-aliasing filter designed specifically for the interleaving map of the target display.
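A minimal sketch of the interleaving step for a two-view display is given below, assuming a binary interleaving map defined over the sub-pixel grid; the column-interleaved map and the trivial horizontal-averaging anti-aliasing are placeholders, since a real filter is designed for the specific map of the target display.

```python
import numpy as np

def interleave(left, right, interleaving_map):
    """Combine a stereo pair into one displayable frame.
    left/right: (H, W, 3) RGB images; interleaving_map: (H, W, 3) binary
    array over the TFT sub-pixel grid, 0 = sub-pixel belongs to the left
    view, 1 = to the right view."""
    # Placeholder anti-aliasing before subsampling: average horizontal
    # neighbours (a real filter is designed for the display's map).
    aa = lambda img: (img.astype(float) + np.roll(img, 1, axis=1)) / 2.0
    return np.where(interleaving_map == 0,
                    aa(left), aa(right)).astype(np.uint8)

# Illustrative column-interleaved map for a 427x240 two-view display
h, w = 240, 427
imap = np.tile((np.arange(w) % 2)[None, :, None], (h, 1, 3))
frame = interleave(np.full((h, w, 3), 50, np.uint8),
                   np.full((h, w, 3), 200, np.uint8), imap)
```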
The two most pronounced artifacts visible on an auto-stereoscopic display are Moiré (caused by aliasing) and ghosting (caused by crosstalk)51. The amount of visible crosstalk defines the sweet-spots at which the image is seen with sufficient quality. According to subjective quality experiments52,79, crosstalk levels beyond 25% produce a stereoscopic image of unacceptable quality. The sweet-spot that allows the display to be seen uniformly lit with the least amount of crosstalk is the optimal observation position of the display. The distance between the optimal observation position and the display is known as its optimal viewing distance (OVD)8. In this work, we define the minimum viewing distance (VDmin) and the maximum viewing distance (VDmax) as the distances from which an observer with an interpupillary distance of 65mm is still able to perceive the image with a crosstalk level lower than 25%. The positions of OVD, VDmin and VDmax are shown in Figure 2b.
Figure 2 a) Mobile autostereoscopic display (TFT-LCD with an optical layer forming the left and right views) – principle of operation, and b) optimal observation position, showing OVD, VDmin and VDmax for IPD = 65mm.
The display we selected for the experiments is an autostereoscopic 3D display with HDDP arrangement, produced by NEC99. It has a resolution of 427x240 pixels at 157 DPI. One particular feature of the HDDP display is that it has the same resolution in 2D and 3D. The optimal observation distance of the display is 40cm, and its VDmin and VDmax are 25cm and 28cm respectively7. For the experiments, we selected an observation distance of 30cm, as it was a good compromise between visual quality and eye-tracking precision. Our eye-tracker has a resolution of 0.1 degrees, and at a viewing distance of 30cm this yields a tracking precision of approximately 3 pixels.
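This precision figure follows directly from the viewing geometry; the short check below reproduces the approximately-3-pixel value from the 0.1-degree tracker resolution, the 30cm viewing distance and the 157 DPI pixel density quoted above.

```python
import math

distance_mm = 300.0              # selected viewing distance
angle_deg = 0.1                  # eye-tracker angular resolution
pixel_pitch_mm = 25.4 / 157      # 157 DPI -> ~0.162 mm per pixel

error_mm = distance_mm * math.tan(math.radians(angle_deg))
print(error_mm / pixel_pitch_mm)  # ~3.2 pixels on the display
```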
3.3 Overview of binocular eye tracking
Eye tracking has been utilized in visual attention studies for years. The majority of these studies have been conducted with a monocular test setup. This is reasonable with two-dimensional content, as the hypothesis is that both eyes fixate on the same point. When eye tracking is used with three-dimensional content, the scenario is different, as the disparity between the eyes varies with the content. Therefore, both eyes need to be tracked separately in order to calculate the point-of-gaze (also called point-of-regard or point-of-interest) in three-dimensional space. The same methods that are utilized in monocular eye tracking are also feasible in binocular eye tracking. Research in binocular eye tracking focuses mainly on three things: how to estimate the point-of-gaze in three dimensions, how to utilize this information in user interfaces, and how to develop binocular eye tracking systems that allow unrestricted head movement.
Several eye tracker manufacturers currently offer solutions for binocular eye tracking; both head-mounted and desktop-mounted devices are available. The most commonly used method is video-based eye tracking with pupil and corneal reflection detection. Manufacturers offer devices with similar technology, the biggest difference being the sample rate: systems are available from different manufacturers offering sampling rates of 120 Hz, 220 Hz, 500 Hz and 1000 Hz. The need for a higher sampling rate depends on the aim of the study. To properly track saccades and microsaccades, a sampling rate of 500 Hz is recommended. In user interface studies the sampling rate can be lower, as the interest lies more in easy calibration of the system. Research groups have constructed their own binocular eye tracking systems for user interface studies. For example, Shih and Liu have constructed a binocular tracking system with easy calibration and a 30 Hz sampling rate93. Their system has a good accuracy of 1 degree of visual field, but the freedom of head movement is limited. Kwon et al. have been working on a binocular eye tracking system that is used to control a user interface on an autostereoscopic display53. Their system has a 15 Hz sampling rate and uses pupil center information to calculate the depth of gaze geometrically.
The biggest difficulties are faced in the calibration process and in the calculation of gaze depth. The traditional calibration method in video-based eye trackers is to show calibration targets on the display screen and ask the test subject to focus on them. The relation between the tracked eye and the calibration target is then solved to obtain the point-of-gaze. Essig et al. have introduced improvements to the traditional calibration and depth calculation process by utilizing an artificial neural network in the estimation of the 3D gaze point22. They utilized a binocular eye tracker and an anaglyph 3D display in their research; the anaglyph display was selected because it produces vergence movements of the eyes similar to those with natural content. Their new 3D calibration process utilized 3x3 calibration grids positioned at three depth levels, forming a 3x3x3 calibration matrix. The 3D gaze point was calculated based on both the traditional 2D calibration and the new 3D calibration: depth calculation from the 2D calibration was based on a geometrical solution of gaze depth, whereas in the 3D calibration the gaze depth was estimated with an artificial neural network based on parameterized self-organizing maps (PSOM). They obtained considerably better results from the 3D gaze depth estimation than from the geometrical method. The downside of their method is the complexity of the calibration process.
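For comparison, the geometrical solution that such learned methods are evaluated against can be stated compactly: intersect, in the horizontal plane, the two gaze rays running from the eyes through their calibrated 2D gaze points on the screen. A minimal sketch, assuming eyes placed symmetrically about the screen center and all quantities in millimetres:

```python
def geometric_gaze_depth(xl_mm, xr_mm, ipd_mm=65.0, screen_dist_mm=300.0):
    """Depth of the binocular gaze point (mm from the eyes), found by
    intersecting the left/right gaze rays in the horizontal plane.
    xl_mm/xr_mm: horizontal on-screen gaze points of the left and right
    eye relative to the screen center; eyes sit at x = -/+ ipd/2."""
    disparity = xr_mm - xl_mm       # > 0: uncrossed (behind the screen)
    if disparity >= ipd_mm:         # rays parallel or diverging
        return float("inf")
    return screen_dist_mm * ipd_mm / (ipd_mm - disparity)

print(geometric_gaze_depth(1.6, -1.6))   # crossed rays -> ~286 mm (in front)
print(geometric_gaze_depth(-1.6, 1.6))   # uncrossed    -> ~316 mm (behind)
```

The same geometry also shows why the estimate is fragile: small errors in the on-screen gaze points are amplified by roughly the ratio of viewing distance to IPD in the resulting depth value.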
The work of Essig et al. has been repeated, with similar results, by Pfeiffer et al.81. The test setup in the study by Pfeiffer et al. was more extensive, as they utilized two eye tracking systems and more complex test stimuli. Both eye trackers were head-mounted devices, and the 3D display was based on shutter glasses. Pfeiffer et al. studied three questions: the benefits of PSOM calibration versus geometrical calibration, an expensive eye tracker versus a cheap one, and the usefulness of depth information in object selection. The results of this study clearly showed the better performance of PSOM compared to geometrical estimation of 3D gaze depth. The cheaper eye tracker also obtained partly better results in the experiments; however, this can be partly explained by calibration difficulties with the more expensive eye tracker, whose cameras were disturbed by the shutter glasses. The result for the third research question was perhaps the most interesting: according to Pfeiffer et al., the use of depth information did not bring any benefit to their object selection test when compared to the 2D method. Only in situations where the objects were occluded did the depth information give a slight improvement. The calibration problems caused by the test setup were noted as one source of the poor results.
In more recent research, Hennessey and Lawrence introduced a novel binocular eye tracking system for 3D point-of-gaze estimation31. Their system consisted of an eye tracking camera with a 200 Hz sampling rate and a volumetric display setup with a 2D display and a Plexiglas screen mounted on rails. The calibration process was performed at different depth levels, similarly to the study by Essig et al22. The system developed by Hennessey and Lawrence offers significant benefits, as it is non-contact (the user does not wear glasses for 3D visualization) and head-free (as opposed to other systems where the user needs to wear the eye tracker or use a chin rest or bite-bar). The biggest difference to most other eye tracking studies is the use of eye models in 3D point-of-gaze estimation: by improving the eye model, the tracking results for the 3D point-of-gaze can also be improved, as opposed to geometrical or neural network solutions.
Binocular eye tracking research has also been done in more traditional studies of visual system properties. Jainta et al. studied binocularity during reading40. They measured the disparity during fixations and determined the minimum disparity values. Their work was performed on two-dimensional content, but the results on varying disparity values in different parts of a fixation are interesting, because in the three-dimensional case the gaze depth is calculated based on eye disparity. Wismeijer et al. have performed binocular eye tracking studies on the correlation between perception and spontaneous eye movements105. They utilized a polarization-multiplexed 3D display and an SR Research EyeLink II eye tracker. The experiment stimuli contained two depth cues, disparity and perspective, with varying amounts of conflicting information. Their results indicated that for small conflicts the depth cues were averaged, whereas for large conflicts one of the depth cues dominated, and the dominating cue varied between test subjects.
To the best of the authors' knowledge, binocular eye tracking studies have not been performed with a high-speed eye tracking system and an autostereoscopic display. Previous studies with autostereoscopic displays are limited to the research by Kwon et al., where the focus was on gaze interaction with a 3D display. The use of a mobile display is also a new aspect of binocular eye tracking research.
4. EARLY RESULTS
4.1 Experiment 1
The goal of the experiment was to examine the influence of depth, motion and object size on visual attention with synthetic content.
4.1.1 Test content design - synthetic content
The synthetic content consists of 63 stereoscopic movies, each 10 seconds long. The movies were prepared using the 3D rendering software POV-Ray85. In each movie, one simple object (a ball) moves in 3D space, with its position and apparent depth known for every frame. The movement is restricted to a space with a minimum apparent depth of 28.5cm (corresponding to -20px disparity) and a maximum apparent depth of 31.5cm (corresponding to 20px disparity). The content of each synthetic movie (Figure 3b) is controlled by four parameters:
1. Ball size – we used two sizes of ball in our experiment: "small", with an angular size of less than 1 degree, and "big", with an angular size of more than 2 degrees. The reason is to have objects with an angular size smaller or bigger than the fovea. In our case, the "small" ball is 20px wide and the "big" one is 100px wide.
2. Movement direction – there are five types of movement direction: "x", "y", "xy", "z" and "xyz". Types "x", "y" and "xy" are planar movements in the display plane, where the ball translates in the horizontal, vertical or diagonal direction respectively. Type "z" indicates movement in depth without changing the planar coordinates, and type "xyz" is unrestricted 3D movement in an arbitrary direction.
3. Ball speed – the movement in our experiments has three possible speeds: "slow", "fast" and "sudden". "Slow" is within the smooth-pursuit tracking speed of the eyes, which is 2 deg/s. "Fast" is comparable with fast-moving objects in cinematic content – 11 deg/s. In "sudden" movement, the object changes its position within a single frame. Movements in depth are calculated so that the object moves with a constant apparent speed – 1cm/s for "slow" and 5.8cm/s for "fast". The "fast" speed is calculated for an object passing across the cinema screen in 5 seconds, with a typical angular screen size of 55 degrees.
4. Background type – this is a combination of background texture type and background depth. There are three possible values for this parameter: "none", "textured" and "deep". If set to "none", the background is a uniform black color without any particular depth. A "textured" background has rich texture and apparent depth at the display level (zero disparity). A "deep" background has rich texture and an apparent depth of 31.5cm (20px disparity).
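The apparent-depth limits above follow from standard screen-parallax geometry. A short check, assuming the 30cm observation distance, 65mm interpupillary distance and 157 DPI pixel density stated elsewhere in this paper:

```python
def apparent_depth_cm(disparity_px, dpi=157, ipd_mm=65.0, view_dist_mm=300.0):
    """Apparent depth of a point rendered with a given on-screen
    disparity (positive = uncrossed, i.e. behind the display plane)."""
    d_mm = disparity_px * 25.4 / dpi
    return view_dist_mm * ipd_mm / (ipd_mm - d_mm) / 10.0

print(apparent_depth_cm(-20))  # ~28.6 cm, the stated minimum depth
print(apparent_depth_cm(20))   # ~31.6 cm, the stated maximum depth
```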
4.1.2 Research method
Participants – A total of 13 naïve assessors, equally stratified by gender and by age between 18 and 45 years, participated in the study.
Procedure – The test procedure contained three parts. Firstly, sensorial tests (visual acuity (20/40), color vision, and acuity of stereo vision (.6)) were conducted and the Simulator Sickness Questionnaire (SSQ49) was administered. Secondly, a calibration with 9 measurement points was conducted with the required accuracy (worst point error <1.5 degrees, average error <1.0 degree). During the actual tests, participants were given the task of following the ball presented in the scenes. After completing the evaluation task (duration 15 min), the SSQ was filled in again.
Viewing conditions and presentation of stimuli – The experiment took place in controlled laboratory conditions38. The 3D display used was the NEC horizontally double-density pixel arrangement (HDDP) display at its native resolution of 427x240px at 155DPI, with a physical size of 3.5"99. The display utilizes lenticular sheet technology to provide the stereoscopic 3D effect. The 3D display was placed on a mount above the eye-tracker's desktop unit, which housed the eye-tracking camera and IR lights. The participant's head was kept still using a headrest, and the viewing distance to the display was set to 30cm.
The stimuli were presented one by one in randomized order, with a 5 second pre-stimulus period between each 10 second stimulus (Figure 3b). The pre-stimulus period consisted of a 3 second mid-gray (50%) clip, followed immediately by a 2 second pre-stimulus marker with a white (100%) cross at the center of the screen on a dark-gray (25%) background. The cross had a disparity value of 0, i.e. it was situated on the display surface.
Figure 3 a) Viewing conditions, and b) sequence of stimuli presentation (pre-stimulus marker 2s, stimulus 10s, break 3s, repeated).
Apparatus – The eye-tracker used was an EyeLink 1000 from SR Research. The eye-tracker's Desktop Mount unit follows the eye using a combination of pupil and corneal reflection tracking. The eye-tracker was set to binocular tracking at a 1000Hz sampling frequency. The desktop mount unit was placed facing the participant, below the display (Figure 3b). The eye-tracker used the following settings during the experiments: saccade velocity threshold = 30°/sec (min. velocity), saccade acceleration threshold = 8000°/sec² (min. acceleration), saccade motion threshold = 0.1° (min. motion), saccade pursuit fixup = 60°/sec (max. pursuit velocity), fixation update interval = 50ms, fixation update accumulate = 50ms, blink offset verify time = 12ms.
4.2 Experiment 2
The goal of the experiment was to examine the influence of depth on visual attention and emotion with natural content.
4.2.1 Test content design - Natural content
The natural content consists of 30 stereoscopic movies at mobile resolution. We used 10 high-resolution multi-camera sequences as source material. From each multiview sequence, three stereo-movies were created. One of the cameras was selected as the "left" camera and was used in all stereo-movies created from that particular multiview sequence. Typically this was the leftmost camera, but in some cases a different camera was used in order to avoid cases where neighboring cameras had noticeably different color balance. The "left" camera (after cropping and resizing) was paired with various "right" cameras from the multiview sequence, thus creating stereo-pairs with different camera baselines. The movies were resized and, where necessary, cropped with the aim of achieving the desired final frame size and aspect ratio. From each multi-camera sequence the following pairs were created: monoscopic, where the same camera was used for both channels; short-baseline stereo; and wide-baseline stereo. The "wide" camera baseline was selected to represent 3D video with pronounced depth, while still in the range allowing comfortable observation. The "short" baseline was selected to represent down-scaled HD content with limited, shallow depth, but still distinguishable from 2D content. The resulting disparity ranges can be seen in Table 2, while screenshots of each content are shown in Figure 4.
Table 2 Natural content properties and disparity ranges.

| Sequence name | Original resolution | Spatial details | Temporal motion | Short baseline disparity range (min / max) | Wide baseline disparity range (min / max) |
| --- | --- | --- | --- | --- | --- |
| Newspaper | 1024x768 | Medium | Low | -6 / 12 | -7 / 17 |
| Dog | 1280x960 | Medium | Low | 0 / 3 | 0 / 6 |
| Nagoya balloons | 1024x960 | Medium | Medium | -2 / 8 | -2 / 11 |
| Pantomime | 1280x960 | Medium | Medium | -4 / 6 | -8 / 11 |
| Undo dancer | 1920x1080 | Low | Medium | -2 / 2 | -4 / 3 |
| Undo sneakers | 1920x1080 | High | Low | -1 / 3 | -1 / 5 |
| Ghost town duel | 1920x1080 | Medium | Low | 0 / 4 | 0 / 7 |
| Kendo | 1024x960 | Medium | Medium | -4 / 5 | -5 / 8 |
| Lovebirds 1 | 1024x768 | High | Low | 1 / 7 | 5 / 17 |
| Ghost town flight | 1920x1080 | Low | High | -5 / 5 | -4 / 10 |
Figure 4 Screenshots of the natural content stimuli, from top left to bottom right: a) Newspaper, b) Dog, c) Nagoya balloons,
d) Pantomime, e) Undo dancer; f) Undo sneakers, g) Ghost town duel, h) Kendo, i) Lovebirds1, j) Ghost town flight
4.2.2 Research method
Participants – A total of 40 naïve assessors, equally stratified by gender and by age between 18 and 45 years, participated in the study. The sample consisted mostly of naive or untrained (80%) participants who had no prior experience of quality evaluation experiments, were not experts in technical implementation, and were not studying, working or otherwise engaged in information technology or multimedia processing37,38.
Procedure – The test procedure contained four parts. At the beginning, the sensorial tests and SSQ were administered identically to Experiment 1. The actual test contained two parts; calibration was conducted before, and the SSQ filled in after, each of them: 1) Free-viewing test – the participant's task was to view the content, and no additional tasks were given. 2) Quality evaluation task – the participant's task was to evaluate the overall quality on a discrete unlabeled scale from 0 to 10, and the acceptance of the quality for viewing mobile 3DTV (binary yes/no scale), during a 5-second answer time between clips38,44. Combined anchoring and training, in which participants were shown the extremes of the sample qualities and all contents, was conducted prior to the start of the evaluation task. During the evaluation task, the stimuli were presented one by one and rated independently and retrospectively. To reduce participant movement during the evaluation task, the assessor said their quality judgment aloud and the moderator marked it on an answer sheet. To reduce experimenter expectancy effects, the moderator did not see the stimuli being assessed. In the final part, the participants' impressions of quality per stimulus were gathered on 17 descriptive dimensions, constructed from the model of Descriptive Quality of Experience for Mobile 3D Video45.
Viewing conditions, presentation of stimuli, apparatus – The laboratory conditions, devices and apparatus were identical to Experiment 1. A total of 30 stimuli were presented in each evaluation task in random order. The duration of each stimulus was 10s, and the use of pre-stimulus markers was identical to Experiment 1.
Stimuli content – Ten visual stimulus contents were used, with varying spatial and temporal detail. The contents were called 'Newspaper', 'Dog', 'Nagoya balloons', 'Pantomime', 'Undo dancer', 'Undo sneakers', 'Ghost town duel', 'Kendo', 'Lovebirds1' and 'Ghost town flight'.
4.3 Early results on synthetic content
We started working on our 3D gaze model based on the eye tracking results from the experiments with synthetic content. In the first stage, we worked on the content that contains small and big objects moving suddenly in the x-, y- and z-directions. The object movement is so fast that it needs to be followed with saccadic eye movements. The parameters we are interested in are reaction time, travel time and time to arrive. Reaction time is the time from the target movement to the moment when the eyes start moving towards the new location of the target; it contains the time to react and to launch the saccadic eye movement. Figure 5a illustrates the target movement and the eye tracking data related to it. The target is moving in the x-direction at the screen level; the target movement is plotted in red and the eye movement in blue.
Figure 5 a) Example of eye movement parameter measurement from observer data and b) table of average reaction times.
Reaction times (saccadic latencies) have previously been studied with monocular eye tracking103 and in a reading context88. Initial results from our experiments are shown in Figure 5b. The obtained results are in accordance with the reaction times of 150-400ms reported previously in the literature88,103. According to our results, the reaction time for horizontal movement is shorter than for vertical and depth movement. Our initial expectation was that the reaction time in depth would be significantly longer than the other reaction times, but it turned out to be close to the reaction time for vertical movement.
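A minimal sketch of how such a reaction time can be extracted from the recorded traces in Figure 5a: locate the sudden target jump, then find the first subsequent sample at which the eye has covered a meaningful fraction of the distance to the new target position. The 10% onset criterion and the 1-D sample layout are illustrative assumptions.

```python
import numpy as np

def reaction_time_ms(t_ms, eye_pos, target_pos, onset_frac=0.1):
    """Time from a sudden target jump until the eye starts moving toward
    the new target location. Positions are 1-D (e.g. x for horizontal
    movement); t_ms are sample timestamps in milliseconds."""
    jump = np.flatnonzero(np.abs(np.diff(target_pos)) > 0)[0] + 1
    start, goal = eye_pos[jump], target_pos[jump]
    # Fraction of the eye-to-target distance covered after the jump
    progress = (eye_pos[jump:] - start) / (goal - start)
    moved = np.flatnonzero(progress > onset_frac)
    return t_ms[jump + moved[0]] - t_ms[jump] if moved.size else float("nan")
```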
4.4 Early results on natural content
This section presents results on the influence of depth on visual attention, in terms of three parameters, and on emotion, in terms of blink rate analysis. The analysis is conducted for the free-viewing tasks.
4.4.1 Visual attention
Analysis – In this paper, the analysis is presented for the content called 'Newspaper' from Experiment 2. The analysis procedure contained three steps: 1) Identification of Areas of Interest (AOIs, Figure 6a). These were determined by examining frame-by-frame heatmaps (Figure 6b) and frame-by-frame disparity maps for the studied content. Disparity maps were used to identify objects that were at different depths within the scene; typical AOIs determined this way include protruding objects, moving objects and people97. EyeLink Data Viewer v1.10.123 by SR Research Ltd was used in this part of the analysis. The left-eye data was used for the analysis, as the calibration was based on it and, unlike the right-eye data, it did not change according to the required disparity. 2) The analysis was based on the identified AOIs and three parameters: a) the relative importance of each AOI, measured as the percentage of participants fixating on it (the higher the percentage, the more important the AOI); b) the importance of locations, measured as the latency of the first fixation to each AOI (Mean IA First fixation time; the lower the latency, the more important the AOI); and c) the total fixation duration (Mean IA Dwell time) on each AOI (the higher the total duration, the more important the AOI). 3) Finally, statistical analysis was conducted.
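The omnibus and pairwise tests used below can be reproduced with standard tools: a Friedman test across the three related presentation modes, followed by Wilcoxon signed-rank tests for pairwise comparisons. A minimal sketch with scipy, where each row holds one participant's value of an AOI measure under the three modes (the data layout and placeholder values are assumptions of this example):

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# scores[i] = (mono, short_baseline, wide_baseline) for participant i
scores = np.random.default_rng(0).random((40, 3))  # placeholder data

stat, p = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
print(f"Friedman: FR={stat:.3f}, p={p:.3f}")
if p < 0.05:  # pairwise follow-ups only after a significant omnibus test
    for a, b, name in [(0, 1, "mono vs short"), (0, 2, "mono vs wide"),
                       (1, 2, "short vs wide")]:
        w, pw = wilcoxon(scores[:, a], scores[:, b])
        print(f"Wilcoxon {name}: p={pw:.3f}")
```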
The relative importance of AOIs – The AOIs man, plant and woman were the most fixated objects in the scene, while table was fixated significantly less often (Figure 7a). The AOI screen represents the entire area of the display. The results show that the depth level influenced the frequency of participants fixating on the identified AOIs (Friedman: FR=8.167, df=2, p<.05) when averaged across the AOIs. When content was presented with the stereoscopic short baseline, a significantly higher number of participants fixated on the important objects in the scenes (Figure 7a; Wilcoxon: p<.01), and a similar tendency was shown in the content-by-content analysis. This result indicates that the stereoscopic presentation mode can attract visual attention more than the monoscopic presentation mode.
Figure 6 a) Areas of Interest (AOI) for Newspaper content, b) Heatmap of frame 25 of Newspaper content. The more
participants fixating, the brighter the area.
The importance of locations as the latency of the first fixation – The man was the fastest fixated object in the scene (within the first second after stimulus onset), followed by the significantly more slowly fixated plant (in 2-4 seconds) and woman (in 4-5 seconds) (Figure 7b). The depth level did not significantly influence the latency of the first fixation when averaged across the AOIs (Friedman: FR=3.320, df=2, p=.190; Figure 7b). There is a slight tendency for the latency of the first fixation to be shorter when a stereoscopic presentation mode is used, but the differences are not statistically significant. However, for the most important object, man, the distribution of first fixation times for the stereoscopic short baseline is narrower than for the other presentation modes.
The total fixation duration on each AOI – The AOI man has a significantly higher total dwell duration, confirming its importance in the scene compared to the other objects (Figure 8a). For this object, the results show a tendency for its importance to decrease under stereoscopic presentation compared to monoscopic presentation (Friedman: FR=4.850, df=2, p=.088; mono vs. wide baseline: Wilcoxon: Z=-2.13, p<.05).
4.4.2 Emotion – blink rate
Analysis – Blink count is defined here as the number of times the participant blinks during a 10 second stimulus. The analysis presented here covers six contents, excluding 'Dog', 'Undo dancer', 'Undo sneakers' and 'Ghost town duel'.
The results show that the blink rate decreases slightly when stereoscopic presentation is used (Friedman: FR=13.029, df=2, p<.01; Figure 8b). The blink rate decreased with increasing 3D effect during free-viewing, indicating the presence of positive emotions (Wilcoxon: wide baseline vs. others: p<.01; short baseline vs. mono: ns).
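Per-stimulus blink counts of this kind can be read directly from the tracker's parsed blink events. A minimal sketch, assuming blink events are available as (start_ms, end_ms) pairs; the exact export format is an assumption of this example.

```python
def blinks_per_stimulus(blink_events, stim_start_ms, stim_dur_ms=10_000):
    """Count blinks that start within one 10-second stimulus window."""
    stim_end = stim_start_ms + stim_dur_ms
    return sum(1 for start_ms, _ in blink_events
               if stim_start_ms <= start_ms < stim_end)

# Example: three blinks recorded, two fall inside the stimulus window
events = [(12_300, 12_420), (15_000, 15_090), (22_500, 22_610)]
print(blinks_per_stimulus(events, stim_start_ms=12_000))  # -> 2
```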
Figure 7 a) Percentage of participants fixating on an AOI. Higher value: more important AOI. b) First fixation time. Lower
value indicates more important AOI.
Figure 8 a) Total time spent on each AOI (mean IA dwell time). Higher value indicates more important AOI. b) Mean blink
count. Blink rate increases with negative emotions and visual fatigue, and decreases with positive emotions.
5. CONCLUSIONS
Subjective quality evaluation is used to identify critical system factors for development and objective modeling purposes. Conventionally, quantitative preference evaluation methods have been conducted following the guidelines of the International Telecommunication Union. To complement these preference ratings and gain a deeper understanding of varied factors with heterogeneous stimuli, both qualitative descriptive methods and objective eye-tracking have been proposed. The value of these complementary methods is that they provide deeper explanations of the quantitative quality preference ratings. No previous published work has systematically combined all three data sets to understand visual quality. While several studies have used eye-tracking in data collection, it is difficult to make meaningful interpretations of the results from the viewpoint of human attention and to compare results between studies.
A review of the literature was performed to identify a meaningful set of eye-tracking parameters to be used in hybrid evaluation experiments. The review targeted published work in the fields of psychology, psychophysiology, human-computer interaction and traffic research. The goal of the review was two-fold: to identify the locations the participants regarded as important in the presented video clips and to measure the participants' emotional response to the viewed stimuli.
Our early results show that the identified parameters provide meaningful results with binocular eye-tracking and small
portable 3D displays, and are in line with previous research. Future work will continue to deepen the understanding of
visual quality on 3D displays.
REFERENCES
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
3D-LCD product brochure, MasterImage, available online at http://masterimage.co.kr/new_eng/data/masterimage.zip?pos=60
Aaltonen, A. Hyrskykari, A. and Räihä, K., “101 Spots, or how do users read menus?,” Proc. CHI 98 Human Factors in
Computing Systems, 132-139 (1998).
Andrews, T. J. and Coppola, D. M., “Idiosyncratic characteristics of saccadic eye movements when viewing different visual
environments,” Vision Research, 39(17), 2947-2953 (1999). ISSN 0042-6989
Baddeley, R. J. and Tattler, B. W., “High frequency edges (but not contrast) predict where we fixate: A Bayesian system
identification analysis,” Vision Research 46, 2824-2833 (2006).
Bech S. and Zacharov N., “Perceptual audio evaluation - Theory, method and application,” J. Acoust. Soc. Am. 122(1), 16-16
(2007).
Birmingham, E., Bischof, W. F. and Kingstone, A., “Saliency does not account for fixations to eyes within social scenes,”
Vision Research 49(24), 2992-3000 (2009). ISSN 0042-6989, doi:10.1016/j.visres.2009.09.014.
Boev, A. and Gotchev, A., “Comparative study of autostereoscopic displays for mobile devices,” Proc. Multimedia on Mobile
Devices, Electronic Imaging Symposium 2011, (2011).
Boher, P., Leroux, T., Bignon, T. and Collomb-Patton, V., “A new way to characterize auto-stereoscopic 3D displays using
Fourier optics instrument,” Proc. SPIE Stereoscopic displays and applications XX, (2008).
Brasel, S. A. and Gips, J., “Points of view: Where do people look when we watch TV?,” Perception 37, 1890-1894 (2008).
Brookings, J. B., Wilson, G. F. and Swain, C. R., “Psychophysiological responses to changes in workload during simulated air
traffic control,” Biological Psychology 42(3), 361-377 (1996). ISSN 0301-0511, doi: 10.1016/0301-0511(95)05167-8
Bruneau, D., Sasse, M. A. and McCarthy, J. D., “The eyes never lie: The use of eye tracking data in HCI research,” Proc. CHI
'02 Workshop on Physiological Computing, (2002).
Burt, P. and Julesz, B., "Modifications of the classical notion of Panum's fusional area," Perception 9(6), 671-682 (1980).
Buswell, G., [How people look at pictures: A study of the psychology of perception in art], The University of Chicago Press,
Chicago, Illinois (1935).
Byrne, M. D., Anderson, J. R., Douglas, S. and Matessa, M., “Eye tracking the visual search of click-down menus,” Proc. HCI
99, 402-409 (1999).
Card, S. K., “Visual search of computer command menus,” In Bouma H. & Bouwhuis D.G. (Eds) Attention and Performance X,
Control of Language Processes, Hillsdale, NJ: Lawrence Erlbaum Associates (1984).
Christianson, S. Å., Loftus, E. F., Hoffman, H. and Loftus, G. R., “Eye fixations and memory for emotional events,” Journal of
Experimental Psychology: Learning, Memory, and Cognition 17(4), 693-701 (1991).
Cui, L. C., “Do experts and naive observers judge printing quality differently?,” Proc. SPIE 5294, 132-145 (2003).
De Graef, P., Christiaens, D. and d'Ydewalle, G., “Perceptual effects of scene context on object identification,” Psychology
Research 52, 317-29 (1990).
Duchowski, A. T. [Eye Tracking Methodology: Theory and Practice], Second Edition, Springer-Verlag, New York (2007).
Ellis, S., Candrea, R., Misner, J., Craig, C. S., Lankford, C. P. and Hutshonson, T. E., “Windows to the soul? What eyes tell us
about software usability,” Proc. Usability Professionals Association Conference 1998, 151-178 (1998).
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
Engeldrum, P. G., [Psychometric Scaling. A Toolkit for Imaging systems development], Winchester: Imcotek Press (2000).
Essig, K., Pomplun, M. and Ritter, H., ”A neural network for 3D gaze recording with binocular eye trackers,” International
Journal of Parallel, Emergent and Distributed Systems 21(2), 79-95 (2006).
Fitts, P. M., Jones, R. E. and Milton, J. L., “Eye movements of aircraft pilots during instrument-landing approaches,”
Aeronautical Engineering Review 9(2), 24-29 (1950).
Fuchs, A. F., “The saccadic system,” In Bach-y-Rita, P., Collins, C. C. & Hyde, J. E. (Eds) The Control of Eye Movements,
343-362, NY: Academic Press (1971).
Goldberg, H.J. and Wichansky, A. M., “Eye tracking in usability evaluation: A practitioner‟s guide,” In Hyönä, J., Radach, R.,
& Deubel, H. (eds) The Mind‟s Eye: Cognitive and Applied Aspects of Eye Movement Research, Amsterdam, Elsevier, 493-516
(2003).
Goldstein, R. B., Woods, R. L. and Peli, E., “Where people look when watching movies: Do all viewers look at the same
place?,” Computers in Biology and Medicine 37(7), 957-964 (2007).
Gulliver S. R. and Ghinea G., "Stars in their eyes: what eye-tracking reveals about multimedia perceptual quality,” IEEE
Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 34(4), 472-482 (2004).
Häkkinen. J., Kawai, T., Takatalo, J., Mitsuya, R. and Nyman, G., “What do people look at when they watch stereoscopic
movies?,” In Woods, A. J., Holliman, N. S. & Dodgson, N. A. (Eds) Electronic Imaging: Stereoscopic Displays & Applications
XXI 7524(1), 75240E, 10 pages (2010).
Henderson, J. M. and Hollingworth, A., “High-level scene perception,” Annual Review of Psychology 50, 243-71 (1999).
Henderson, J. M., Weeks, P. A. Jr. and Hollingsworth, A., “The effects of semantic consistency on eye movements during
complex scene viewing,” Journal of Experimental Psychology: Human Perception and Performance 25(1), 210-228 (1999).
Hennessey, C. and Lawrence, P., "Noncontact Binocular Eye-Gaze Tracking for Point-of-Gaze Estimation in Three
Dimensions," IEEE Transactions on Biomedical Engineering 56(3), 790-799 (2009).
Hess, E. H., “Pupillometrics,” In Greenfield, N.S. & Sternbach, R. A. (Eds) Handbook of Psychophysiology, Holt, Richard &
Winston, New York, NY, 491-531 (1972).
Itti L., Koch C. and Niebur E., “A model of saliency-based visual attention for rapid scene analysis,” IEEE Transactions on
Pattern Analysis and Machine Intelligence 20(11), 1254–1259 (1998).
Itti, L. and Koch, C., “Computational modelling of visual attention,” Nat. Rev. Neurosci., 2001/03 (2001).
doi:10.1038/35058500
Itti, L., “Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes,” Visual Cognition
12(6), 1093-1123 (2006).
ITU-R BT.1438, “Subjective assessment of stereoscopic television pictures,” Rec. ITU-R BT.1438, ITU Telecom. Sector of
ITU, (2000).
ITU-R BT.500-11, “Methodology for the subjective assessment of the quality of television pictures,” International
Telecommunications Union – Radiocommunication sector, (2002).
ITU-T P.911 Recommendation, “Subjective audiovisual quality assessment methods for multimedia applications,” International
Telecommunications Union (ITU) – Telecommunication sector (1998).
Jacob, R. J. K. and Karn, K. S., ”Eye tracking in Human–Computer Interaction and usability research: Ready to deliver the
promises,” In Hyona, Radach & Deubel (Eds) The Mind‟s Eye: Cognitive and Applied Aspects of Eye Movement Research,
Oxford, England (2003).
Jainta, S., Hoormann, J., Kloke, W. B. and Jaschinski, W., “Binocularity during reading fixations: Properties of the minimum
fixation disparity,” Vision Research 50(18), 1775-1785 (2010).
Janisse, M. P., “Pupil size, affect and exposure frequency,” Social Behavior and Personality 2, 125-146 (1974).
Jumisko-Pyykkö S., Häkkinen J. and Nyman G., ”Experienced quality factors - Qualitative evaluation approach to audiovisual
quality,” Proc. IST/SPIE conference Electronic Imaging, Multimedia on Mobile Devices, (2007).
Jumisko-Pyykkö, S. and Hannuksela, M. M., “Does context matter in quality evaluation of mobile television?,” Proc. 10th
MobileHCI '08, 63-72 (2008).
Jumisko-Pyykkö, S., Malamal Vadakital, V. K. and Hannuksela, M. M., “Acceptance threshold: Bidimensional research method
for user-oriented quality evaluation studies,” International Journal of Digital Multimedia Broadcasting 712380, 20 pages (2008).
doi:10.1155/2008/712380
Jumisko-Pyykkö, S., Strohmeier, D., Utriainen, T. and Kunze, K., “Descriptive quality of experience for mobile 3D television,”
Proc. NordiCHI 2010, 1–10 (2010). ISBN 978-1-60558-934-3
Just, M. A. and Carpenter, P. A., “Eye fixations and cognitive processes,” Cognitive Psychology 8, 441-480 (1976).
Kahneman, D., [Attention and effort], Englewood Cliffs, NJ: Prentice-Hall (1973).
Karsh, R. and Breitenbach, F. W., “Looking at looking: The amorphous fixation measure”, In Groner, R., Menz, C., Fisher, D.
F., & Monty, R. A. (Eds) Eye Movements and Psychological Functions: International Views, Hillsdale, NJ: Erlbaum, 53-64
(1983).
Kennedy, R., Lane, N., Berbaum, K. and Lilienthal, M., “Simulator sickness questionnaire: An enhanced method for quantifying
simulator sickness,” Int. J. Aviation Psychology 3(3), 203-220 (1993).
Knoche, H. and Sasse, M. A., “The sweet spot: How people trade off size and definition on mobile devices,” Proc. ACM
Multimedia 2008, 21-30 (2008).
51. Konrad, J. and Agniel, P., “Subsampling models and anti-alias filters for 3-D automultiscopic displays,” IEEE Trans. Image Process. 15, 128-140 (2006).
52. Kooi, F. and Toet, A., “Visual comfort of binocular and 3D displays,” Displays 25(2-3), 99-108 (2004). ISSN 0141-9382, doi:10.1016/j.displa.2004.07.004
53. Kwon, Y., Jeon, K. and Kim, S., “Research on gaze-based interaction to 3D display system,” Proc. SPIE 6392, 63920J (2006).
54. Le Meur, O., Le Callet, P. and Barba, D., “Predicting visual fixations on video based on low-level visual features,” Vision Research 47(19), 2483-2498 (2007). ISSN 0042-6989, doi:10.1016/j.visres.2007.06.015
55. Le Meur, O., Ninassi, A., Le Callet, P. and Barba, D., “Do video coding impairments disturb the visual attention deployment?,” Signal Processing: Image Communication 25, 597-609 (2010).
56. Loewenfeld, I. E., “Pupil size,” Survey of Ophthalmology 11, 291-294 (1966).
57. Loftus, E. F., Loftus, G. R. and Messo, J., “Some facts about ‘weapon focus’,” Law and Human Behavior 11, 55-62 (1987).
58. Loftus, G. R. and Mackworth, N. H., “Cognitive determinants of fixation location during picture viewing,” Journal of Experimental Psychology: Human Perception and Performance 4(4), 565-572 (1978).
59. Lowenstein, O. and Loewenfeld, I. E., “The sleep-waking cycle and pupillary activity,” Annals of the New York Academy of Sciences 117, 142-156 (1964).
60. Mannan, S. K., Ruddock, K. H. and Wooding, D. S., “Fixation sequences made during visual examination of briefly presented 2D images,” Spatial Vision 11, 157-178 (1997).
61. Mannan, S., Ruddock, K. H. and Wooding, D. S., “Automatic control of saccadic eye movements made in visual inspection of briefly presented 2-D images,” Spatial Vision 9, 363-386 (1995).
62. McCarthy, J. D., Sasse, M. A. and Miras, D., “Sharp or smooth?: Comparing the effect of quantization vs. frame rate for streamed video,” Proc. 2004 Conference on Human Factors in Computing Systems: CHI 2004, 535-542 (2004).
63. Moorthy, A. K. and Bovik, A. C., “Perceptually significant spatial pooling techniques for image quality assessment,” Proc. SPIE Electronic Imaging 2009 7240, (2009).
64. Narayanan, N. H. and Schrimpster, D. J., “Extending eye tracking to analyse interactions with multimedia information presentations,” Proc. HCI 2000 on People and Computers XIV - Usability or Else, 271-286 (2000).
65. Nelson, W. W. and Loftus, G. R., “The functional visual field during picture viewing,” Journal of Experimental Psychology: Human Learning and Memory 6, 391-399 (1980).
66. Ninassi, A., Le Meur, O., Le Callet, P. and Barba, D., “Does where you gaze on an image affect your perception of quality? Applying visual attention to image quality metric,” IEEE International Conference on Image Processing: ICIP 2007 2, II-169 – II-172 (2007).
67. Ninassi, A., Le Meur, O., Le Callet, P., Barba, D. and Tirel, A., “Task impact on the visual attention in subjective image quality assessment,” 14th European Signal Processing Conference: EUSIPCO 2006, 5 pages (2006).
68. Nodine, C. F., Carmody, D. P. and Herman, E., “Eye movements during visual search for artistically embedded targets,” Bulletin of the Psychonomic Society 13, 371-374 (1979).
69. Nodine, C. F., Carmody, D. P. and Kundel, H. L., “Searching for Nina,” In Senders, J., Fisher, D. F. & Monty, R. (Eds) Eye Movements and the Higher Psychological Functions, Hillsdale, NJ: Erlbaum, 241-258 (1978).
70. Nyström, M. and Holmqvist, K., “Deriving and evaluating eye-tracking controlled volumes of interest for variable resolution video compression,” Journal of Electronic Imaging 16(1), 013006 (2007).
71. Nyström, M. and Holmqvist, K., “Effect of compressed offline foveated video on viewing behavior and subjective quality,” ACM Trans. on Multimedia Computing, Communications, and Applications 6(1), 14 pages (2008).
72. Nyström, M. and Holmqvist, K., “Semantic override of low-level features in image viewing – Both initially and overall,” Journal of Eye Movement Research 2(2), 2:1-2:11 (2008).
73. Oulasvirta, A., Tamminen, S., Roto, V. and Kuorelahti, J., “Interaction in 4-second bursts: The fragmented nature of attentional resources in mobile HCI,” Proc. CHI 2005, 919-928 (2005).
74. Palomba, D., Sarlo, M., Angrilli, A., Mini, A. and Stegagno, L., “Cardiac responses associated with affective processing of unpleasant film stimuli,” International Journal of Psychophysiology 36, 45-57 (1999).
75. Parkhurst, D. J. and Niebur, E., “Scene content selected by active vision,” Spatial Vision 16, 125-154 (2003).
76. Parkhurst, D., Law, K. and Niebur, E., “Modeling the role of salience in the allocation of overt visual attention,” Vision Research 42, 107-123 (2002).
77. Partala, T. and Surakka, V., “Pupil size variation as an indication of affective processing,” Int. J. Hum.-Comput. Stud. 59(1-2), 185-198 (2003). doi:10.1016/S1071-5819(03)00017-X
78. Partala, T., Jokiniemi, M. and Surakka, V., “Pupillary responses to emotionally provocative stimuli,” Proc. 2000 Symposium on Eye Tracking Research & Applications: ETRA '00, 123-129 (2000). doi:10.1145/355017.355042
79. Pastoor, S., “Human factors of 3D images: Results of recent research at Heinrich-Hertz-Institut Berlin,” Proc. IDW '95 3D-7, 69-72 (1995).
80. Peli, E., Goldstein, R. B. and Woods, R. L., “Scanpaths of motion sequences: Where people look when watching movies,” Starkfest Conference on Vision and Movement in Man and Machines, 18-21 (2005).
81. Pfeiffer, T., Latoschik, M. E. and Wachsmuth, I., “Evaluation of binocular eye trackers and algorithms for 3D gaze interaction in virtual reality environments,” Journal of Virtual Reality and Broadcasting 5(16), (2008).
82. Pomplun, M. and Sunkara, S., “Pupil dilation as an indicator of cognitive workload in Human-Computer Interaction,” Proc. HCI International 2003 3, 542-546 (2003).
83. Poole, A. and Ball, L. J., “Eye tracking in Human-Computer Interaction and usability research: Current status and future prospects,” In Ghaoui, C. (Ed) Encyclopedia of Human Computer Interaction, (2004).
84. Porter, S. and ten Brinke, L., “Reading between the lies: Identifying concealed and falsified emotions in universal facial expressions,” Psychological Science 19(5), 508-514 (2008).
85. POV-Ray, “The Persistence of Vision Raytracer,” available online at http://www.povray.org
86. Radun, J., Leisti, T., Häkkinen, J., Nyman, G., Olives, J.-L., Ojanen, H. and Vuori, T., “Content and quality: Interpretation-based estimation of image quality,” ACM Transactions on Applied Perception 4(4):2, (2008).
87. Rajashekar, U., Bovik, A. C. and Cormack, L. K., “GAFFE: A gaze-attentive fixation finding engine,” IEEE Transactions on Image Processing 17, 564-573 (2008).
88. Rayner, K., “Eye movements in reading and information processing: 20 years of research,” Psychological Bulletin 124(3), 372-422 (1998).
89. Rötting, M., “Parametersystematik der Augen- und Blickbewegungen für arbeitswissenschaftliche Untersuchungen” [Parameter systematics of eye and gaze movements for ergonomics research], PhD Thesis, Technische Universität Berlin, (2001).
90. Salvucci, D. D. and Goldberg, J. H., “Identifying fixations and saccades in eye-tracking protocols,” Proc. 2000 Symposium on Eye Tracking Research & Applications: ETRA '00, 71-78 (2000). doi:10.1145/355017.355028
91. Sarter, M., Givens, B. and Bruno, J. P., “The cognitive neuroscience of sustained attention: Where top-down meets bottom-up,” Brain Research Reviews 35(2), 146-160 (2001).
92. Schnipke, S. K. and Todd, M. W., “Trials and tribulations of using an eye-tracking system,” CHI '00 Extended Abstracts on Human Factors in Computing Systems: CHI '00, 273-274 (2000).
93. Shih, S. and Liu, J., “A novel approach to 3-D gaze tracking using stereo cameras,” IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics 34(1), 234-245 (2004).
94. Stern, J. A. and Dunham, D. N., “The ocular system,” In Cacioppo, J. T. & Tassinary, L. G. (Eds) Principles of Psychophysiology: Physical, Social and Inferential Elements, Cambridge: Cambridge University Press, 513-553 (1990).
95. Takahashi, K., Nakayama, M. and Shimizu, Y., “The response of eye-movement and pupil size to audio instruction while viewing a moving target,” Proc. 2000 Symposium on Eye Tracking Research & Applications: ETRA '00, 131-138 (2000). doi:10.1145/355017.355043
96. Tatler, B. W., Baddeley, R. J. and Gilchrist, I. D., “Visual correlates of fixation selection: Effects of scale and time,” Vision Research 45(5), 643-659 (2005). ISSN 0042-6989, doi:10.1016/j.visres.2004.09.017
97. Tosi, V., Mecacci, L. and Pasquali, E., “Scanning eye movements made when viewing film: Preliminary observations,” Int. J. Neuroscience 92(1-2), 47-52 (1997).
98. Tovee, M. J., [An introduction to the visual system], Cambridge University Press, Cambridge, (2008).
99. Uehara, S., Hiroya, T., Kusanagi, H., Shigemura, K. and Asada, H., “1-inch diagonal transflective 2D and 3D LCD with HDDP arrangement,” Proc. SPIE-IS&T Electronic Imaging 2008, Stereoscopic Displays and Applications XIX 6803, (2008).
100. Viirre, E., Van Orden, K., Wing, S., Chase, B., Pribe, C., Taliwal, V. and Kwak, J., “Eye movements during visual and auditory task performance,” Society for Information Display SID 04 Digest, (2004).
101. Vu, C. T., Larson, E. C. and Chandler, D. M., “Visual fixation patterns when judging image quality: Effects of distortion type, amount, and subject experience,” Proc. 2008 IEEE Southwest Symposium on Image Analysis and Interpretation: SSIAI, 73-76 (2008). doi:10.1109/SSIAI.2008.4512288
102. Vuori, T., Olkkonen, M., Pölönen, M., Siren, A. and Häkkinen, J., “Can eye movements be quantitatively applied to image quality studies?,” Proc. Third Nordic Conference on Human-Computer Interaction: NordiCHI '04 82, 335-338 (2004). doi:10.1145/1028014.1028067
103. Walker, R. and McSorley, E., “The parallel programming of voluntary and reflexive saccades,” Vision Research 46(13), 2082-2093 (2006).
104. Wandell, B. A., [Foundations of vision], Sinauer Associates, Inc., Sunderland, Massachusetts, (1995).
105. Wismeijer, D. A., Erkelens, C. J., van Ee, R. and Wexler, M., “Depth cue combination in spontaneous eye movements,” Journal of Vision 10(6):25, (2010).
106. Woodgate, G. J. and Harrold, J., “Autostereoscopic display technology for mobile 3DTV applications,” Proc. SPIE 6490A-19, (2007).
107. Yarbus, A. L., [Eye movements and vision], NY: Plenum Press, (1967).