Measuring and Visualizing Attention
in Space with 3D Attention Volumes

Dr. Thies Pfeiffer

A.I. Group, Faculty of Technology, Bielefeld University

ETRA 2012, March 28th, 2012

Motivation

Joint Attention with Virtual Agents

Motivation

Gaze Interaction with Virtual Agents: Situation

  • Face-to-Face interactions with virtual agents
  • Complex environments with many objects in the background
  • Small objects and fingers can be brought into the interaction space between interlocutors
  • The human hand can be brought into the interaction space (a non-virtual object)

Problems with 2D gaze tracking in 3D worlds

  • Difficult to disambiguate between gaze on foreground and background objects
  • This is also true for real environments!

Overview


  • Review of Visualization Methods for 2D Gaze Tracking
  • Going from 2D to 3D
  • Measuring the 3D Point of Regard
    • Geometry-based Approaches
    • Holistic Approaches
  • Visualizing Attention in Space
  • Conclusion

2D Visualisations for Gaze

Scanpath

  • The basis for associating eye fixations with the visual context is the point of regard (PoR) (Dodge 1907)
  • Scanpaths show the sequence of several points of regard (Yarbus 1967; the Russian original appeared earlier, in 1965)
  • Scanpaths are a qualitative visualization
  • Several variations exist: size of PoR scaled to duration, animated scanpaths, etc.
Yarbus (1967): Scanpath on a 2D image
Yarbus, A. L. (1967). Eye Movements and Vision. Plenum Press.

2D Visualisations for Gaze

Regions of Interest

  • Aggregate fixations over certain regions
  • Regions of Interest (RoI) provide quantitative feedback
  • Several variations exist, for example: links between regions reveal transition probabilities
Mapping from fixations to Regions of Interest with transition probabilities
Fitts, Jones and Milton (1950). Eye movements of aircraft pilots during instrument-landing approaches. Aeronautical Engineering Review.
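The aggregation step behind such RoI diagrams can be sketched in a few lines of Python (a minimal sketch; the function name and label strings are illustrative, not from the original work): given a temporally ordered sequence of RoI labels, count consecutive pairs and normalize each row into transition probabilities.

```python
from collections import Counter

def transition_probabilities(fixations):
    """Estimate RoI transition probabilities from a fixation sequence.

    fixations: list of RoI labels in temporal order, e.g. ["altimeter", ...].
    Returns {src: {dst: probability}} with each row summing to 1.
    """
    counts = Counter(zip(fixations, fixations[1:]))  # consecutive RoI pairs
    totals = Counter()
    for (src, _), n in counts.items():
        totals[src] += n
    return {src: {dst: n / totals[src]
                  for (s, dst), n in counts.items() if s == src}
            for src in totals}

probs = transition_probabilities(["A", "B", "A", "B", "C"])
```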

2D Visualisations for Gaze

Heatmaps / Fixation Maps / Attentional Landscapes

  • Concept introduced by Pomplun, Ritter and Velichkovsky (1996), today often referred to as Heatmaps
  • Re-Interpretation of Attentional Landscapes (Elias, Sherwin and Wise, 1984) on images
  • Elaborated by Wooding (2002) as Fixation Maps
  • Model the area of high acuity of gaze as a Gaussian distribution (SD 1 degree of visual angle)
  • Aggregate over several PoR
  • Heatmaps provide qualitative feedback
Heatmap of gaze on the Boring figure
Pomplun, Ritter and Velichkovsky (1996). Disambiguating complex visual information: Towards communication of personal views of a scene. Perception.
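A fixation map in the sense of Wooding (2002) can be sketched as follows (assumptions: a duration-weighted standard Gaussian kernel, and `sigma_px` as the pixel equivalent of roughly 1 degree of visual angle; none of these names come from the original papers):

```python
import math

def fixation_map(width, height, fixations, sigma_px=30.0):
    """Accumulate a 2D fixation map (heatmap) over an image.

    fixations: list of (x, y, duration) tuples; each point of regard adds
    a duration-weighted Gaussian (sigma ~ 1 degree of visual angle in px).
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy, dur in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += dur * math.exp(-d2 / (2 * sigma_px ** 2))
    return grid
```

In practice the inner loops would be vectorized, but the structure, a sum of per-fixation Gaussians, is the same.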

2D Stimuli

Research Question

How do results from studies on 2D (or 2.5D) images scale to reality?

Motivation for investigating 3D stimuli

  • Problematic areas: navigation, motor control, spatial language, ...
  • Eye tracking is leaving the desktop-lab: mobile eye tracking, eye tracking in cockpits, eye tracking in virtual reality
  • 3D content is becoming widespread (commercial interests)
  • Virtual prototyping is picking up speed
  • ... you might know some more

Going from 2D to 3D

Going from 2D to 3D

Requirements (see paper for details)

  • eye tracker: monocular or binocular (better)
  • body tracking: outside-in or inside-out tracking of at least head position and orientation
  • data fusion unit: integrates eye and body tracking
  • solution for calibration: depends on set-up, could be laser pointer, marker, 3D display
  • geometry model database: a must for geometry-based approaches

Focus of this talk

  • 3D point of regard estimating unit: geometry-based or holistic approach
  • 3D gaze visualization

Estimating the 3D Point of Regard

Geometry-based Approaches

2D Gaze Tracking

2D Gaze Tracking

  • is already a 3D point of regard estimation
  • but it is based on hard constraints
    • a fixed position of the screen plane
    • a (relatively) fixed position of the user
    • the screen itself is normally not the object of interest (the objects of interest are shown on it)

Geometry-based Approaches

2.5D Gaze Tracking

2.5D object geometry acquisition used by Rötting et al. (1999)
Rötting, Göbel and Springer (1999). Automatic object identification and analysis of eye movement recordings. MMI-Interaktiv

2.5D Gaze Tracking

  • offline process for the semi-automatic detection of 2.5D PoRs (Rötting et al. 1999)
  • based on monocular eye tracking, a scene camera, and an Ascension Flock of Birds 6-DoF tracker
  • two-stage process
    1. object regions were manually labeled in different views to extract a 2.5D geometry model
    2. object regions were projected onto the scene camera for each frame and fixations were clustered on the image plane

Geometry-based Approaches

Tanriverdi and Jacob (2000)

  • based on Virtual Reality with Head-Mounted-Display (HMD) and monocular eye tracking
  • cast the visual line of sight into the 3D world (geometries known) to identify the model of interest
  • use model-based dwell times for the selection of objects: interactive use
  • HMD simplifies approach, as eye-screen transformation is fixed
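The core of such a geometry-based approach, casting the visual line of sight into the scene and letting the first geometry hit win, can be sketched with spheres as stand-in geometry (a sketch only; the function and the sphere representation are illustrative, not the cited systems' implementations):

```python
import math

def first_hit(origin, direction, spheres):
    """Cast the line of sight into the scene; return the closest object hit.

    origin: eye position (3-tuple); direction: normalized gaze direction.
    spheres: list of (name, center, radius) as a toy geometry model.
    """
    best = (math.inf, None)
    for name, c, r in spheres:
        oc = tuple(o - ci for o, ci in zip(origin, c))
        b = sum(d * e for d, e in zip(direction, oc))
        disc = b * b - (sum(e * e for e in oc) - r * r)
        if disc >= 0:
            t = -b - math.sqrt(disc)  # distance to nearest intersection
            if 0 <= t < best[0]:
                best = (t, name)
    return best[1]  # the first model hit by the ray "wins"
```

This also makes the listed disadvantages concrete: a transparent surface or a hole in the geometry would still "win" here.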

Pfeiffer (2008)

  • extended this approach to CAVE-like setups with 3D projection screens
  • problem: eye-screen transformation is dynamic, several screens
  • idea: virtual calibration screen interlocked with head movements

Geometry-based Approaches

Duchowski et al. (2001)

  • use binocular eye tracking with a HMD
  • designed for diagnostic use
  • returns visual line of sight for each eye and computes intersection point
  • creates virtual line of sight (cyclopean)
  • computes geometry intersection based on virtual line of sight
    • depth information of 3D PoR is thrown away in favor of geometry-based approach (!)
Setup for binocular eye tracking in HMD-VR
Duchowski et al. (2001): Binocular Eye Tracking in VR for Visual Inspection Training. Virtual Reality Software and Technology.

Geometry-based Approaches

Advantages

  • Object-centered
    • Moving objects are easier to handle
  • requires only monocular eye tracking
  • only standard calibration needed
  • suggest a high achievable precision (?)

Disadvantages

  • Object-centered
    • No distribution of attention on other objects
  • based on strong assumptions
    • tracking has a high acuity (small objects/letters?, partial occlusions?)
    • first model hit by the ray always wins (transparencies?, geometries with holes?)
    • static dominant eye (changes based on task?, dual target problem?)
  • problems with foreground/background disambiguation

Holistic Approaches

  • determine the 3D PoR based on measurements alone
  • require at least two viewing directions (binocular eye tracking or temporal monocular eye tracking)
  • mapping to geometries only done for quantitative interpretation
3D PoR triangulation based on vergence
Pfeiffer et al. (2009). Evaluation of Binocular Eye Trackers and Algorithms for 3D Gaze Interaction in Virtual Reality Environments. JVRB.
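Naive vergence-based triangulation can be sketched as finding the point closest to both visual lines of sight, i.e. the midpoint of the shortest segment between the two (generally skew) gaze rays. This is the generic closest-point computation, not the authors' exact algorithm:

```python
def triangulate_por(p_left, d_left, p_right, d_right):
    """Approximate the 3D PoR as the midpoint of the shortest segment
    between the two gaze rays p + t*d of the left and right eye.
    Note: (nearly) parallel rays are not handled (denom ~ 0)."""
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    w = tuple(x - y for x, y in zip(p_left, p_right))
    a, b, c = dot(d_left, d_left), dot(d_left, d_right), dot(d_right, d_right)
    d, e = dot(d_left, w), dot(d_right, w)
    denom = a * c - b * b
    t_l = (b * e - c * d) / denom  # parameter on the left-eye ray
    t_r = (a * e - b * d) / denom  # parameter on the right-eye ray
    pl = tuple(p + t_l * di for p, di in zip(p_left, d_left))
    pr = tuple(p + t_r * di for p, di in zip(p_right, d_right))
    return tuple((x + y) / 2 for x, y in zip(pl, pr))
```

Measurement noise in the eye directions makes the two rays skew, which is exactly why the machine-learning refinements discussed next pay off.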

Holistic Approaches

Essig et al. (2006)

  • binocular eye tracking on desktop VR with anaglyph stereo presentation of target dots
  • machine-learning approach (parameterized self-organizing map) to 3D PoR estimation
  • significant reduction of error (by 45 percent) compared to naive 3D triangulation
Setup with anaglyph stereo glasses and a binocular eye tracker (EyeLink I)
Essig et al. (2006). A neural network for 3D gaze recording with binocular eye trackers. The International Journal of Parallel, Emergent and Distributed Systems.

Holistic Approaches

Pfeiffer et al. (2009)

  • extended the approach of Essig et al. (2006) to generic 3D environments
    • Virtual Reality environments presented with shutter glasses
    • Real World environments
  • applied this approach to object selection tasks
Binocular eye tracker (Arrington Research), polarized stereo glasses and marker for optical tracking

Holistic Approaches

Advantages

  • scene-centered
    • spreading of attention built in
  • requires binocular eye tracking for real-time performance
  • grounded in the measurements, not in geometry assumptions
  • does not require geometry models
  • less accurate with increasing distance, but fallback to geometry-based approach possible

Disadvantages

  • scene-centered
    • problem with moving objects
  • more expensive (binocular vs. monocular)
  • more effort in calibration
  • visualization requires large memory (think of several hundred parallel heatmaps) or GPUs

Visualizing Attention in Space

Visualizations for Geometry-based Approaches

Attentional Maps

  • proposed by Stellmach, Nacke and Dachselt (2010)
  • come in different flavours
    • Projected Attentional Maps (2D projection == common heatmap)
    • Object-based Attentional Maps
    • Surface-based Attentional Maps
  • proposed solution based on geometry-based 3D PoR / desktop VR

Visualizations for Holistic Approaches

Target Structure

Interactive visualization of the example structure in immersive virtual reality used for testing

Example for Virtual Reality

  • Dense structure within a volume of $30\,cm \times 30\,cm \times 30\,cm$
  • Constructed out of a virtual version of a wooden toy-kit
  • Extension of individual building blocks of about $1\,cm$ to $2\,cm$ (not considering bars)

Visualizations for Holistic Approaches

3D Scanpath - VP 09

Individual 3D scanpath of VP 09, specific sequence of objects, holistic 3D PoR, static radius

3D Scanpath

  • $POR_{sphere}(\vec{x}): (\vec{x}-\vec{p}_{POR})^2 \le r(\vec{y})^2$
  • $r(\vec{y}) = |\vec{y}-\vec{p}_{eye}|\tan \alpha$
  • with
    • $POR_{sphere}$: membership function for $\vec{x}$
    • $\vec{p}_{POR}$: 3D point of regard
    • $\vec{p}_{eye}$: 3D position of the observing eye
    • $\alpha$: angle of high visual acuity
    • $\vec{y}$: either $\vec{p}_{POR}$ (static for one PoR, as in 2D) or $\vec{x}$ (dynamic)
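The membership test above translates directly into code; in the dynamic variant the radius grows with the distance of the tested point from the eye (a minimal sketch; function and parameter names are illustrative):

```python
import math

def in_por_sphere(x, p_por, p_eye, alpha_deg=1.0, dynamic=True):
    """Membership test POR_sphere(x): x lies within the PoR sphere if
    |x - p_por| <= r(y), with r(y) = |y - p_eye| * tan(alpha).
    y is x itself (dynamic radius) or p_por (static radius, as in 2D)."""
    dist = lambda a, b: math.sqrt(sum((i - j) ** 2 for i, j in zip(a, b)))
    y = x if dynamic else p_por
    r = dist(y, p_eye) * math.tan(math.radians(alpha_deg))
    return dist(x, p_por) <= r
```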

Visualizations for Holistic Approaches

3D Attention Volumes

3D Attention Volume aggregating over 10 participants, holistic 3D PoR

3D Attention Volumes

  • Basic idea: compute the share of visual attention for every point $\vec{x}$ in space (instead of a plane)
  • $3DAV(\vec{x}): d(t) e^{-\frac{|\vec{x} - \vec{p}_{POR}|^2}{\sigma(\vec{p}_{eye},\vec{x})}}$
  • with $d(t)$: amplification factor depending on the duration
  • and computing $\sigma$ as a function of the actual distance of $\vec{x}$ from the eye $\vec{p}_{eye}$ (here cyclopean)
  • Image to the left shows version with fixed $\sigma$, dynamic $\sigma$ shown in video on the next slide
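Evaluating the attention volume at a single point can be sketched as follows. Assumptions: contributions of all 3D PoRs are summed, the kernel is written as a standard Gaussian (with $2\sigma^2$ in the denominator), and $\sigma$ is derived from the viewing distance via the angle of high acuity; this is an illustrative reading of the formula, not the exact published implementation:

```python
import math

def attention_volume(x, fixations, p_eye, alpha_deg=1.0):
    """3DAV(x): duration-weighted Gaussian contributions of all 3D PoRs,
    with sigma scaled by the distance of x from the (cyclopean) eye so the
    kernel spans roughly alpha degrees of visual angle at that depth."""
    dist2 = lambda a, b: sum((i - j) ** 2 for i, j in zip(a, b))
    sigma = math.sqrt(dist2(x, p_eye)) * math.tan(math.radians(alpha_deg))
    value = 0.0
    for p_por, duration in fixations:
        value += duration * math.exp(-dist2(x, p_por) / (2 * sigma ** 2))
    return value
```

Sampling this function on a dense voxel grid yields exactly the memory pressure mentioned earlier, which is why a GPU implementation is attractive.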

Visualizations for Holistic Approaches

3D Attention Volumes

Visualizations for Holistic Approaches

3D Attention Volumes for Real Objects

Object

Image of the real-world 3D object

Attention Volumes from Different Perspectives

Conclusion

  • Reviewed the state of the art to show that
    • 3D gaze tracking is just around the corner
    • basic algorithms are there for tracking and visualization
    • costs are still high (eye tracking system plus motion capturing), but mobile systems with lower costs are within reach
  • Identified necessary steps for 3D gaze tracking and visualization
  • Presented 3D Attention Volumes
    • which extend the concept of heatmaps to 3D space
    • can visualize not only geometry-based 3D PoRs but also holistic 3D PoRs
    • are independent of a 3D geometry model (unless the application needs a semantic interpretation)
    • can be applied to virtual and real worlds

Open Questions / Future Work

  • How can we increase the validity of the estimated distribution of attention around the 3D POR, especially in depth? Is there appropriate data available?
  • Can/Should we compare important findings based on 2D stimuli with 3D counterparts?
  • How to handle dynamic environments?
  • How can we solve the problem of real-time model acquisition and/or update in the real world?