Eye gaze tracking, i.e. the measurement of a person’s point of gaze, is a well-established method for evaluation purposes in the HCI field, but it can also be used for implicit human-computer interaction. Similar to the interaction between people, implicit HCI relies heavily on contextual information such as body posture, gestures, voice, eye movement and gaze. Knowing where people are looking allows a system to react accordingly or to act pro-actively without explicit user input, e.g. by automatically adapting the information provided on a display to show more details or information of possible interest to the user.

We have developed a new approach for gaze tracking on large-scale public displays based on head pose estimation with a single camera. The underlying principle is to locate facial features such as the eyes, the nose tip and the total face area, and to estimate the viewing direction from (i) the relative positions between these features as well as (ii) their positions with respect to the position of the camera. Thus, we did not track the eye gaze using e.g. the pupil-corneal reflection technique, but rather estimated the head pose and inferred the user’s point of gaze by assuming that the eyes were directed straight ahead. For many public display applications, we consider it sufficient to derive the focus of visual attention from the head pose, as the large display dimensions force the user to change the head orientation accordingly.

Our approach was to divide the display into four quadrants, which allowed us to recognize whether users were looking to the left or to the right (horizontal gaze estimation), and whether they were looking up or down (vertical gaze estimation). For the horizontal estimation, we considered the distances of the left and the right eye to the nose tip, and for the vertical estimation, the average position of the forehead was taken into account.
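The classification described above can be sketched as follows. Note that the coordinate conventions, the face-box normalization for the vertical estimate and the threshold value are illustrative assumptions, not the exact rules of the prototype:

```python
# Sketch of the quadrant classification from facial-feature positions.
# All coordinates are (x, y) pixels in the camera image; face_top and
# face_bottom are the y-extents of the detected face area. The comparison
# directions below assume a non-mirrored camera image.

def classify_quadrant(left_eye, right_eye, nose_tip, forehead,
                      face_top, face_bottom):
    """Estimate the gazed-at display quadrant from facial features."""
    # Horizontal: compare the horizontal eye-to-nose-tip distances.
    # Turning the head to one side shortens the projected distance of
    # the near eye to the nose tip and lengthens that of the far eye.
    d_left = abs(left_eye[0] - nose_tip[0])
    d_right = abs(right_eye[0] - nose_tip[0])
    horizontal = "left" if d_left < d_right else "right"

    # Vertical: position of the forehead relative to the face box.
    # Tilting the head shifts the forehead within the detected face area;
    # the 0.25 threshold is an assumed value for illustration.
    rel_forehead = (forehead[1] - face_top) / float(face_bottom - face_top)
    vertical = "up" if rel_forehead < 0.25 else "down"

    return horizontal, vertical
```

In a real setup the thresholds would be calibrated per camera position, since the mounting height and mirroring of the image change the sign and scale of these measures.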

The prototype was implemented using the OpenCV library for capturing and processing the input stream from the camera, as well as the SHORE object recognition engine of the Fraunhofer Institute for the recognition of facial features, namely the eyes, nose tip and forehead positions. We used a large display (1162 x 1364 mm, 1920 x 2240 px) and a video camera (Logitech Quickcam Pro 9000, 960 x 720 px) mounted at a height of 2.1 m on top of the display.
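Given the display resolution stated above, the four quadrants correspond to fixed pixel regions. A minimal sketch of this mapping, assuming a top-left coordinate origin (the helper name is hypothetical):

```python
# Map a (horizontal, vertical) gaze estimate to a display quadrant,
# using the prototype's display resolution of 1920 x 2240 px.
# A top-left pixel origin is assumed.

DISPLAY_W, DISPLAY_H = 1920, 2240

def quadrant_rect(horizontal, vertical):
    """Return (x, y, w, h) of the display region for a gaze estimate."""
    x = 0 if horizontal == "left" else DISPLAY_W // 2
    y = 0 if vertical == "up" else DISPLAY_H // 2
    return (x, y, DISPLAY_W // 2, DISPLAY_H // 2)
```

An application could then, for example, enlarge or highlight the content shown in the returned region to reflect the user’s focus of attention.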


A. Sippl, C. Holzmann, D. Zachhuber, A. Ferscha: Real-Time Gaze Tracking for Public Displays. In: Proceedings of the 1st International Joint Conference on Ambient Intelligence (AmI-10), Malaga, Spain, 2010, pp. 167–176.




Clemens Holzmann
Department of Mobile Computing, University of Applied Sciences Upper Austria, Austria
clemens.holzmann [at] fh-hagenberg.at

Andreas Sippl*, Doris Zachhuber, Alois Ferscha
Institute for Pervasive Computing, Johannes Kepler University Linz, Austria
alois.ferscha [at] jku.at

* … student in the Master’s degree programme Pervasive Computing