Ph.D. student:

Razan Jaberibraheem, DSV
Expert reviewers:

Pawel Wozniak, Docent, Uppsala Universitet/Universiteit Utrecht

Chiara Rossitto, Assistant Professor, DSV

Main Supervisor:

Professor Barry Brown, DSV


Donald McMillan, Assistant Professor, DSV

Kenton O’Hara, Principal Research Manager, Microsoft


Designing Gaze Interaction: Tama—a Gaze-Activated Smart Speaker


Speech technologies are growing in popularity, offering users new interaction modalities. Yet despite the prevalence of these devices, and the rapid improvement of the underlying technology, interaction with them has improved more slowly.

Spoken interaction design centres on the use of a wake-word to initiate interaction, and the transcription of the user's spoken instruction to complete the task. However, in human-to-human conversation, speech is initiated by and supplemented with a range of other modalities, such as gaze and gesture.
These modalities are mostly ignored in the current generation of speech systems, as highlighted by the included review of 10 years of mobile speech interactive systems. This work addresses the need to better understand how human-technology `conversations' can be improved by borrowing from human-human interaction.
This research presents Tama, a gaze-activated smart speaker designed to explore the use of gaze in conversational interaction. Tama uses gaze both as an indication of the user's attention and intent to interact, and as a feedback channel. This work includes a multi-user study that explored this theme with multiple participants and ongoing, non-system-directed speech.

The findings of the study suggest that gaze can be an effective way of initiating interaction with a system, and highlight the problems of using gaze as an input method to establish and maintain mutual gaze with the system. Further analysis was conducted using machine learning methods to categorise how and when users looked at the system in relation to the conversational interaction, exposing patterns of looking both similar to and distinct from those in human-human interaction.
In future work I propose integrating more complex use of gaze as both input and output for conversational interaction with Tama, exploring the use of physical gestures and posture to enhance interaction, and supplementing the current use of transcribed human speech with non-lexical information such as prosody and volume.