Project Details


Acronym: DIRHA
Full name: Distant-speech Interaction for Robust Home Applications
Reference: FP7-ICT-2011-7
Project number: 288121
Contract type: Specific Targeted Research Project
Start date: January 1, 2012
End date: December 31, 2014
Coordinator: FBK - Fondazione Bruno Kessler (Trento, Italy)
CORDIS page on DIRHA: http://cordis.europa.eu/fp7/ict/language-technologies/project-dirha_en.html


 

Challenge

The DIRHA project addressed the challenge of natural, spontaneous speech interaction with distant microphones in a home environment. The main research fields, for which suitable solutions were identified and embedded in real-time prototypes, were: multichannel acoustic processing, distant speech recognition and understanding, speaker identification/verification, and spoken dialogue management. The project also aimed to investigate the use of a new type of acquisition device consisting of MEMS (Micro-Electro-Mechanical Systems) digital microphone arrays.

The project addressed four languages: Italian, Greek, Portuguese and German. For comparison purposes, the English language was also explored. The final prototype was integrated in automated homes and evaluated by real users.

Objective and Innovation

One of the most challenging and innovative aspects of the project was the development of a distant-speech interaction system that is robust to speaker position, even in a noisy and reverberant environment and possibly in a multi-speaker context. Many other projects had addressed this concept and attempted early solutions. However, DIRHA investigated novel techniques that allow distant-speech interaction in a multi-room environment, possibly with multiple users.

Among the most relevant innovative aspects, acoustic scene analysis was run in an “always listening” mode (i.e., without the need for any push-to-talk button), with the goal of understanding the acoustic and speech activities occurring concurrently in the given environment and ultimately delivering speech chunks to the recognition and understanding components. This requires robust technologies able to tackle unforeseen acoustic environments and noisy conditions. At that time, such goals were new and far beyond the state of the art, not only for an application in the home scenario but also for other domains.

Target group of the project

The targeted application was voice-enabled interaction with appliances and other automated services available in a household. Although in some cases users could simply speak close to the microphone and in a rather controlled way, the expectation was that they would need to interact from four to five meters away from the microphones in a crowded room, with music playing and other sound sources possibly active. For some individuals (e.g., motor-impaired users), this is a strong and immediate requirement, which is the main reason the DIRHA project addressed this category of users first. To this end, a group of potential end-users was involved from the beginning in order to define concrete and realistic user requirements. It was foreseen that the most advanced technologies resulting from the project would be integrated in a real-time prototype installed in automated homes and used daily by the end-users for evaluation purposes.

The result

The DIRHA project aimed both to advance research in the given scientific fields and to make technological progress, through the development of a proof-of-concept system that could serve as the starting point for subsequent exploitation by the industrial partners involved. Research activities also included the creation of experimental tasks and corpora, which enabled dissemination and benchmarking initiatives at the international level. The final prototype relies on microphone devices installed in different rooms in order to selectively monitor the acoustic and speech activities observable in any space of the household. In the targeted scenario, the user can speak from any position, i.e., any point in any room of the house, under any background noise and acoustic conditions typical of a household, and regardless of where the closest microphones are. A spoken dialogue session can be activated on user request, for instance to access appliances and devices, services for emergency situations, or the media center (e.g., to search for a given song and play it), or to compose and send an SMS message.

Impact

The final objective of the project was the application of automatic speech recognition in four languages with a common multi-microphone front-end, spoken dialogue management, and user interface. This has a relevant impact in terms of a synergetic approach to the development of spoken language interaction systems, and it provides immediate evidence of potentially easy portability to other languages. The project also represented a milestone for developers and integrators of home automation systems, since the targeted prototype can be a first proof-of-concept realization in a real-world context, based on concrete and realistic user requirements and operational constraints.

The DIRHA consortium aimed to examine the impact of its novel technologies primarily with cooperative users. In other words, the DIRHA system was conceived for subjects who have, in principle, no difficulty in understanding how to access the system in order to obtain the highest satisfaction (e.g., based on a high completion rate in the proposed tasks) and who have a very positive attitude towards this experimentation. Once the basic technology had been established and evaluated as reliable, other categories of users (e.g., elderly people) would be addressed in other projects.

Another impact of the project concerned the portability of the foreseen solutions to other domains. The DIRHA approach and the resulting technologies could eventually be applied to several application contexts characterized by noisy environments and by the need to talk far from the microphone, for instance robotics, surveillance, telepresence, gaming, industry, and manufacturing.

 

[Image: layout of one of the apartments in which the prototype was installed.]