Microphone-Array Front-end Processing for Speech Enhancement
Athena-RC-IAMU developed a showcase that demonstrates a state-of-the-art multi-channel speech enhancement system based on an array of MEMS (microelectromechanical systems) microphones. The showcase is implemented in MATLAB. It performs source localization, used for TDOA (time difference of arrival) estimation and time alignment of the channels, followed by speech enhancement. It also displays the waveforms and spectrograms of the noisy and enhanced signals, the estimated source location, and, as an objective speech quality measure, the frame-based SNR of the noisy and enhanced signals (see Figure 1). The application supports real-time recording from a MEMS microphone array (see Figure 1) provided by the DIRHA partner ST-Italy. The real-time recording module was developed by Athena-RC-IAMU in MATLAB, using objects and methods from MATLAB’s DSP System Toolbox.
Figure 1: Left: MEMS Array and Front-end Processing. Right: Display of showcase.
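The localization, alignment and enhancement chain described above can be illustrated with a minimal sketch. It is written in Python with NumPy rather than MATLAB, and it is not the showcase's implementation: the text does not name the algorithms used, so GCC-PHAT for the TDOA estimate and delay-and-sum averaging for the enhancement step are assumptions chosen as common baselines. The function names, the synthetic signals and the parameter max_delay are all hypothetical.

```python
import numpy as np


def gcc_phat_delay(sig, ref, max_delay):
    """Estimate the delay of `sig` relative to `ref`, in samples (GCC-PHAT).

    Assumes max_delay >= 1. A positive result means `sig` lags `ref`.
    """
    n = len(sig) + len(ref)          # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_delay to +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(cc)) - max_delay


def delay_and_sum(channels, max_delay):
    """Time-align all channels to channel 0 and average them."""
    ref = channels[0]
    delays = [gcc_phat_delay(ch, ref, max_delay) for ch in channels]
    # np.roll is a circular shift, so a few edge samples wrap around.
    aligned = [np.roll(ch, -d) for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0), delays


# Synthetic check (assumed data): one source, three microphones with
# known sample delays and independent sensor noise on each channel.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
true_delays = [0, 3, -2]
channels = [np.roll(source, d) + 0.5 * rng.standard_normal(source.size)
            for d in true_delays]
enhanced, estimated_delays = delay_and_sum(channels, max_delay=8)
```

Averaging the aligned channels keeps the source coherent while the independent noise partially cancels, which is the basic reason an array can outperform a single microphone. A real front-end would work frame by frame and handle fractional delays, which this sketch leaves out.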
Home control demo for Portuguese, Spanish and English
INESC-ID developed a Java application that demonstrates baseline close-talking ASR technology for European Portuguese, Spanish and English. The application offers a simple dialogue in which several services and objects in a household can be activated or deactivated using voice commands. The dialogue consists of a first, always-listening stage, in which a specific activation command is expected (e.g. "DIRHA activate"), and a second stage in which a combination of object, action and room is expected (e.g. "Turn on the light of the kitchen"). The demo uses the existing in-house ASR engine, together with text-to-speech technology based on an in-house engine (for Portuguese) and on Festival (for Spanish and English). This demo is an intermediate step towards the adoption of HTK (offline) and FBK (real-time) ASR technologies.
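The two-stage dialogue described above can be sketched as a small state machine. The sketch is in Python and is not the INESC-ID Java application: it works on already-recognized text rather than audio, the class name HomeDialogue, the command pattern and the object and room vocabularies are hypothetical, and returning to the always-listening stage after each command is an assumption that the text does not confirm.

```python
import re

ACTIVATION = "dirha activate"                 # activation phrase from the text
ACTIONS = {"turn on": "on", "turn off": "off"}
OBJECTS = {"light", "heating"}                # hypothetical vocabulary
ROOMS = {"kitchen", "bedroom", "living room"}  # hypothetical vocabulary

# Hypothetical command grammar: "<action> the <object> of the <room>".
COMMAND = re.compile(r"^(turn on|turn off) the (\w+) of the ([\w ]+)$")


class HomeDialogue:
    """Two-stage dialogue: wait for activation, then accept one command."""

    def __init__(self):
        self.active = False

    def handle(self, utterance):
        """Return None (ignored), a status string, or (action, object, room)."""
        text = utterance.lower().strip()
        if not self.active:
            # Stage 1: always listening, only the activation phrase counts.
            if text == ACTIVATION:
                self.active = True
                return "activated"
            return None
        # Stage 2: expect a combination of action, object and room.
        match = COMMAND.match(text)
        if match and match.group(2) in OBJECTS and match.group(3) in ROOMS:
            self.active = False  # assumption: go back to stage 1 afterwards
            return (ACTIONS[match.group(1)], match.group(2), match.group(3))
        return "not understood"


dialogue = HomeDialogue()
ignored = dialogue.handle("Turn on the light of the kitchen")
status = dialogue.handle("DIRHA activate")
command = dialogue.handle("Turn on the light of the kitchen")
```

Gating commands behind an activation phrase keeps the always-listening stage from reacting to ordinary conversation, at the cost of one extra utterance per command.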
User requirements collection and tasks definition
For the requirements gathering, New Amuser led the definition of the candidate techniques and the final choice of the method used to conduct the qualitative study.
Within T1.1, the following activities have been carried out:
- Choice of the preferred technique for collecting the requirements, and identification of the points to cover in the checklist and the semi-structured interview
- Preparation of the material for the survey
- Conduct of the interviews
New Amuser was in charge of conducting and analyzing the interviews. This occupied the reference person full time, both in visiting the users’ homes to interview them and in analyzing the collected material afterwards.
The results of this survey and some possible application scenarios have been described in D1.1.
Based on the results presented in this first deliverable, a first draft of the dialogue flow was proposed and discussed among the partners to identify the application scenarios and the features that the user interface should provide to end users.
To share this proposal among the partners, a dialogue flow representing the voice interaction with the main services was circulated.
The result of these discussions is a dialogue design that has also been assessed with a Wizard of Oz experiment.
This experiment was conducted in December 2012 in a home-automated apartment in Trento (ITEA apartment) with 11 Italian subjects, none of whom had any motor impairment.
To improve the voice interface and understand possible problems caused by the initial design, the operator (wizard) followed the dialogue flow shared by the partners, asking the subjects to complete some tasks (based on the services/facilities that obtained the best scores in the user requirements study) and then to answer a questionnaire.
This assessment activity has been partially used to define the user interface of the Dialogue manager for the showcase (D5.1), and it will be reported in detail in D1.3.