Work Package 2: Simulated-real corpora and tasks

This work package has two goals: defining the experimental tasks for the research activities conducted under WP3, WP4, and WP5, and producing real and simulated acoustic corpora for training, development, and test purposes.
Several acoustic data collections will be conducted in real multi-room environments to acquire background noise sequences, acoustic events, and signals from which impulse responses can be derived for producing simulated data. Simulations will rely primarily on established contamination methods applied to clean speech corpora (e.g., TIMIT).
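As an illustration only, contamination is commonly understood as convolving a clean utterance with a measured room impulse response and then adding recorded background noise at a chosen signal-to-noise ratio. The sketch below assumes that reading; the function names (convolve, power, contaminate), the toy signals, and the SNR value are hypothetical and are not taken from the project's actual tools or data.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sequences (direct form)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate `clean` with `impulse_response`, then add `noise`
    scaled so that the reverberated-speech-to-noise ratio is `snr_db`."""
    reverberated = convolve(clean, impulse_response)
    if len(noise) < len(reverberated):
        raise ValueError("noise must be at least as long as the reverberated signal")
    noise = noise[:len(reverberated)]
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must not be all zeros")
    # Gain that brings the noise to the requested SNR.
    gain = math.sqrt(power(reverberated) / (noise_power * 10 ** (snr_db / 10)))
    return [r + gain * n for r, n in zip(reverberated, noise)]


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins (assumptions): a short "clean" signal, a decaying
    # impulse response, and white noise in place of recorded background noise.
    clean = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    impulse_response = [0.6 ** k for k in range(10)]
    noise = [random.gauss(0.0, 1.0) for _ in range(200)]
    simulated = contaminate(clean, impulse_response, noise, snr_db=10.0)
    print(len(simulated))
```

In practice the impulse responses and noise sequences would come from the measurements made in the real multi-room environments, and a clean corpus such as TIMIT would supply the speech. A real pipeline would normally use an FFT-based convolution for signals of realistic length.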
Real acoustic corpora will be designed and collected in the four targeted languages at different partner labs, based on the characteristics of the application scenarios.
Text data collection is also foreseen, with the purpose of deriving grammars, language models, and other information to train the speech understanding component.
This work package comprises the following tasks:
  • T2.1 – Experimental task definition
  • T2.2 – Acoustic measurement in home environments
  • T2.3 – Simulation corpora development
  • T2.4 – Real corpora collection
  • T2.5 – Data transcription


A description of the DIRHA simcorpora can be found at this link.

A more detailed description of the FBK simulated corpus can be found in the FBK SHINE web pages, at this link.


