Monthly Archives: August 2017

Marc Delcroix and Keisuke Kinoshita: NTT far-field speech processing research

Marc Delcroix is a senior research scientist at NTT Communication Science Laboratories, Kyoto, Japan. He received the M.Eng. degree from the Free University of Brussels, Brussels, Belgium, and the Ecole Centrale Paris, Paris, France, in 2003 and the Ph.D. degree from the Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan, in 2007. His research interests include robust multi-microphone speech recognition, acoustic model adaptation, integration of speech enhancement front-end and recognition back-end, speech enhancement and speech dereverberation. He took an active part in the development of NTT robust speech recognition systems for the REVERB and the CHiME 1 and 3 challenges, that all achieved best performances on the tasks. He was one of the organizers of the REVERB challenge, 2014 and of ARU 2017. He is a visiting lecturer at the Faculty of Science and Engineering of Waseda University, Tokyo, Japan.

Keisuke Kinoshita is a senior research scientist at NTT Communication Science Laboratories, Kyoto, Japan. He received the M.Eng. degree and the Ph.D degree from Sophia University in Tokyo, Japan in 2003 and 2010 respectively. He joined NTT in 2003 and since then has been working on speech and audio signal processing. His research interests include single- and multichannel speech enhancement and robust speech recognition. He was the Chief Coordinator of REVERB challenge 2014, an organizing committee member of ASRU-2017. He was honored to receive IEICE Paper Awards (2006), ASJ Technical Development Awards (2009), ASJ Awaya Young Researcher Award (2009), Japan Audio Society Award (2010), and Maeshima Hisoka Award (2017). He is a visiting lecturer at the Faculty of Science and Engineering of Doshisha University, Tokyo, Japan.

Their talk takes place on Monday, August 28, 2017 at 13:00 in room A112.

NTT far-field speech processing research

The success of voice search applications and voice controlled device such as the Amazon echo confirms that speech is becoming a common modality to access information. Despite great recent progress in the field, it is still challenging to achieve high automatic speech recognition (ASR) performance when using microphone distant from the speakers (Far-field), because of noise, reverberation and potential interfering speakers. It is even more challenging when the target speech consists of spontaneous conversations.

At NTT, we are pursuing research on far-field speech recognition focusing on speech enhancement front-end and robust ASR back-ends towards building next generation ASR systems able to understand natural conversations. Our research achievements have been combined into ASR systems we developed for the REVERB and CHiME 3 challenges, and for meeting recognition.

In this talk, after giving a brief overview of the research activity of our group, we will introduce in more detail two of our recent research achievements. First, we will present our work on speech dereverberation using weighted prediction error (WPE) algorithm. We have recently proposed an extension to WPE to integrate deep neural network based speech modeling into the WPE framework, and demonstrate further potential performance gains for reverberant speech recognition.

Next, we will discuss our recent work on acoustic model adaptation to create ASR back-ends robust to speaker and environment variations. We have recently proposed a context adaptive neural network architecture, which is a powerful way to exploit speaker or environment information to perform rapid acoustic model adaptation.