Marc Delcroix and Keisuke Kinoshita: NTT far-field speech processing research
Marc Delcroix is a senior research scientist at NTT Communication
Science Laboratories, Kyoto, Japan. He received the M.Eng. degree from
the Free University of Brussels, Brussels, Belgium, and the Ecole
Centrale Paris, Paris, France, in 2003 and the Ph.D. degree from the
Graduate School of Information Science and Technology, Hokkaido
University, Sapporo, Japan, in 2007. His research interests include
robust multi-microphone speech recognition, acoustic model adaptation,
integration of speech enhancement front-end and recognition back-end,
speech enhancement and speech dereverberation. He took an active part in
the development of NTT robust speech recognition systems for the
REVERB and the CHiME 1 and 3 challenges, which all achieved the best
performance on their tasks. He was one of the organizers of the REVERB
challenge 2014 and of ASRU 2017. He is a visiting lecturer at the
Faculty of Science and Engineering of Waseda University, Tokyo, Japan.
Keisuke Kinoshita is a senior research scientist at NTT Communication
Science Laboratories, Kyoto, Japan. He received the M.Eng. degree and
the Ph.D. degree from Sophia University, Tokyo, Japan, in 2003 and 2010,
respectively. He joined NTT in 2003 and since then has been working on
speech and audio signal processing. His research interests include
single- and multichannel speech enhancement and robust speech
recognition. He was the Chief Coordinator of the REVERB challenge 2014 and
an organizing committee member of ASRU-2017. He was honored to receive
IEICE Paper Awards (2006), ASJ Technical Development Awards (2009), ASJ
Awaya Young Researcher Award (2009), Japan Audio Society Award (2010),
and Maeshima Hisoka Award (2017). He is a visiting lecturer at the
Faculty of Science and Engineering of Doshisha University, Kyoto, Japan.
Their talk takes place on Monday, August 28, 2017 at 13:00 in room A112.
NTT far-field speech processing research
The success of voice search applications and voice-controlled devices
such as the Amazon Echo confirms that speech is becoming a common
modality to access information. Despite great recent progress in the
field, it is still challenging to achieve high automatic speech
recognition (ASR) performance when using microphones distant from the
speakers (far-field), because of noise, reverberation and potential
interfering speakers. It is even more challenging when the target
speech consists of spontaneous conversations.
At NTT, we are pursuing research on far-field speech recognition
focusing on speech enhancement front-ends and robust ASR back-ends
towards building next-generation ASR systems able to understand natural
conversations. Our research achievements have been combined into ASR
systems we developed for the REVERB and CHiME 3 challenges, and for
In this talk, after giving a brief overview of the research activity of
our group, we will introduce in more detail two of our recent research
achievements. First, we will present our work on speech dereverberation
using the weighted prediction error (WPE) algorithm. We have recently
proposed an extension to WPE that integrates deep neural network-based
speech modeling into the WPE framework, and demonstrated further
potential performance gains for reverberant speech recognition.
Next, we will discuss our recent work on acoustic model adaptation to
create ASR back-ends robust to speaker and environment variations. We
have recently proposed a context adaptive neural network architecture,
which is a powerful way to exploit speaker or environment information to
perform rapid acoustic model adaptation.
Miloslav Druckmüller is a Professor of Applied Mathematics at the Institute of Mathematics, Faculty of Mechanical Engineering, Brno University of Technology, and the head of the Department of Computer Graphics and Geometry. His main interests are numerical methods of image analysis, digital image processing, computer graphics and complex variable analysis. For the last 10 years he has been cooperating widely with the Institute for Astronomy, University of Hawaii, in the field of solar coronal plasma research. He created a large archive of K-corona (photospheric light scattered on free electrons) images and temperature maps based on observations of Fe and Ni ions, using data obtained during total solar eclipses over the last two decades. Nowadays his research is mainly focused on processing and analysis of data obtained by the NASA SDO spacecraft. His talk is POSTPONED.
Vlastimil Havran is an associate professor at the Czech Technical University in Prague. His research interests include data structures and algorithms for rendering images and videos, visibility calculations, geometric range searching for global illumination, software architectures for rendering, applied Monte Carlo methods, data compression, etc. POSTPONED
Kevin Köser is a senior researcher at the GEOMAR Helmholtz Centre for Ocean Research, Kiel. His main research interest lies in novel camera-based measurement techniques for (deep) sea environments and processes (3D underwater vision). These help to study resources, to explore and monitor (deep) sea habitats or to assess hazards, e.g. with respect to gas flux or seafloor dynamics. In the past years Dr. Köser has taught the classes 3D Photography and Computer Vision Lab at the Swiss Federal Institute of Technology (ETH Zurich) and has worked as a senior researcher in ETH’s Computer Vision and Geometry Lab on shape and motion extraction from photos and videos, geolocalization and image registration. POSTPONED