Reinhold Häb-Umbach is a professor of Communications Engineering at the University of Paderborn, Germany. His main research interests are in the fields of statistical signal processing and pattern recognition, with applications to speech enhancement, acoustic beamforming and source separation, as well as automatic speech recognition and unsupervised learning from speech and audio. He has more than 200 scientific publications, and recently co-authored the book "Robust Automatic Speech Recognition: A Bridge to Practical Applications" (Academic Press, 2015). He is a fellow of the International Speech Communication Association (ISCA). His talk takes place on Monday, April 24th, at 1 pm in room D0207.
Neural Network Supported Acoustic Beamforming
for Speech Enhancement and Recognition
Abstract: With multiple microphones, spatial information can be exploited to extract a target signal from a noisy environment. While the theory of statistically optimum beamforming is well established, the challenge lies in estimating the beamforming coefficients from the noisy input signal. Traditionally, these coefficients are derived from an estimate of the direction of arrival of the target signal, while more elaborate methods estimate the power spectral density (PSD) matrices of the desired and the interfering signals, thus avoiding the assumption of anechoic signal propagation. We have proposed to estimate these PSD matrices using spectral masks determined by a neural network. This combination of data-driven approaches with statistically optimum multi-channel filtering has delivered competitive results on the recent CHiME challenge. In this talk, we detail this approach and show that the concept is more general and can also be used, for example, for dereverberation. When the beamformer is used as a front-end for a speech recognition system, we further show how the neural network for spectral mask estimation can be optimized with respect to a criterion related to the word error rate in an end-to-end setup.
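To make the mask-based idea in the abstract concrete, the following is a minimal sketch in Python with NumPy, not the speaker's implementation. It assumes the spectral masks are already given (in the talk they come from a neural network; here they are random placeholders), and it picks one common statistically optimum beamformer, an MVDR beamformer in a formulation that needs only the two PSD matrices. The function names masked_psd, mvdr_weights and beamform, the array layout (frequency, time, channel) and the diagonal loading constant are all assumptions made for illustration.

```python
import numpy as np


def masked_psd(Y, mask):
    """Mask-weighted spatial PSD estimate.

    Y:    complex STFT of the microphone signals, shape (F, T, M)
    mask: real-valued mask in [0, 1], shape (F, T)
    Returns one M x M matrix per frequency bin, shape (F, M, M).
    """
    weighted = Y * mask[..., None]
    # Sum over time of mask(f, t) * y(f, t) y(f, t)^H
    psd = np.einsum("ftm,ftn->fmn", weighted, Y.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return psd / norm[:, None, None]


def mvdr_weights(psd_speech, psd_noise, ref=0, eps=1e-6):
    """MVDR weights computed from PSD matrices only (no steering vector).

    w(f) = (Phi_NN^-1 Phi_SS) u_ref / trace(Phi_NN^-1 Phi_SS)
    Returns weights of shape (F, M).
    """
    M = psd_noise.shape[-1]
    # Small diagonal loading keeps the noise PSD invertible (assumed value).
    loading = eps * np.trace(psd_noise, axis1=1, axis2=2).real / M
    psd_noise = psd_noise + loading[:, None, None] * np.eye(M)
    numerator = np.linalg.solve(psd_noise, psd_speech)  # Phi_NN^-1 Phi_SS
    trace = np.trace(numerator, axis1=1, axis2=2)
    return numerator[:, :, ref] / trace[:, None]


def beamform(Y, w):
    """Apply the weights: output(f, t) = w(f)^H y(f, t)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)


# Usage on synthetic data; the masks stand in for neural network outputs.
rng = np.random.default_rng(0)
F, T, M = 5, 100, 4
Y = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
speech_mask = rng.uniform(size=(F, T))
noise_mask = 1.0 - speech_mask

w = mvdr_weights(masked_psd(Y, speech_mask), masked_psd(Y, noise_mask))
enhanced = beamform(Y, w)  # shape (F, T)
```

Because the weights depend only on the two PSD matrices, no direction-of-arrival estimate or anechoic propagation model is needed, which is the point the abstract makes. Other beamformers driven by the same PSD estimates could be substituted for mvdr_weights.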