Ondřej Klejch: Deciphering Speech – a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Ondřej Klejch is a senior researcher in the Centre for Speech Technology Research in the School of Informatics at the University of Edinburgh. He obtained his Ph.D. from the University of Edinburgh in 2020 and received his M.Sc. and B.Sc. from Charles University in Prague. He has been working on building automatic speech recognition systems with limited training data and supervision within several large projects funded by EPSRC, H2020, and IARPA. His recent work investigated semi-supervised and unsupervised training methods for automatic speech recognition in low-resource languages.

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Automatic speech recognition technology has achieved outstanding performance in recent years. This progress has been possible thanks to advances in deep learning and the availability of large training datasets. Production models are typically trained on thousands of hours of manually transcribed speech recordings to achieve the best possible accuracy. Unfortunately, because manual annotation is expensive and time-consuming, automatic speech recognition is available for only a fraction of all languages and their speakers.

In this talk, I will describe methods we have successfully used to improve the language coverage of automatic speech recognition. First, I will cover semi-supervised training approaches for building systems with only a few hours of manually transcribed training data and large amounts of crawled audio and text. Subsequently, I will discuss the training dynamics of semi-supervised training approaches and why a good language model is necessary for their success. I will then present a novel decipherment approach for training an automatic speech recognition system for a new language without any manually transcribed data. This method can “decipher” speech in a new language using as little as 20 minutes of audio and paves the way for providing automatic speech recognition in many more languages in the future. Finally, I will talk about open challenges in training and evaluating automatic speech recognition models for low-resource languages.
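For readers unfamiliar with semi-supervised training, the following is a minimal sketch of the generic self-training (pseudo-labelling) loop that such approaches typically build on: the current model transcribes crawled audio, confident hypotheses are kept as pseudo-labels, and the model is retrained on the union of manual and pseudo-labelled data. The `Model` interface, the confidence scores, and the threshold below are illustrative assumptions, not the method presented in the talk.

```python
# A generic self-training (pseudo-labelling) loop for semi-supervised ASR.
# Illustrative sketch only: the Model interface, confidence scores, and the
# threshold are assumptions, not the speaker's actual implementation.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Utterance:
    audio_path: str
    transcript: str | None = None  # None for untranscribed crawled audio


class Model(Protocol):
    def transcribe(self, audio_path: str) -> tuple[str, float]:
        """Return (hypothesis, confidence) for one recording."""

    def train(self, data: list[Utterance]) -> None:
        """Retrain or fine-tune on transcribed utterances."""


def self_training(model: Model,
                  labelled: list[Utterance],
                  unlabelled: list[Utterance],
                  rounds: int = 3,
                  threshold: float = 0.9) -> Model:
    """Iteratively pseudo-label crawled audio and retrain the model."""
    for _ in range(rounds):
        pseudo = []
        for utt in unlabelled:
            hyp, conf = model.transcribe(utt.audio_path)
            if conf >= threshold:  # keep only confident hypotheses
                pseudo.append(Utterance(utt.audio_path, hyp))
        # Retrain on manual transcripts plus the filtered pseudo-labels.
        model.train(labelled + pseudo)
    return model
```

In practice, the hypotheses would also be rescored with a language model before filtering, which is one reason, as the abstract notes, that a good language model is necessary for the success of such approaches.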

His talk takes place on Thursday, December 14, 2023 at 14:00 in A113.

Hynek Hermansky: Learning: It’s not just for machines anymore

Hynek Hermansky has been active in speech research for over 40 years. He is a Life Fellow of the IEEE and a Fellow of the International Speech Communication Association, has authored or co-authored more than 350 papers with over 20,000 citations, holds more than 20 patents, and has received the IEEE James L. Flanagan Speech and Audio Processing Award and the ISCA Medal for Scientific Achievements. He started his career in 1972 at Brno University of Technology, obtained his D.Eng. degree from the University of Tokyo, and worked for Panasonic Technologies, U S WEST Advanced Technologies, the Oregon Graduate Institute, IDIAP Martigny, the Johns Hopkins University, and Google DeepMind. Currently, he is a researcher at Speech@FIT BUT and an Emeritus Professor at the Johns Hopkins University.

Learning: It’s not just for machines anymore

Machine recognition of speech requires training on large amounts of speech data. Consequently, research in machine recognition of speech consists mainly of getting hold of large amounts of speech training data and combining them, often by trial and error, with an appropriate configuration of processing modules. Advances are mostly evaluated by the error rates observed when recognizing test data. Such a process may be missing one of the prime goals of scientific endeavor: obtaining new knowledge that is applicable elsewhere. We argue that speech data can be used to obtain relevant knowledge about hearing, which is used in decoding the messages in speech, and we report on experiments that support this notion.

His talk takes place on Wednesday, November 22, 2023 at 14:00 in E105.

A video recording of the talk is publicly available.