Tanel Alumäe is the head of Laboratory of Language Technology at Tallinn University of Technology (TalTech). He received his PhD degree from the same university in 2006. After that, he has worked in several research teams, including LIMSI/CNRS, Aalto University and Raytheon BBN Technologies. His recent research has focused on practical approaches to low-resource speech and language processing.
Weakly supervised training for speaker and language recognition
Speaker identification models are usually trained on data where the speech segments corresponding to the target speakers are hand-annotated. However, the process of hand-labelling speech data is expensive and doesn’t scale well, especially if a large set of speakers needs to be covered. Similarly, spoken language identification models require large amounts of training samples from each language that we want to cover.
This talk will show how metadata accompanied with speech data found on the internet can be treated as weak and/or noisy labels for training speaker and language identification models. Speaker identification models can be trained using only the information about speakers appearing in each of the recordings in training data, without any segment level annotation. For spoken language identification, we can often treat the detected language of the description of the multimedia clip as a noisy label. The latter method was used to compile VoxLingua107, a large scale speech dataset for training spoken language identification models. The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according the language of the video title and description, with some post-processing steps to filter out false positives. It contains data for 107 languages, with 62 hours per language on the average. A model trained on this dataset can be used as-is, or finetuned for a particular language identification task using only a small amount of manually verified data.
His talk takes place on Tuesday, November 9, 2021 at 13:00 in “little theater” R211 (next to Kachnicka student club in “Stary Pivovar”). The talk will be streamed live and recorded at
https://youtu.be/fpsC0jzZSvs – thanks FIT student union for support!