Shuai Wang
SpeechLab, Shenzhen Research Institute of Big Data, Chinese University of Hong Kong (Shenzhen)
Shuai Wang obtained his Ph.D. from Shanghai Jiao Tong University in September 2020, under the supervision of Kai Yu and Yanmin Qian. During his Ph.D., his research interests included deep learning-based approaches for speaker recognition, speaker diarization, and voice activity detection. After graduation, he joined Tencent Games as a senior researcher, where he (informally) led a speech group and extended his research interests to speech synthesis, voice conversion, music generation, and audio retrieval. Currently, he is with the SpeechLab at the Shenzhen Research Institute of Big Data, Chinese University of Hong Kong (Shenzhen), led by Haizhou Li.
Speaker Representation Learning: Theories, Applications and Practice
Speaker individuality information is one of the most critical elements of speech signals. Modeling this information thoroughly and accurately enables various intelligent speech applications, such as speaker recognition, speaker diarization, speech synthesis, and target speaker extraction. In this talk, I would like to approach the speaker characterization problem from a broader perspective, extending beyond just speaker recognition. First, I will present the developmental history and paradigm shifts in speaker modeling within the framework of deep representation learning. Next, I will discuss recent advances in pre-trained model-based methods and self-supervised training techniques. I will also cover topics such as robustness, efficiency, and interpretability, as well as the various applications of speaker modeling technologies. Finally, I will introduce two open-source toolkits I developed: wespeaker and wesep. Wespeaker is currently one of the most popular toolkits for speaker embedding learning, while wesep extends its capabilities to target speaker extraction, seamlessly integrating with wespeaker. You can find related works and recommended references in my overview paper titled "Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning".
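For readers who want to try the toolkit before the talk, the sketch below shows roughly how speaker embedding extraction and verification scoring look with wespeaker's Python binding. It is a minimal illustration, not an official snippet from the talk: the pretrained-model tag and the audio file names are placeholders, and the exact API may differ between wespeaker versions.

```python
# Illustrative sketch: extracting speaker embeddings and scoring a
# verification trial with wespeaker's Python binding.
# The model tag 'english' and the wav file names are placeholders.
import wespeaker

# Load a pretrained speaker embedding model (downloaded on first use).
model = wespeaker.load_model('english')

# Extract a fixed-dimensional speaker embedding for one utterance.
embedding = model.extract_embedding('enroll.wav')
print(embedding.shape)

# Score two utterances: a higher cosine similarity suggests the
# same speaker produced both recordings.
score = model.compute_similarity('enroll.wav', 'test.wav')
print(f'similarity: {score:.3f}')
```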
His talk takes place on Tuesday, September 10, 2024 at 13:00 in A112. The talk will be streamed live at https://youtube.com/live/FMY5_smgrYY.