Barbara Schuppler: Automatic speech recognition for conversational speech, or: What we can learn from human talk in interaction

Barbara Schuppler (Signal Processing and Speech Communication Laboratory, Graz University of Technology, Austria) pursued her PhD research at Radboud Universiteit Nijmegen (The Netherlands) and at NTNU Trondheim (Norway) within the Marie Curie Research Training Network “Sound to Sense”. The central topic of her thesis was the analysis of conditions for variation in large conversational speech corpora using ASR technology. Currently, she is working on an FWF-funded Elise Richter Grant entitled “Cross-layer prosodic models for conversational speech,” and in October 2019 she starts her follow-up project “Cross-layer language models for conversational speech.” Her research continues to be interdisciplinary; it includes the development of automatic tools for the study of prosodic variation, the study of reduction and phonetic detail in conversational speech, and the integration of linguistic knowledge into ASR technology.

Automatic speech recognition for conversational speech, or: What we can learn from human talk in interaction

In the last decade, conversational speech has received a lot of attention among speech scientists. On the one hand, accurate automatic speech recognition (ASR) systems are essential for conversational dialogue systems, as these become more interactional and social rather than solely transactional. On the other hand, linguists study natural conversations because they reveal insights into how speech processing works that complement those gained from controlled experiments. Investigating conversational speech, however, requires not only applying existing methods to new data, but also developing new categories and new modeling techniques, and including new knowledge sources. Whereas traditional models are trained on either text or acoustic information, I propose language models that incorporate information on the phonetic variation of words (i.e., pronunciation variation and prosody) and relate this information to the semantic context of the conversation and to its communicative functions. This approach to language modeling is in line with the theoretical model proposed by Hawkins and Smith (2001), in which the perceptual system accesses meaning from speech by using the most salient sensory information from any combination of levels/layers of formal linguistic analysis. The overall aim of my research is to create cross-layer models for conversational speech. In this talk, I will illustrate the general challenges conversational speech poses for ASR, present results from my recent and ongoing projects on pronunciation and prosody modeling, and discuss directions for future research.

Her talk takes place on Thursday, October 31, 2019 at 13:00 in “little theater” R211 (next to Kachnicka student club in “Stary Pivovar”).