Boliang Zhang: End-to-End Task-oriented Dialog Agent Training and Human-Human Dialog Collection

Boliang Zhang is a research scientist at DiDi Labs, Los Angeles, CA, where he works on building intelligent chatbots that help people complete tasks. Before that, he interned at Microsoft, Facebook, and AT&T Labs. He received his Ph.D. in 2019 from Rensselaer Polytechnic Institute, with a thesis on applications of neural networks to information extraction for low-resource languages. He has a broad interest in applications of natural language processing. He participated in the DARPA Low Resource Languages for Emergent Incidents (LORELEI) project, where, as a core system developer, he built a named entity recognition and linking system for low-resource languages such as Hausa and Oromo, which achieved first place in the evaluation four times in a row. At DiDi Labs, he led a small group that competed in the Multi-domain Task-oriented Dialog Challenge of DSTC9 and tied for first place among ten teams.

End-to-End Task-oriented Dialog Agent Training and Human-Human Dialog Collection

Task-oriented dialog systems aim to communicate with users through natural language to accomplish a wide range of tasks, such as restaurant booking and weather querying. With the rise of artificial intelligence, they have attracted attention from both academia and industry. In the first half of this talk, I will introduce our participation in the DSTC9 Multi-domain Task-oriented Dialog Challenge and present our end-to-end dialog system. In a traditional pipelined dialog architecture, modules such as Natural Language Understanding (NLU), Dialog Manager (DM), and Natural Language Generation (NLG) work separately and are optimized individually. In contrast, our end-to-end system is a GPT-2 based, fully data-driven method that jointly predicts belief states, database queries, and responses. In the second half of the talk, since we found that existing dialog collection tools have limitations in real-world scenarios, I will introduce a novel human-human dialog platform that reduces all agent activity (API calls, utterances) to a series of clicks, yet maintains enough flexibility to satisfy users. This platform enables real-time agents to do real tasks while storing all of the agents' actions, which are later used for training chatbots.
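
To make the idea of joint prediction concrete, here is a minimal sketch of how one dialog turn might be flattened into a single text sequence that a language model such as GPT-2 could be trained to continue. The abstract does not describe the actual input format, so the segment markers (`<ctx>`, `<belief>`, `<query>`, `<resp>`, `<end>`), the `Turn` class, and the `serialize` and `parse` helpers are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass

# Hypothetical segment markers; the actual format used in the talk is not specified.
CONTEXT, BELIEF, QUERY, RESPONSE, END = "<ctx>", "<belief>", "<query>", "<resp>", "<end>"


@dataclass
class Turn:
    context: str   # dialog history up to and including the latest user utterance
    belief: dict   # belief state as slot -> value
    query: str     # database query derived from the belief state
    response: str  # system response


def serialize(turn: Turn) -> str:
    """Flatten one turn into a single training sequence for a language model."""
    belief = "; ".join(f"{slot}={value}" for slot, value in sorted(turn.belief.items()))
    return (f"{CONTEXT} {turn.context} {BELIEF} {belief} "
            f"{QUERY} {turn.query} {RESPONSE} {turn.response} {END}")


def parse(generated: str) -> Turn:
    """Split a generated sequence back into its segments."""
    body = generated.split(END)[0]
    context_part, rest = body.split(BELIEF)
    belief_part, rest = rest.split(QUERY)
    query_part, response_part = rest.split(RESPONSE)
    belief = {}
    for pair in belief_part.split(";"):
        if "=" in pair:
            slot, value = pair.split("=", 1)
            belief[slot.strip()] = value.strip()
    return Turn(context_part.replace(CONTEXT, "").strip(), belief,
                query_part.strip(), response_part.strip())
```

Under these assumptions, a single model would be given only the `<ctx>` segment at inference time and would generate the belief state, the database query, and the response in one pass, rather than passing the turn through separately optimized NLU, DM, and NLG modules.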

The talk will take place on Tuesday, April 20th, at 17:00 CEST (sorry for the late hour, but Boliang is on the US West Coast), virtually on Zoom: https://cesnet.zoom.us/j/95296064691.

A video recording of the talk is publicly available.