Sanjeev Khudanpur: Statistical Language Modeling Turns Thirty-Something: Are We Ready To Settle Down?

Sanjeev Khudanpur received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Bombay, in 1988, and the Ph.D. degree in Electrical and Computer Engineering from the University of Maryland, College Park, in 1997. His doctoral dissertation, supervised by Prof. Prakash Narayan, was titled Model Selection and Universal Data Compression. Since 1996, he has been on the faculty of the Johns Hopkins University. Until June 2001, he was an Associate Research Scientist in the Center for Language and Speech Processing and, from July 2001 to June 2008, an Assistant Professor in the Department of Electrical and Computer Engineering and the Department of Computer Science. He became an Associate Professor in July 2008. He is also affiliated with the Johns Hopkins University Human Language Technology Center of Excellence. In Fall 2000, he held a visiting appointment at the Institute for Mathematics and its Applications (IMA), University of Minnesota, Minneapolis, MN. He organized two IMA workshops on the role of mathematics in multimedia: “Mathematical Foundations of Speech Processing and Recognition” and “Mathematical Foundations of Natural Language Modeling.” Sanjeev Khudanpur's talk takes place on Friday, July 4, at 1pm in room E104.

Statistical Language Modeling Turns Thirty-Something: Are We Ready To Settle Down?

Abstract: It has been 14 years since Roni Rosenfeld described “Two Decades of Statistical Language Modeling: Where Do We Go From Here?” in a special issue of the Proceedings of the IEEE (August 2000). Perhaps it is time to review what we have learnt in the years since? This lecture will begin with what was well known in 2000 — n-grams, decision tree language models, syntactic language models, maximum entropy (log-linear) models, latent semantic analysis and dynamic adaptation — and then move on to discuss new techniques that have emerged since, such as models with sparse priors, nonparametric Bayesian methods (including Dirichlet processes), and models based on neural networks, including feed-forward, recurrent and deep belief networks. Rather than offering just a survey, the lecture's main goal will be to expose the core mathematical and statistical problems in language modeling, and to explain how various competing methods address them. It will be argued that the key to solving what appears at first blush to be a hopelessly high-dimensional, sparse-data estimation problem is to structure the model (family) and to guide the choice of parameter values using linguistic knowledge. It is hoped that viewing the core issues in this manner will enable the audience to gain a deeper understanding of the strengths and weaknesses of the various approaches. And, no, we are not ready to settle down yet. But we now know what we are looking for: it varies from application to application. To each his own!
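To make the sparse-data problem mentioned in the abstract concrete, the sketch below builds a bigram language model with add-k smoothing on a toy two-sentence corpus. This is an illustrative example only, not material from the talk: the corpus, the smoothing constant k, and the function names are all invented for demonstration. It shows why smoothing is needed — most word pairs never occur in training data, yet the model must assign them nonzero probability.

```python
from collections import Counter

# Toy corpus; a real LM would be estimated from millions of sentences.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]

# Count unigrams and bigrams, padding each sentence with boundary markers.
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent + ["</s>"]
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

vocab = set(unigrams)

def bigram_prob(w_prev, w, k=0.5):
    """P(w | w_prev) with add-k smoothing.

    Unseen pairs receive mass k / (count(w_prev) + k*|V|) instead of zero,
    so the model never assigns zero probability to a grammatical sentence
    merely because a word pair was absent from the training data.
    """
    return (bigrams[(w_prev, w)] + k) / (unigrams[w_prev] + k * len(vocab))

# A seen bigram is more probable than an unseen one with the same history.
p_seen = bigram_prob("the", "cat")     # "the cat" occurs in the corpus
p_unseen = bigram_prob("the", "sat")   # "the sat" never occurs
```

Even this crude smoother hints at the lecture's theme: with |V| words, there are |V|^2 bigram parameters but only a handful of observations, so the estimator's structure (here, additive smoothing; in practice, back-off, priors, or neural parameterizations) determines how probability mass is shared with unseen events.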