Itshak Lapidot: Speaker Diarization and a bit more

Itshak Lapidot emigrated from the USSR to Israel in 1971. He received his B.Sc., M.Sc., and Ph.D. degrees from the Department of Electrical and Computer Engineering at Ben-Gurion University, Beer-Sheva, Israel, in 1991, 1994 and 2001, respectively. He then held a one-year postdoctoral position (2002-2003) at IDIAP, Switzerland. Dr. Lapidot was previously a lecturer in the Electrical and Electronics Engineering Department at Sami Shamoon College of Engineering (SCE) in Beer-Sheva, Israel, and spent one year (2011-2012) as a researcher at the Laboratoire Informatique d’Avignon (LIA), University of Avignon, France. Recently, Dr. Lapidot took up a teaching position in the Electrical Engineering Department at the Afeka Academic College of Engineering and joined the ACLP research team. Dr. Lapidot’s primary research interests are speaker diarization, speaker clustering and speaker verification. He is also interested in clustering and time-series analysis from a theoretical point of view.

Speaker Diarization and a bit more

The talk will present four approaches developed for speaker and speech technologies that can also be applied to other machine learning (ML) tasks:
1. Speaker diarization – answering the question “Who spoke when?” When nothing is known about the speakers or the recording environment, no prior knowledge can be used and the problem is unsupervised. If no prior data is available even to train a GMM, a total variability matrix or a PLDA model, a different approach is needed, one that uses only the data of the given conversation. One possible solution is Viterbi-based segmentation with hidden Markov models (HMMs). This approach assumes a high correlation between the log-likelihood and the diarization error rate (DER), and that assumption leads to several problems. A possible solution will be shown that applies not only to probabilistic systems but to a much broader family of models named hidden-distortion-models (HDMs).
2. Clustering of short segments – in applications such as homeland security, clustering a large number of short speech segments is very important. The number of segments can range from hundreds to tens of thousands, and the number of speakers from 2 to several tens (about 60 speakers). Several variants of the mean-shift clustering algorithm will be presented to solve this problem, together with an automatic way to estimate clustering validity. Validity estimation matters because clustering is often a preprocessing step for other tasks, e.g., speaker verification, and poor clustering leads to poor verification results. Since manual assessment of the clustering is not feasible, an automatic tool is almost a must.
3. Data-homogeneity measure for voice comparison – given two speech utterances for speaker verification, the utterances must be suitable for a reliable comparison. They may be too short, or may not share enough common information to be compared; in that case a high or low likelihood ratio is meaningless. The test of data quality should be independent of the verification system. Such an entropy-based measure will be presented, and its relation to verification performance will be shown.
4. Database assessment – when sequential data such as speech are divided into training, development and evaluation sets, it is very difficult to know whether the sets are statistically meaningful for learning (even a fair coin can land on tails 100 times in a row). The statistical validity of the datasets should therefore be verified before the training, development and evaluation process, and this check should be independent of the verification system or approach. Such a data assessment, based on the entropy of the speech waveform, will be presented.
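Item 1 above mentions Viterbi-based HMM segmentation. As a rough illustration only, the Python sketch below shows a generic Viterbi resegmentation loop for a toy two-speaker case: segment with Viterbi, re-estimate each speaker model from its frames, repeat. It assumes 1-D features, a single Gaussian per speaker state and a fixed self-loop probability `p_stay`; these simplifications and all function names are hypothetical and not taken from the talk, and the sketch does not implement the HDM approach. The returned log-likelihood is the kind of quantity whose correlation with DER the talk questions.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(frames, means, variances, p_stay=0.99):
    """Best state (speaker) path through an ergodic HMM, in the log domain.

    Requires at least two states (the switch probability is split among them).
    """
    n = len(means)
    log_stay = math.log(p_stay)
    log_switch = math.log((1 - p_stay) / (n - 1))
    # Uniform initial state probabilities.
    score = [-math.log(n) + gaussian_logpdf(frames[0], means[s], variances[s])
             for s in range(n)]
    backptr = []
    for x in frames[1:]:
        new_score, bp = [], []
        for s in range(n):
            cands = [score[r] + (log_stay if r == s else log_switch) for r in range(n)]
            best = max(range(n), key=cands.__getitem__)
            new_score.append(cands[best] + gaussian_logpdf(x, means[s], variances[s]))
            bp.append(best)
        score = new_score
        backptr.append(bp)
    # Backtrack from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for bp in reversed(backptr):
        state = bp[state]
        path.append(state)
    path.reverse()
    return path, max(score)


def viterbi_resegment(frames, means, variances, n_iter=5, p_stay=0.99):
    """Alternate Viterbi segmentation and per-speaker Gaussian re-estimation.

    Returns the last segmentation and its log-likelihood under the models
    that were used to find it.
    """
    means, variances = list(means), list(variances)
    for _ in range(n_iter):
        path, loglik = viterbi(frames, means, variances, p_stay)
        for s in range(len(means)):
            xs = [x for x, q in zip(frames, path) if q == s]
            if len(xs) > 1:  # keep the old model if a speaker gets too few frames
                means[s] = sum(xs) / len(xs)
                var = sum((x - means[s]) ** 2 for x in xs) / len(xs)
                variances[s] = max(var, 1e-3)  # variance floor
    return path, loglik
```

For example, `viterbi_resegment([0.1, -0.1] * 10 + [5.1, 4.9] * 10, [1.0, 4.0], [1.0, 1.0])` assigns the first 20 frames to state 0 and the last 20 to state 1. Real diarization systems use multi-dimensional acoustic features and richer state models.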
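Item 2 above builds on mean-shift clustering. A minimal sketch of plain mean-shift with a Gaussian kernel is shown below, assuming each speech segment is already represented by a fixed-length vector (toy 2-D points here). The `bandwidth` value, the rule that merges modes closer than half a bandwidth, and the function name are illustrative assumptions; the specific mean-shift variants and the clustering-validity measure from the talk are not reproduced.

```python
import math


def mean_shift(points, bandwidth, n_iter=100, tol=1e-6):
    """Gaussian-kernel mean-shift; returns (labels, cluster_centers)."""
    modes = []
    for start in points:
        m = list(start)
        for _ in range(n_iter):
            # Kernel-weighted mean of all points around the current estimate.
            weights = [math.exp(-math.dist(m, p) ** 2 / (2 * bandwidth ** 2))
                       for p in points]
            total = sum(weights)
            new = [sum(w * p[k] for w, p in zip(weights, points)) / total
                   for k in range(len(m))]
            shift = math.dist(new, m)
            m = new
            if shift < tol:
                break
        modes.append(m)
    # Points whose modes lie within half a bandwidth share a cluster.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers
```

Each point is moved uphill to a mode of the kernel density estimate, so the number of clusters is not fixed in advance but is governed by the bandwidth. This is one reason mean-shift suits tasks where the number of speakers is unknown.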

His talk takes place on Tuesday, January 15, 2019 at 13:00 in room A113.