Title: Noise Robust Speech Recognition: Acoustic Model Compensation and Beyond
Speaker: Khe Chai Sim, Assistant Professor, National University of Singapore
Time: May 24, 3:00 PM to 4:30 PM
Venue: Room 528, Building 3, SEIEE Buildings (电信群楼), Shanghai Jiao Tong University
Abstract:
The widespread adoption of modern portable devices such as smartphones and tablets has made voice input increasingly popular on these devices. This is evident from the emergence of many voice-enabled apps, such as voice search, speech translation and personal assistants. These apps rely on Automatic Speech Recognition (ASR) technology to convert spoken utterances into text for further processing. However, ASR performance degrades substantially in noisy environments, and a wide range of techniques have been developed and studied over many years to improve its robustness to noise. This talk will present two orthogonal approaches to improving ASR performance in noisy environments.

The first part of this talk will describe model-based noise compensation techniques for ASR, such as Parallel Model Combination (PMC) and Vector Taylor Series (VTS). In particular, a novel technique called Trajectory-based PMC (TPMC) will be introduced. TPMC extends traditional PMC by incorporating the Trajectory HMM formulation, yielding a unified compensation of both the static and the dynamic parameters.

The second part of this talk will present Haptic Voice Recognition (HVR), a novel multimodal interface that combines speech and touch inputs to improve the efficiency and noise robustness of text entry on modern portable devices. HVR extends ASR by using touch input to supply additional, complementary information that makes recognition more reliable. Simulations and empirical studies show that the initial letters of the words in the utterance provide simple yet powerful cues for improving the efficiency and robustness of HVR.
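As a rough illustration of how initial-letter cues can constrain recognition, the sketch below filters an ASR N-best list so that only hypotheses whose words start with the letters the user touched survive. This is a hypothetical simplification, not the HVR system described in the talk: the function name `filter_by_initials`, the `(words, score)` hypothesis format, the one-letter-per-word assumption and the post-hoc N-best filtering are all assumptions made for illustration; a real system would more likely apply the constraint inside the decoder.

```python
from typing import List, Tuple

# Hypothetical hypothesis format: (list of words, log-score); higher is better.
Hypothesis = Tuple[List[str], float]


def filter_by_initials(nbest: List[Hypothesis], initials: str) -> List[Hypothesis]:
    """Keep only hypotheses whose word-initial letters match the touched letters.

    Assumes (hypothetically) that the user taps exactly one letter per spoken
    word, in order; whitespace in `initials` is ignored.
    """
    letters = [c.lower() for c in initials if not c.isspace()]
    consistent = []
    for words, score in nbest:
        if len(words) != len(letters):
            continue  # word count must match the number of touch events
        if all(w[:1].lower() == c for w, c in zip(words, letters)):
            consistent.append((words, score))
    # Re-rank the surviving hypotheses by their original ASR score.
    return sorted(consistent, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    nbest = [
        (["all", "mom", "now"], -9.0),   # acoustically best, but "a" != "c"
        (["call", "mom", "now"], -10.0),
        (["call", "mum", "now"], -11.0),
    ]
    for words, score in filter_by_initials(nbest, "cmn"):
        print(" ".join(words), score)
```

In this toy example the acoustically best hypothesis "all mom now" is discarded because its first word does not start with "c", which is the kind of error the touch cues are meant to correct when the audio is noisy.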
Bio:
Dr. Khe Chai Sim is an Assistant Professor at the School of Computing (SoC), National University of Singapore (NUS). He received the B.A. and M.Eng. degrees in Electrical and Information Sciences from the University of Cambridge, England, in 2001. For his undergraduate final-year project, supervised by Prof. Steve Young, he worked on the Application Programming Interface (API) for the Hidden Markov Model Toolkit (HTK), known as ATK. He was then awarded a Gates Cambridge Scholarship and completed his M.Phil. dissertation, "Covariance Matrix Modelling using Rank-One Matrices", in 2002 under the supervision of Dr. Mark Gales. In the same year he joined the Machine Intelligence Laboratory (MIL) (formerly the Speech, Vision and Robotics (SVR) group), Cambridge University Engineering Department, as a research student supervised by Dr. Mark Gales, and he received his Ph.D. degree in July 2006. He is also an alumnus of Churchill College. His main research interests are statistical pattern classification and acoustic modelling for automatic speech recognition. He also worked on the DARPA-funded Effective, Affordable and Reusable Speech-to-text (EARS) project from 2002 to 2005 and the Global Autonomous Language Exploitation (GALE) project from 2005 to 2006. He was also a member of the IIR team that participated in the NIST 2007 Language Recognition Evaluation (LRE) and the NIST 2008 Speaker Recognition Evaluation (SRE).