Zhaofeng Lin
Title: Multimodal and Agile Deep Learning Architectures for Speech Recognition
Supervision Team: Naomi Harte, TCD / Robert Ross, TU Dublin
Description: Speech recognition is central to technology such as Siri and Alexa, and works well in controlled environments. However, machines still lag behind humans in the ability to seamlessly interpret multiple cues, such as facial expression, gesture, word choice, and mouth movements, to understand speech in noisy or challenging environments. Humans also have a remarkable ability to adapt on the fly to changing circumstances within a single conversation, such as intermittent noise or speakers with significantly different speaking styles or accents. These two skills make human speech recognition extremely robust and versatile. This PhD seeks to develop deep learning architectures that can better integrate the different modalities of speech and can also be deployed in an agile manner, allowing continuous adaptation to external factors. These two aspects are inherently intertwined and are key to developing next-generation speech recognition solutions.