Title: Multi-Lingual Lip-Synch – Text to Speech & Audio Representation
Description: This project will apply recent deep learning techniques to build specialised datasets and train neural network models that deliver real-time multi-lingual lip-synch for speakers in a video. The focus of this project is the conversion of text subtitles into an intermediate speech representation that is suitable across multiple languages (e.g. phonemes). The preparation and automated annotation of specialised datasets offer an opportunity for high-impact research contributions. The researcher on this project will collaborate with a second PhD student, who will focus on photo-realistic lip-synching of the speech data. Both PhD students will have a unique opportunity to collaborate with engineers from Xperi, the supporting industry partner. The end goal is a practical production pipeline, inspired by Obamanet, for multi-lingual over-dubbing of video content from multi-lingual subtitles.