Michael Gian V. Gonzales
Title: On-Device Neural Speech Understanding for Consumer Devices
Supervision Team: Michael Schukat, UoG / Naomi Harte, TCD / Peter Corcoran, UoG / Gabriel Costache, Xperi / Martin Walsh, Xperi
Description: Today’s voice-based interfaces rely on cloud-based infrastructure for data processing and interpretation, which raises concerns about large corporations’ access to personal voice data. As a result, there is a growing trend in industry to move data processing and analysis closer to the source of the data: the microphone that captures speech. This can be achieved using recently developed neural accelerators (e.g. NVIDIA’s Jetson, Google’s TPU, Xilinx Vitis AI and Perceive’s Ergo) that implement emerging neural-processing techniques. This research aims to investigate emerging trends in speech analysis and understanding, with a focus on neural implementations and optimizations for these accelerator platforms. It will examine the memory and data-bandwidth aspects of recurrence in neural speech analysis, explore neural speech enhancement techniques to pre-process voice signals picked up by low-cost microphones, explore speech representations for neural accelerator platforms, and deliver a proof-of-concept smart speaker that demonstrates the feasibility of a practical stand-alone neural speech interface.