Liangliang Cao

Senior Staff Research Scientist and Manager, Google
Research Associate Professor (Affiliated) UMass CICS
[LinkedIn], [Medium], [Google Scholar], [DBLP], [arXiv]


Liangliang is a senior staff research scientist and manager at Google AI. He has recently been responsible for deploying cutting-edge end-to-end speech models for Google's enterprise customers. He is also interested in computer vision and cross-dataset recognizers. He won 1st place in the ImageNet LSVRC Challenge in 2010 and is a recipient of the ACM SIGMM Rising Star Award. In his spare time, he enjoys playing with his son, helping young students, and debugging machine learning algorithms. Here is his (outdated) CV.


- In memory of my Ph.D. advisor, Prof. Thomas Huang


- ICASSP'21 "Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data" [arXiv]

- ICASSP'21 "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]

- ICASSP'21 "Learning Word-Level Confidence For Subword End-to-End ASR" [arXiv]

- Google's On-Premise Speech2Text has launched! It is the first on-premise RNN-T model. I am thankful for the great experience of working as tech lead/manager and collaborating with many fantastic colleagues. See reports from Forbes, TechTarget, ZDNet.

- SLT'21 "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions" [arXiv]

- ECCV'20 "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]

- MICCAI'20 "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]

- Invited talk "Improve recognition on out-domain data" at the CVPR 2020 "Vision for Agriculture" workshop.

- ICASSP'20 "Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models" [paper][dataset]

- MICCAI'19 "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]

- CVPR'19 "Automatic adaptation of object detectors to new domains using self-training" [code] [paper] [project]

- TPAMI'19 "Focal Visual-Text Attention for Memex Question Answering" [code and dataset] [paper]