Liangliang Cao

Senior Staff Research Scientist and Senior Manager, Google
[LinkedIn], [Medium], [Google Scholar], [DBLP], [arXiv]


I am a senior staff research scientist at Google AI, working as the tech lead for Google Cloud Vision modeling quality. Previously I was the tech lead for Google Cloud Speech modeling, responsible for deploying cutting-edge end-to-end speech models for Google's enterprise customers. Before Google, I worked at IBM Watson Research Center and Yahoo Labs. From 2016 to 2018, I co-founded Switi Inc and served as its CTO. I enjoyed teaching and was a part-time/adjunct associate professor at Columbia University and UMass. I won 1st place in the ImageNet LSVRC Challenge in 2010 and was a recipient of the ACM SIGMM Rising Star Award. Here is my (outdated) CV.


- In memory of my Ph.D. advisor, Prof. Thomas Huang


- Talk at Rutgers University: "Reducing Longform Errors in End2End Speech Recognition"

- IEEE Journal of Selected Topics in Signal Processing: "BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition" [arXiv]

- "Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition" [arXiv]

- ICASSP'22: "Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition" [arXiv]

- INTERSPEECH'21: "Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models" [arXiv]

- INTERSPEECH'21: "Residual Energy-Based Models for End-to-End Speech Recognition" [arXiv]

- INTERSPEECH'21: "Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction" [arXiv]

- INTERSPEECH'21: "Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models" [arXiv]

- ICASSP'21: "Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data" [arXiv]

- ICASSP'21: "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]

- ICASSP'21: "Learning Word-Level Confidence For Subword End-to-End ASR" [arXiv]

- Google's On-Premise Speech2Text has launched! It is the first on-premise RNN-T model. I am thankful for the great experience of working as tech lead/manager and collaborating with many fantastic colleagues. See reports from Forbes, TechTarget, ZDNet.

- SLT'21: "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions" [arXiv]

- ECCV'20: "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]

- MICCAI'20: "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]

- ICASSP'20: "Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models" [paper] [dataset]

- MICCAI'19: "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]

- CVPR'19: "Automatic adaptation of object detectors to new domains using self-training" [code] [paper] [project]

- TPAMI'19: "Focal Visual-Text Attention for Memex Question Answering" [code and dataset] [paper]