Liangliang Cao

Senior Staff Research Scientist and Manager, Google
llcao[at]google.com
[LinkedIn], [Medium],
[Google Scholar], [DBLP], [arXiv]

I am a senior staff research scientist and manager at Google AI. I currently serve as the quality tech lead for Google Cloud Speech, responsible for deploying cutting-edge end-to-end speech models for Google's enterprise customers. I am also interested in computer vision and artificial general intelligence. I won 1st place in the ImageNet LSVRC Challenge in 2010 and received the ACM SIGMM Rising Star Award. Before Google, I worked at IBM Watson Research Center and Yahoo Labs. From 2016 to 2018, I co-founded Switi Inc. and served as its CTO. I enjoy teaching and was a part-time/adjunct associate professor at Columbia University and UMass. Here is my (outdated) CV.

Memory

- Memory of my Ph.D. Advisor Prof. Thomas Huang

News

- INTERSPEECH'21: "Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models" [arXiv]

- INTERSPEECH'21: "Residual Energy-Based Models for End-to-End Speech Recognition" [arXiv]

- INTERSPEECH'21: "Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction" [arXiv]

- INTERSPEECH'21: "Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models" [arXiv]

- ICASSP'21: "Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data" [arXiv]

- ICASSP'21: "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]

- ICASSP'21: "Learning Word-Level Confidence For Subword End-to-End ASR" [arXiv]

- Google's On-Premise Speech2Text has launched! It is the first on-premise RNN-T model. I am thankful for the great experience of working as tech lead/manager and collaborating with many fantastic colleagues. See reports from Forbes, TechTarget, ZDNet.

- SLT'21: "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions" [arXiv]

- ECCV'20: "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]

- MICCAI'20: "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]

- ICASSP'20: "Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models" [paper] [dataset]

- MICCAI'19: "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]

- CVPR'19: "Automatic adaptation of object detectors to new domains using self-training" [code] [paper] [project]

- TPAMI'19: "Focal Visual-Text Attention for Memex Question Answering" [code and dataset] [paper]