Liangliang Cao
Senior Staff Research Scientist and Senior Manager, Google
[Google Scholar], [DBLP], [arXiv]
I am a senior staff research scientist at Google AI, working as the tech lead for Google Cloud Vision modeling quality. Previously I was the tech lead for Google Cloud Speech modeling, responsible for deploying cutting-edge end2end speech models for Google's enterprise customers. Before Google, I worked at IBM Watson Research Center and Yahoo Labs. From 2016 to 2018, I co-founded Switi Inc and served as its CTO. I enjoyed teaching and was a part-time/adjunct associate professor at Columbia University and UMass. I won 1st place in the ImageNet LSVRC Challenge in 2010 and received the ACM SIGMM Rising Star Award. Here is my (outdated) CV.
- Talk at Rutgers University: "Reducing Longform Errors in End2End Speech Recognition"
- IEEE Journal of Selected Topics in Signal Processing: "BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition" [arXiv]
- "Input Length Matters: An Empirical Study Of RNN-T And MWER Training For Long-form Telephony Speech Recognition" [arXiv]
- ICASSP'22: "Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition" [arXiv]
- INTERSPEECH'21: "Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models" [arXiv]
- INTERSPEECH'21: "Residual Energy-Based Models for End-to-End Speech Recognition" [arXiv]
- INTERSPEECH'21: "Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction" [arXiv]
- INTERSPEECH'21: "Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models" [arXiv]
- ICASSP'21: "Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data" [arXiv]
- ICASSP'21: "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]
- ICASSP'21: "Learning Word-Level Confidence For Subword End-to-End ASR" [arXiv]
- Google's On-Premise Speech2Text is launched! It is the first on-premise RNN-T model. I am thankful for the great experience of working as tech lead/manager and collaborating with many fantastic colleagues. See reports from Forbes, TechTarget, ZDNet.
- SLT'21: "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions" [arXiv]
- ECCV'20: "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]
- MICCAI'20: "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]
- MICCAI'19: "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]