Liangliang Cao

Scientist at Apple Inc.
[LinkedIn], [Google Scholar], [DBLP], [arXiv]


I am a principal scientist at Apple in Cupertino, California. Previously I worked as a scientist/engineer at Google, Yahoo!, and IBM, and as an adjunct associate professor at Columbia University and UMass. Before that, I studied at UIUC as a Ph.D. student, at CUHK as a master's student, and at USTC as a bachelor's student. During my Ph.D. studies, I interned at Kodak, Microsoft, and NEC Labs. I feel very fortunate to have learned from many fantastic colleagues and mentors at these companies and universities.

I have extensive experience integrating cutting-edge research into products. I was a recipient of the ACM SIGMM Rising Star Award. I won 1st place in the ImageNet LSVRC Challenge in 2010. In 2016, I co-founded a startup named Switi Inc and served as its CTO. After the startup was acquired, I worked as the tech lead for Google Cloud speech modeling and then as the tech lead for Cloud vision modeling. I also helped Google Cloud win one of the largest contracts in the history of Cloud AI.

Here is my (outdated) CV.



Most of my recent papers are available on arXiv. If you are looking for papers published before 2019, see here.

  • Preprint: "STAIR: Learning Sparse Text and Image Representation in Grounded Tokens" [arXiv]
  • Preprint: "Exploiting Category Names for Few-Shot Classification with Vision-Language Models" [arXiv]
  • IEEE JSTSP'22: "BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition", IEEE Journal of Selected Topics in Signal Processing. [arXiv]
  • ArXiv: "Input Length Matters: Improving RNN-T and MWER Training for Long-form Telephony Speech Recognition" [arXiv]
  • ICASSP'22: "Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition" [arXiv]
  • INTERSPEECH'21: "Residual Energy-Based Models for End-to-End Speech Recognition" [arXiv]
  • INTERSPEECH'21: "Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction" [arXiv]
  • ICASSP'21: "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]
  • ICASSP'21: "Non-Streaming Model Distillation On Unsupervised Data" [arXiv]
  • INTERSPEECH'21: "Targeted Universal Adversarial Perturbations" [arXiv]
  • INTERSPEECH'21: "Bridging the gap between streaming and non-streaming ASR" [arXiv]
  • SLT'21: "RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions" [arXiv]
  • ICASSP'21: "Learning Word-Level Confidence For Subword End-to-End ASR" [arXiv]
  • ECCV'20: "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]
  • MICCAI'20: "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]
  • ICASSP'20: "Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models" [paper] [dataset]
  • MICCAI'19: "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]
  • CVPR'19: "Automatic adaptation of object detectors to new domains using self-training" [code] [paper] [project]
  • TPAMI'19: "Focal Visual-Text Attention for Memex Question Answering" [code and dataset] [paper]