Liangliang Cao

Scientist at Apple Inc.
llcao[at]apple.com
[LinkedIn], [Google Scholar], [DBLP], [arXiv]

I am a principal scientist at Apple in Cupertino, California. Previously I worked as a scientist/engineer at Google, Yahoo!, and IBM, and as an adjunct associate professor at Columbia University and UMass. Before that, I studied at UIUC as a Ph.D. student, at CUHK as a master's student, and at USTC as a bachelor's student. During my Ph.D. studies, I interned at Kodak, Microsoft, and NEC Labs. I feel very fortunate to have learned from many fantastic colleagues and mentors at these companies and universities.

I have extensive experience integrating cutting-edge research into products. I was a recipient of the ACM SIGMM Rising Star Award. I won 1st place in the ImageNet LSVRC Challenge in 2010. In 2016, I co-founded a startup named Switi Inc and served as its CTO. After the startup was acquired, I worked as the tech lead for Google Cloud speech modeling and then as the tech lead for Cloud vision modeling. I also helped Google Cloud win one of the largest contracts in the history of Cloud AI.

Here is my (outdated) CV.

Essays

Papers

Most of my recent papers are available on arXiv. If you are looking for papers published before 2019, see here.

If you are interested in my work in Google Cloud Speech, here is an overview talk at Rutgers University. It was based on the following paper:

Other recent papers:

  • ICASSP'22: "Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition" [arXiv]
  • INTERSPEECH'21: "Residual Energy-Based Models for End-to-End Speech Recognition" [arXiv]
  • INTERSPEECH'21: "Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction" [arXiv]
  • ICASSP'21: "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]
  • ICASSP'21: "Learning Word-Level Confidence For Subword End-to-End ASR" [arXiv]
  • ECCV'20: "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]
  • MICCAI'20: "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]
  • ICASSP'20: "Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models" [paper] [dataset]
  • MICCAI'19: "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]
  • CVPR'19: "Automatic adaptation of object detectors to new domains using self-training" [code] [paper] [project]
  • TPAMI'19: "Focal Visual-Text Attention for Memex Question Answering" [code and dataset] [paper]