Liangliang Cao
Scientist at Apple Inc.
[LinkedIn], [Google Scholar], [DBLP], [arXiv]
I am a principal scientist at Apple in Cupertino, California. Previously I worked as a scientist/engineer at Google, Yahoo!, and IBM, as well as an adjunct associate professor at Columbia University and UMass. Before that, I studied at UIUC as a Ph.D. student, at CUHK as a master's student, and at USTC as an undergraduate. During my Ph.D. study, I interned at Kodak, Microsoft, and NEC Labs. I feel very fortunate to have learned from many fantastic colleagues and mentors at these companies and universities.
I have extensive experience integrating cutting-edge research with products. I was a recipient of the ACM SIGMM Rising Star Award, and I won 1st place in the ImageNet LSVRC Challenge in 2010. In 2016, I co-founded a startup named Switi Inc and served as its CTO. After the startup was acquired, I worked as the tech lead for Google Cloud speech modeling and then as the tech lead for Cloud vision modeling. I also helped Google Cloud win one of the largest contracts in the history of Cloud AI.
Here is my (outdated) CV.
- Can content generation AI become the next Web search?
- Peering into the future of speech and visual recognition
- Two ways of iterating AI systems
- Memory of my Ph.D. Advisor Prof. Thomas Huang
If you are interested in my work in Google Cloud Speech, here is an overview talk at Rutgers University. It was based on the following paper:
- RNN-T model long-form errors (SLT'21)
- Non-Streaming Model Distillation on Unsupervised Data (ICASSP 2021)
- Targeted Universal Adversarial Perturbations (Interspeech 2021)
- Bridging the gap between streaming and non-streaming ASR (Interspeech 2021)
- BigSSL (IEEE Journal of Selected Topics in Signal Processing 2022)
- Input Length Matters (submitted)
Other recent papers:
- ICASSP'22: "Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition" [arXiv]
- INTERSPEECH'21: "Residual Energy-Based Models for End-to-End Speech Recognition" [arXiv]
- INTERSPEECH'21: "Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction" [arXiv]
- ICASSP'21: "Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition" [arXiv]
- ICASSP'21: "Learning Word-Level Confidence for Subword End-to-End ASR" [arXiv]
- ECCV'20: "Label-Efficient Learning on Point Clouds using Approximate Convex Decompositions" [paper]
- MICCAI'20: "Deep Active Learning for Effective Pulmonary Nodule Detection" [paper]
- ICASSP'20: "Speech Sentiment Analysis via Pre-trained Features from End-to-end ASR Models" [paper][dataset]
- MICCAI'19: "3DFPN-HS2: 3D Feature Pyramid Network Based High Sensitivity and Specificity Pulmonary Nodule Detection" [paper]
- CVPR'19: "Automatic Adaptation of Object Detectors to New Domains Using Self-Training" [code] [paper] [project]
- TPAMI'19: "Focal Visual-Text Attention for Memex Question Answering" [code and dataset] [paper]