Columbia University E6894, Spring 2017 (7:00-9:30pm, Wednesday, 627 Seeley W. Mudd Building)
Deep Learning for Computer Vision, Speech, and Language
Requirements for students' presentations
- Students are encouraged to form teams of two.
- Each team is required to select one paper and prepare a 20-minute presentation.
- The list of papers to present will be posted on this page soon, organized by lecture topic.
- A team that wants to present a paper not on the list should contact the instructor first.
- Presentation slides should be at least 15 pages long and sent to the instructor one day before class (for the benefit of discussion).
- The instructor may prepare questions for teams to discuss in class.
Suggested papers to present
- DNNs, CNNs, RNNs, LSTMs for speech (2/15, week 5)
- Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
- Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription
- 1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs
- Convolutional Neural Networks for Speech Recognition
- Deep Bi-directional Recurrent Networks Over Spectral Windows
- Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
- Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
- End-to-End Speech Recognition (2/22, week 6)
- Towards End-to-End Speech Recognition with Recurrent Neural Networks
- Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
- Deep Speech: Scaling up end-to-end speech recognition
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
- Distributed representations, text categorization and language modeling (3/1, week 7)
- GloVe: Global Vectors for Word Representation http://www-nlp.stanford.edu/pubs/glove.pdf
- Improving Distributional Similarity with Lessons Learned from Word Embeddings http://www.aclweb.org/anthology/Q15-1016
- Enriching Word Vectors with Subword Information https://arxiv.org/abs/1607.04606
- Bag of Tricks for Efficient Text Classification https://arxiv.org/abs/1607.01759
- A Convolutional Neural Network for Modelling Sentences https://arxiv.org/abs/1404.2188
- Visualizing and Understanding Recurrent Networks https://arxiv.org/abs/1506.02078
- Generating Sentences from a Continuous Space https://arxiv.org/abs/1511.06349
- Character-aware neural language models https://arxiv.org/abs/1508.06615
- LSTMs/GRUs, sequence-to-sequence architectures and neural attention (3/8, week 8)
- LSTM: A Search Space Odyssey https://arxiv.org/abs/1503.04069
- Fast and Robust Neural Network Joint Models for Statistical Machine Translation http://acl2014.org/acl2014/P14-1/pdf/P14-1129.pdf
- On Using Very Large Target Vocabulary for Neural Machine Translation https://arxiv.org/abs/1412.2007
- Effective Approaches to Attention-based Neural Machine Translation https://arxiv.org/abs/1508.04025
- Neural Machine Translation in Linear Time https://arxiv.org/abs/1610.10099
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation https://arxiv.org/abs/1609.08144
- Applications: parsing, question-answering, inference and machine reading (3/22, week 10)
- Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
- Globally Normalized Transition-Based Neural Networks https://arxiv.org/abs/1603.06042
- Deep Biaffine Attention for Neural Dependency Parsing https://arxiv.org/abs/1611.01734
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing https://arxiv.org/abs/1506.07285
- Learning to Compose Neural Networks for Question Answering https://arxiv.org/abs/1601.01705
- Reasoning about Entailment with Neural Attention https://arxiv.org/abs/1509.06664
- Long Short-Term Memory-Networks for Machine Reading https://arxiv.org/abs/1601.06733
- Natural Language Comprehension with the EpiReader https://arxiv.org/abs/1606.02270
- From ImageNet to faces (3/29, week 11)
- (strongly recommended) Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size https://arxiv.org/abs/1602.07360
- (strongly recommended) Geoffrey Hinton, Oriol Vinyals, Jeff Dean, Distilling the Knowledge in a Neural Network, 2015
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR 2015
- Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex A. Alemi, ICLR workshop 2016
- Y. Sun, Y. Chen, X. Wang, et al., Deep learning face representation by joint identification-verification, NIPS 2014, pp. 1988-1996
- Y. Sun, X. Wang, X. Tang, Deeply learned face representations are sparse, selective, and robust, arXiv:1412.1265, 2014
- B. Amos, B. Ludwiczuk, M. Satyanarayanan, OpenFace: A general-purpose face recognition library with mobile applications, Tech. Rep. CMU-CS-16-118, CMU School of Computer Science, 2016. https://cmusatyalab.github.io/openface/
- A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012 (will be covered by the instructor)
- Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015 (will be covered by the instructor)
- K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (will be covered by the instructor)
- Ranjan, Patel, and Chellappa, HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition, arXiv 2016 (will be covered by the instructor)
- Deep learning for games
- Volodymyr Mnih, Koray Kavukcuoglu et al, Human-level control through deep reinforcement learning, Nature 2015
- David Silver, Aja Huang, et al, Mastering the game of Go with deep neural networks and tree search, Nature, 2016
- Matej Moravčík, Martin Schmid, et al, DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker, https://arxiv.org/abs/1701.01724
- Ian J. Goodfellow, et al, Generative Adversarial Networks https://arxiv.org/abs/1406.2661
- Vision + Text
- (strongly recommended) Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler, Skip-Thought Vectors, NIPS 2015
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, L. Zitnick, D. Parikh, VQA: Visual Question Answering, ICCV 2015
- Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus, Simple Baseline for Visual Question Answering, 2015
Liangliang Cao, updated 01/18/2017