Columbia University E6894, Spring 2017 (7:00-9:30pm, Wednesday, 627 Seeley W. Mudd Building)

Deep Learning for Computer Vision, Speech, and Language

Requirements for students' presentations

  • Students are encouraged to form teams of two.
  • Each team is required to select one paper and prepare a 20-minute presentation.
  • The list of suggested papers is provided below, organized according to the topics of the different lectures.
  • A team that wants to present a paper not on the list should contact the instructor first.
  • Presentation slides should be at least 15 pages long and should be sent to the instructor one day before class (for the benefit of discussion).
  • The instructor may prepare questions for teams to discuss in class.

Suggested papers to present

  1. DNNs, CNNs, RNNs, LSTMs for speech (2/15, week 5)
    1. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
    2. Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription
    3. 1-Bit Stochastic Gradient Descent and Application to Data-Parallel Distributed Training of Speech DNNs
    4. Convolutional Neural Networks for Speech Recognition
    5. Deep Bi-directional Recurrent Networks Over Spectral Windows
    6. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
    7. Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks
  2. End-to-End Speech Recognition (2/22, week 6)
    1. Towards End-to-End Speech Recognition with Recurrent Neural Networks
    2. Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
    3. Deep Speech: Scaling up end-to-end speech recognition
    4. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    5. End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
  3. Distributed representations, text categorization and language modeling (3/1, week 7)
    1. GloVe: Global Vectors for Word Representation http://www-nlp.stanford.edu/pubs/glove.pdf
    2. Improving Distributional Similarity with Lessons Learned from Word Embeddings http://www.aclweb.org/anthology/Q15-1016
    3. Enriching Word Vectors with Subword Information https://arxiv.org/abs/1607.04606
    4. Bag of Tricks for Efficient Text Classification https://arxiv.org/abs/1607.01759
    5. A Convolutional Neural Network for Modelling Sentences https://arxiv.org/abs/1404.2188
    6. Visualizing and Understanding Recurrent Networks https://arxiv.org/abs/1506.02078
    7. Generating Sentences from a Continuous Space https://arxiv.org/abs/1511.06349
    8. Character-aware neural language models https://arxiv.org/abs/1508.06615
  4. LSTMs/GRUs, sequence-to-sequence architectures and neural attention (3/8, week 8)
    1. LSTM: A Search Space Odyssey https://arxiv.org/abs/1503.04069
    2. Fast and Robust Neural Network Joint Models for Statistical Machine Translation http://acl2014.org/acl2014/P14-1/pdf/P14-1129.pdf
    3. On Using Very Large Target Vocabulary for Neural Machine Translation https://arxiv.org/abs/1412.2007
    4. Effective Approaches to Attention-based Neural Machine Translation https://arxiv.org/abs/1508.04025
    5. Neural Machine Translation in Linear Time https://arxiv.org/abs/1610.10099
    6. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation https://arxiv.org/abs/1609.08144
  5. Applications: parsing, question-answering, inference and machine reading (3/22, week 10)
    1. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf
    2. Globally Normalized Transition-Based Neural Networks https://arxiv.org/abs/1603.06042
    3. Deep Biaffine Attention for Neural Dependency Parsing https://arxiv.org/abs/1611.01734
    4. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing https://arxiv.org/abs/1506.07285
    5. Learning to Compose Neural Networks for Question Answering https://arxiv.org/abs/1601.01705
    6. Reasoning about Entailment with Neural Attention https://arxiv.org/abs/1509.06664
    7. Long Short-Term Memory-Networks for Machine Reading https://arxiv.org/abs/1601.06733
    8. Natural Language Comprehension with the EpiReader https://arxiv.org/abs/1606.02270
  6. From ImageNet to faces (3/29, week 11)
    1. (strongly recommended) Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size https://arxiv.org/abs/1602.07360
    2. (strongly recommended) Geoffrey Hinton, Oriol Vinyals, Jeff Dean, Distilling the Knowledge in a Neural Network, 2015
    3. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR 15
    4. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex A. Alemi, ICLR workshop 2016
    5. Sun Y, Chen Y, Wang X, et al. Deep learning face representation by joint identification-verification, Advances in Neural Information Processing Systems. 2014: 1988-1996.
    6. Sun Y, Wang X, Tang X. Deeply learned face representations are sparse, selective, and robust. arXiv preprint arXiv:1412.1265, 2014.
    7. B. Amos, B. Ludwiczuk, M. Satyanarayanan, OpenFace: A general-purpose face recognition library with mobile applications, CMU-CS-16-118, CMU School of Computer Science, Tech. Rep., 2016. https://cmusatyalab.github.io/openface/
    8. Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012 (will be covered by the instructor)
    9. Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015 (will be covered by the instructor)
    10. K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, CVPR 2016 (will be covered by the instructor)
    11. Ranjan, Patel, and Chellappa, HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition, arXiv 2016 (will be covered by the instructor)
  7. Deep learning for games
    1. Volodymyr Mnih, Koray Kavukcuoglu et al, Human-level control through deep reinforcement learning, Nature 2015
    2. David Silver, Aja Huang, et al, Mastering the game of Go with deep neural networks and tree search, Nature, 2016
    3. Matej Moravčík, Martin Schmid, et al, DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker, https://arxiv.org/abs/1701.01724
    4. Ian J. Goodfellow, et al, Generative Adversarial Networks https://arxiv.org/abs/1406.2661
  8. Vision + Text
    1. (strongly recommended) Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler, Skip-Thought Vectors, NIPS 2015
    2. S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, L. Zitnick, D. Parikh, VQA: Visual Question Answering, ICCV 2015
    3. Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus, Simple Baseline for Visual Question Answering, 2015
Liangliang Cao, updated 01/18/2017