
What to learn from our deep learning course?

This is one of a series of notes written while I prepared the course. It is no substitute for the effort of writing the code, running the experiments, and attending the class. But hopefully it will help our students a little bit in understanding the big picture.

We’ve received quite a lot of warm responses about our class on deep learning. But what should students expect to learn from this course? In particular, one semester is short, while the content of deep learning is rich (and still growing!). Can one develop in-depth knowledge and expertise in such a short time?

What is deep learning?

Before we discuss a problem, we first need to define it. So the first question is: what is deep learning?

There is really no rigid definition of deep learning. Maybe we would just say: whatever G. Hinton says deep learning is, that shall be it. In practice, if we denote input_x as a signal such as a vector, matrix, or tensor, and input_y as the corresponding label, a deep learning model usually works as follows:

o_1 = Layer_1(input_x)
o_2 = Layer_2(o_1)
o_3 = Layer_3(o_2)
...
o_n = Layer_n(o_(n-1))
c = Loss(o_n, input_y)

Here Layer_1, Layer_2, … are linear or nonlinear functions defined by the user. The goal of many deep learning toolkits and hardware platforms is to compute these functions and learn them as efficiently as possible.
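To make this pattern concrete, here is a minimal NumPy sketch of a two-layer version of the pseudocode above. The layer sizes, the ReLU nonlinearity, and the squared-error loss are all illustrative choices, not part of any particular toolkit:

```python
# A minimal sketch of the layered structure above (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical layers: a linear map + ReLU, then a plain linear map.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)

def layer_1(x):
    return np.maximum(W1 @ x + b1, 0.0)   # linear map followed by ReLU

def layer_2(o):
    return W2 @ o + b2                    # plain linear layer

def loss(o, y):
    return 0.5 * np.sum((o - y) ** 2)     # squared-error loss

input_x = rng.standard_normal(3)          # a toy input vector
input_y = np.array([1.0, 0.0])            # its (made-up) label

o_1 = layer_1(input_x)
o_2 = layer_2(o_1)
c = loss(o_2, input_y)
print(c)
```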

However, if we take a closer look at the code above, it becomes obvious that deep learning can be almost anything. You can convert any machine learning algorithm into a layered structure, since Layer_i can be an arbitrary function. Although not everyone realizes it, we can implement almost any machine learning algorithm in TensorFlow (here is an example) or Theano.
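For instance, classical logistic regression fits the Layer/Loss form directly: one "layer" computing a sigmoid of a linear map, followed by a cross-entropy loss. The sketch below spells this out in plain NumPy, with made-up weights and data:

```python
# Logistic regression written in the Layer/Loss form above (illustrative sketch).
import numpy as np

rng = np.random.default_rng(1)
w, b = rng.standard_normal(3), 0.0

def layer(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))          # sigmoid of a linear map

def loss(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy loss

input_x, input_y = rng.standard_normal(3), 1.0
c = loss(layer(input_x), input_y)
print(c)
```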

Obviously one need not learn everything; we had better narrow down and focus on the successful models.

Reviewing effective deep learning models

Although it is difficult to prove mathematically why one deep model works well, the good news is that we can gain good intuition about them from practice. In this course we plan to guide you through the following deep learning models:

  1. CNNs (which reduce the number of parameters by sharing a set of local convolution filters; see the sketch after this list),
  2. LSTMs or GRUs (which are good at modeling sequential data),
  3. word2vec and related neural embeddings (which work well in unsupervised learning),
  4. (maybe) DQN and related reinforcement learning networks (which are nice strategies for playing games),
  5. and many more.
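To see the parameter-saving point in item 1 concretely, here is a small NumPy sketch comparing a 1-D convolution layer with a fully connected one; all sizes are made up for illustration:

```python
# Why convolution saves parameters: one small filter is slid across the
# whole input instead of learning a separate weight for every
# (input, output) pair. Sizes here are illustrative.
import numpy as np

x = np.arange(10.0)                  # a length-10 input signal
filt = np.array([1.0, -2.0, 1.0])    # a single 3-tap local filter

# Convolution layer: 3 shared parameters produce 8 outputs.
conv_out = np.array([filt @ x[i:i + 3] for i in range(len(x) - 2)])

# A fully connected layer mapping 10 inputs to 8 outputs would instead
# need a 10 x 8 weight matrix: 80 parameters versus 3.
print(conv_out.shape, filt.size)
```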

Note that the effectiveness of different deep networks depends on the scenario. We strongly encourage you to run your own experiments when you learn a new model in class or from papers.

Practice is a good source of wisdom. Deep learning has been quite successful in three fields: speech, computer vision, and natural language processing. In this course we plan to walk you through a number of successful examples of deep learning, and we believe that in this way you will develop your expertise quickly.

Insights from different modalities

If we look at recent papers in speech, computer vision, and natural language processing, a funny fact emerges: a successful model in one field often influences other areas:

  • CNNs were first developed in vision, but are now adopted by speech and language.
  • Word2vec was designed for language, but is now widely used for vision + text applications.
  • The residual network was a breakthrough in computer vision, but its idea was partially inspired by the highway network and the LSTM in sequential modeling.

We hope some students in our class will make new and interesting discoveries of their own.
