Tom left us on 04/25/2020, leaving so many memories I can never forget. I just want to mention a few things that have kept coming to my mind over the past few days.
Tom had a way of hiding deep insights within humor. When I worked with him as a Ph.D. student, he joked, “The advance of computer vision is mostly due to the advance of the computer”. I thought he was describing his own experience, which went back as far as the days when he used paper tape with punched holes to study image compression. Later, however, I found that his statement perfectly forecast the major theme of my career. Powerful computers with GPUs (or TPUs) give researchers the ability to learn from big data, which has changed not only my work after my Ph.D. but also the work of almost everyone in computer vision.
Tom once asked a question: “There are so many papers on visual recognition. If every paper improves state of the art by 0.1%, will the problem be solved after 1000 papers?” I did not know how to answer this question well until I had worked for a decade in different companies and on different projects. The key is that most evaluation sets of machine learning benchmarks (except a few in well-defined scenarios such as Jeopardy and AlphaGo) are limited and biased. In theory, we can easily achieve 100% accuracy on a non-degenerate training set. We can also keep improving the accuracy on limited test sets by adding priors or context. But accuracy on the test set is not a reliable indicator of performance in real-world applications. Unseen examples will often defeat machine learning models that work well in limited evaluation scenarios.
One solution to this problem is to enlarge the evaluation set. But this answer often leaves me a bit frustrated. On the one hand, the barrier and cost of data collection may prevent academic researchers from competing with companies that have more resources. On the other hand, data engineering becomes more important than everything else in industry. I was planning to ask Tom the next time I met him: what do you think of the current data-hungry paradigm? Can we still aim for elegant and beautiful algorithms like your work on compression and 3D motion estimation? Unfortunately, I can never meet him again, and cannot ask him this question.
But I can imagine Tom would use his humor to answer my question. I found an email he sent two years ago:
“To the IFPers who gave me the great Pavie: I am so happy that our IFPers have developed a culture of loving good wines. I think loving fine wines is just like loving fine mathematics. My prediction is that many of you will use deep learning to explore wine appreciation. As always, the crux is to have a large number of training samples. Happy Drinking. Tom”.
Love you so much, Tom. Happy drinking and happy thinking in heaven.