Fall 2020

Theory of Deep Learning

September 15 – December 12, 2020


Deep learning plays a central role in the recent revolution of artificial intelligence and data science. In a wide range of applications, such as computer vision, natural language processing, and robotics, deep learning achieves dramatic performance improvements over existing baselines and even humans. Despite this empirical success, the theoretical foundations of deep learning remain poorly understood, which hinders the development of more principled methods with performance guarantees. In particular, the lack of such guarantees makes it challenging to deploy deep learning in applications that involve decision making with critical consequences, such as healthcare and autonomous driving.

Toward a theoretical understanding of deep learning, many basic questions still lack satisfying answers:

  1. The objective function for training a neural network is highly non-convex. From an optimization perspective, why does stochastic gradient descent often converge to a desired solution in practice?
  2. The number of parameters of a neural network generally far exceeds the number of training data points (a regime known as over-parametrization). From a statistical perspective, why does the learned neural network generalize to test data, even though classical learning theory predicts severe overfitting?
  3. From an information-theoretic perspective, how can we characterize the form and/or the amount of information each hidden layer carries about the input and output of a deep neural network?
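The first two questions can be made concrete with a toy experiment. The sketch below (a hypothetical illustration, not part of the program material) trains a small two-layer tanh network with many more parameters than training points using plain SGD on a non-convex squared loss; empirically, the training loss is still driven close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parametrized setup: 5 data points, but 20*3 + 20 = 80 parameters.
n, d, width = 5, 3, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((width, d)) * 0.5        # hidden-layer weights
a = rng.standard_normal(width) / np.sqrt(width)  # output weights

def loss(W, a):
    preds = np.tanh(X @ W.T) @ a
    return 0.5 * np.mean((preds - y) ** 2)

init_loss = loss(W, a)

lr = 0.1
for step in range(3000):
    i = rng.integers(n)            # sample one training point (SGD)
    x, t = X[i], y[i]
    h = np.tanh(W @ x)
    err = a @ h - t
    # Gradients of the per-sample squared loss 0.5 * (a·tanh(Wx) - t)^2
    a -= lr * err * h
    W -= lr * err * np.outer(a * (1 - h ** 2), x)

print(loss(W, a))  # far below init_loss, despite non-convexity
```

Despite the non-convex objective and the parameter count far exceeding the data count, SGD from a random initialization routinely fits the data; explaining why, and why the resulting network can still generalize, is exactly what questions 1 and 2 ask.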


Organizers: Nathan Srebro (TTIC), Zhaoran Wang (Northwestern University), Dongning Guo (Northwestern University)


The Special Quarter will be run on the gather.town platform, which will support a virtual space for classes, meetings, seminars, and water-cooler interactions. Registration will give you access to the virtual institute, and allow you to participate in these activities.

Graduate Courses

The following graduate courses are coordinated to facilitate attendance by participants of the special quarter. These courses will be offered remotely to students attending the program. Enrollment is by consent of instructor. Instructors will announce how to access course material at the appropriate time.

Upcoming Events

  • October 1st, 11:30 am Central: Seminar – Babak Hassibi (California Institute of Technology)
    Title and abstract TBA
  • October 6th, 4:00 pm Central: Seminar – Julia Gaudio (MIT)
    “Sparse High-Dimensional Isotonic Regression”
    We consider the problem of estimating an unknown coordinate-wise monotone function given noisy measurements, known as the isotonic regression problem. Often, only a small subset of the features affects the output. This motivates the sparse isotonic regression setting, which we consider here. We provide an upper bound on the expected VC entropy of the space of sparse coordinate-wise monotone functions, and identify the regime of statistical consistency of our estimator. We also propose a linear program to recover the active coordinates, and provide theoretical recovery guarantees. We close with experiments on cancer classification, and show that our method significantly outperforms several standard methods.
  • October 8th, 11:30 am Central: Seminar – Jason Lee (Princeton University)
    “Beyond Linearization in Deep Learning: Hierarchical Learning and the Benefit of Representation”
    Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of these intermediate representations is not explained by recent theories that relate them to “shallow learners” such as kernels. In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks and can be advantageous over raw inputs. We consider a fixed, randomly initialized neural network as a representation function fed into another trainable network. When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that the neural representation can achieve improved sample complexity compared with the raw input: for learning a low-rank degree-p polynomial (p ≥ 4) in d dimensions, the neural representation requires only Õ(d^⌈p/2⌉) samples, while the best known sample-complexity upper bound for the raw input is Õ(d^(p−1)). We contrast our result with a lower bound showing that neural representations do not improve over the raw input (in the infinite-width limit) when the trainable network is instead a neural tangent kernel. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
  • October 13th, 11:30 am Central: Seminar – Daniel Hsu (Columbia University)
    Title and abstract TBA
  • October 22nd, 11:30 am Central: Seminar – Andrea Montanari (Stanford University)
    Title and abstract TBA
  • October 29th, 11:30 am Central: Seminar – Francis Bach (INRIA)
    “On the Convergence of Gradient Descent for Wide Two-Layer Neural Networks”
    Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)
  • November 19th, 11:30 am Central: Seminar – Rayadurgam Srikant (University of Illinois, Urbana-Champaign)
    Title and abstract TBA
  • December 1st, 4:00 pm Central: Seminar – Edgar Dobriban (University of Pennsylvania)
    Title and abstract TBA

Past Events

  • September 15th, 4:00 pm Central: Kickoff Event
    This Special Quarter is sponsored by The Institute for Data, Econometrics, Algorithms, and Learning (IDEAL), a multi-discipline, multi-institution collaborative institute that focuses on key aspects of the theoretical foundations of data science. This is the second installment, following a successful Special Quarter in spring 2020 on Inference and Data Science on Networks. An exciting program has been planned for the quarter, including four courses, a seminar series, and virtual social events – all free of charge! By organizing these group activities, we, the organizers, hope to create an environment in which all participants, including speakers and instructors, can learn from each other, and to catalyze research collaboration in the focus area of this Special Quarter.
    The kick-off event for this quarter will be held on Tuesday, September 15, 2020 at 4 pm Chicago/Central time. We will briefly introduce the institute, the key personnel, the quarter-long courses, and other programs. We will also take you on a tour of our virtual institute on http://gather.town – an amazing virtual space where you can “walk” around, meet other participants, video chat, and even work together. Please join us at the kick-off event and mingle!

