Theory of Deep Learning
September 15 – December 12, 2020
Synopsis
Deep learning plays a central role in the recent revolution of artificial intelligence and data science. In a wide range of applications, such as computer vision, natural language processing, and robotics, deep learning achieves dramatic performance improvements over existing baselines and even human experts. Despite this empirical success, the theoretical foundation of deep learning remains poorly understood, which hinders the development of more principled methods with performance guarantees. In particular, the lack of performance guarantees makes it challenging to incorporate deep learning into applications that involve decision making with critical consequences, such as healthcare and autonomous driving.
Towards theoretically understanding deep learning, many basic questions lack satisfying answers:
 The objective function for training a neural network is highly nonconvex. From an optimization perspective, why does stochastic gradient descent often converge to a desired solution in practice?
 The number of parameters of a neural network generally far exceeds the number of training data points (a regime known as overparametrization). From a statistical perspective, why does the learned neural network generalize to test data points, even though classical learning theory suggests it should overfit severely?
 From an information-theoretic perspective, how can we characterize the form and/or the amount of information each hidden layer carries about the input and output of a deep neural network?
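The first two questions above can be made concrete in a few lines of code. The sketch below is purely illustrative (the network sizes, target function, learning rate, and seed are all hypothetical choices, not taken from this program): it trains an overparameterized two-layer ReLU network with plain SGD on a toy regression task, then checks the error on fresh test points.

```python
import numpy as np

rng = np.random.default_rng(0)

# 20 training points, 1000 hidden units: far more parameters than data.
n, width = 20, 1000
x = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * x)                       # noiseless toy target

W = rng.normal(0, 1, size=(1, width))                    # first-layer weights
a = rng.normal(0, 1 / np.sqrt(width), size=(width, 1))   # output weights

def forward(x):
    # Two-layer ReLU network: x -> relu(xW) -> output
    return np.maximum(x @ W, 0.0) @ a

lr = 1e-3
for step in range(5000):
    i = rng.integers(n)                 # pick one sample: stochastic GD
    h = np.maximum(x[i:i+1] @ W, 0.0)
    err = h @ a - y[i:i+1]
    # Gradients of the squared loss 0.5 * err**2 w.r.t. a and W.
    a -= lr * h.T * err
    W -= lr * x[i:i+1].T @ (err * a.T * (h > 0))

train_mse = float(np.mean((forward(x) - y) ** 2))

# Fresh test points probe generalization despite overparametrization.
x_test = rng.uniform(-1, 1, size=(200, 1))
test_mse = float(np.mean((forward(x_test) - np.sin(3 * x_test)) ** 2))
print(train_mse, test_mse)
```

Despite the nonconvex objective and the 2000-to-1 parameter-to-sample ratio, SGD typically drives the training error down and the test error stays comparable, which is exactly the behavior the questions above ask us to explain.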
Organizers
Participation
Graduate Courses
T/Th 2:40–4:00pm, TTIC, Prof. Nathan Srebro
First day of lecture is 09/29/2020. Streaming via Panopto
First day of lecture is 09/17/2020
Upcoming Events
 October 1st, 11:30 am Central: Seminar – Babak Hassibi (California Institute of Technology)
Title and abstract TBA
 October 6th, 4:00 pm Central: Seminar – Julia Gaudio (MIT)
“Sparse High-Dimensional Isotonic Regression”
We consider the problem of estimating an unknown coordinate-wise monotone function given noisy measurements, known as the isotonic regression problem. Often, only a small subset of the features affects the output. This motivates the sparse isotonic regression setting, which we consider here. We provide an upper bound on the expected VC entropy of the space of sparse coordinate-wise monotone functions, and identify the regime of statistical consistency of our estimator. We also propose a linear program to recover the active coordinates, and provide theoretical recovery guarantees. We close with experiments on cancer classification, and show that our method significantly outperforms several standard methods.
 October 8th, 11:30 am Central: Seminar – Jason Lee (Princeton University)
“Beyond Linearization in Deep Learning: Hierarchical Learning and the Benefit of Representation”
Deep neural networks can empirically perform efficient hierarchical learning, in which the layers learn useful representations of the data. However, how they make use of the intermediate representations is not explained by recent theories that relate them to “shallow learners” such as kernels. In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks and can be advantageous over raw inputs. We consider a fixed, randomly initialized neural network as a representation function fed into another trainable network. When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that the neural representation can achieve improved sample complexity compared with the raw input: for learning a low-rank degree-p polynomial (p ≥ 4) in d dimensions, the neural representation requires only Õ(d^⌈p/2⌉) samples, while the best-known sample complexity upper bound for the raw input is Õ(d^(p−1)). We contrast our result with a lower bound showing that neural representations do not improve over the raw input (in the infinite-width limit) when the trainable network is instead a neural tangent kernel. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
 October 13th, 11:30 am Central: Seminar – Daniel Hsu (Columbia University)
Title and abstract TBA
 October 22nd, 11:30 am Central: Seminar – Andrea Montanari (Stanford University)
Title and abstract TBA
 October 29th, 11:30 am Central: Seminar – Francis Bach (INRIA)
“On the Convergence of Gradient Descent for Wide Two-Layer Neural Networks”
Many supervised learning methods are naturally cast as optimization problems. For prediction models that are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models that are nonlinear in their parameters, such as neural networks, lead to nonconvex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)
 November 19th, 11:30 am Central: Seminar – Rayadurgam Srikant (University of Illinois, Urbana-Champaign)
 Title and abstract TBA
 December 1st, 4:00 pm Central: Seminar – Edgar Dobriban (University of Pennsylvania)
Title and abstract TBA
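As background for the isotonic regression seminar above: the classical one-dimensional version of the problem is solved by the pool-adjacent-violators algorithm (PAVA). The sketch below is an illustrative implementation of that basic building block only; the sparse, high-dimensional setting of the talk is considerably more involved.

```python
def pava(y):
    """Least-squares fit of a nondecreasing sequence to y (basic PAVA)."""
    # Maintain a stack of blocks [mean, count]; adjacent blocks that
    # violate monotonicity are pooled into their weighted average.
    blocks = []
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            c = c1 + c2
            blocks.append([(m1 * c1 + m2 * c2) / c, c])
    # Expand each pooled block back to one value per input point.
    out = []
    for m, c in blocks:
        out.extend([m] * c)
    return out
```

For example, `pava([1, 3, 2])` pools the violating pair (3, 2) into their mean, yielding the nondecreasing fit `[1.0, 2.5, 2.5]`.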
Past Events
 September 15th, 4:00 pm Central: Kickoff Event
This Special Quarter is sponsored by The Institute for Data, Econometrics, Algorithms, and Learning (IDEAL), a multi-discipline, multi-institution collaborative institute that focuses on key aspects of the theoretical foundations of data science. This is the second installment, following a successful Special Quarter in spring 2020 on Inference and Data Science on Networks. An exciting program has been planned for the quarter, including four courses, a seminar series, and virtual social events – all free of charge! By organizing these group activities, we, the organizers, hope to create an environment for all participants, including speakers and instructors, to learn from each other, and also to catalyze research collaboration in the focus area of this Special Quarter.
The kickoff event for this quarter will be held on Tuesday, September 15, 2020 at 4 pm Chicago/Central time. We will briefly introduce the institute, the key personnel, the quarter-long courses, and other programs. We will also take you on a tour of our virtual institute on http://gather.town – an amazing virtual space where you can “walk” around, meet other participants, video chat, and even work together. Please join us at the kickoff event and mingle!