Workshop: Computational vs Statistical Tradeoffs in Network Inference

About the Series

The IDEAL workshop series brings in four experts on topics related to the foundations of data science to present their perspectives and research on a common theme. The intended audience is Chicago-area researchers with an interest in the foundations of data science. The technical program takes place in the morning and includes coffee and lunch. The afternoon of the workshop will allow for continued discussion between attendees and the speakers.

Part of the Special Quarter on Inference and Data Science on Networks.

Synopsis

Network models have been used as a tool to understand the role of interconnections between entities by diverse communities such as sociology, biology, meteorology, economics, and computer science, to name a few. Moreover, emerging technological developments allow collecting data on increasingly large networks. This leads to both computational and statistical challenges when inferring or learning the structure of such networks. This workshop will cover some of the advances of the last decade in understanding trade-offs between statistical and computational efficiency for many inference problems on large networks. The workshop speakers are Andrea Montanari, Ankur Moitra, and Liza Levina.

Logistics

  • Date: Monday, June 29, 2020.
  • Location: Zoom participation (register below). Panopto streaming.
  • Registration: Registration form. Registered participants will receive the Zoom link for the workshop and the Gather.Town link for the social mixer by email.
  • Google form for questions: Go here.

Schedule (all in CT)

  • 11:00-11:05: Opening Remarks
  • 11:05-11:50: Andrea Montanari, (Stanford University, EE)
          Optimization of mean field spin glasses
  • 11:50-1:45: Lunch Break / Virtual Social Mixer on Gather.Town
  • 1:45-2:30: Ankur Moitra, (MIT, Mathematics)
          Learning with Massart Noise
  • 2:30-3:15: Liza Levina, (University of Michigan, Statistics)
          Hierarchical community detection by recursive partitioning
  • 3:25-4:00: Panel Discussion with the Speakers
  • 4:00-5:00: Virtual Social Mixer on Gather.Town

Titles and Abstracts

Speaker: Andrea Montanari, Stanford
Title: Optimization of mean field spin glasses
Abstract: Let G be a random graph over N vertices generated according to a statistical model, such as the stochastic block model. In order to analyze such a graph, it would be useful to partition its vertices into k balanced sets so as to minimize the number of edges across the partition. Even for k=2, this problem is hard to approximate. Although we know of semidefinite programming relaxations with optimal worst-case guarantees, these can be sub-optimal for random instances. I will consider the case k=2 when G is an Erdős-Rényi random graph. This problem is equivalent to finding the ground state of the Sherrington-Kirkpatrick Hamiltonian in statistical physics.

I will describe an algorithm that achieves a (1-ε) approximation of the latter problem in near-linear time, conditional on a certain conjecture in mathematical physics. The algorithm exploits certain subtle properties of the non-convex energy landscape of this model. I will then discuss a more general class of Hamiltonians for mean field spin glasses and put forward a precise picture of the average-case tractability of optimization in this context.
[Based on arXiv:1812.10897 and arXiv:2001.00904 (with Ahmed El Alaoui and Mark Sellke)]
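For readers who want to experiment, here is a minimal numerical sketch (Python with numpy) of the Sherrington-Kirkpatrick Hamiltonian, together with a naive greedy single-spin-flip local search. The greedy baseline is purely illustrative; it is not the near-linear-time message-passing algorithm discussed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sherrington-Kirkpatrick model: couplings g_ij are i.i.d. standard Gaussians,
# spins sigma_i are +/-1, and H(sigma) = (1/sqrt(N)) * sum_{i<j} g_ij sigma_i sigma_j.
N = 500
G = rng.standard_normal((N, N))
J = np.triu(G, 1)
J = (J + J.T) / np.sqrt(N)        # symmetric couplings, zero diagonal

def energy(sigma):
    # H(sigma) = (1/2) sigma^T J sigma equals the sum over i<j of J_ij sigma_i sigma_j.
    return 0.5 * sigma @ J @ sigma

# Naive baseline (NOT the talk's algorithm): repeatedly flip the single spin
# whose flip lowers the energy the most, until no improving flip remains.
sigma = rng.choice([-1.0, 1.0], size=N)
while True:
    h = J @ sigma                 # local fields h_i = sum_j J_ij sigma_j
    gains = sigma * h             # flipping spin i changes the energy by -2 * sigma_i * h_i
    i = int(np.argmax(gains))
    if gains[i] <= 0:             # local minimum: no single flip helps
        break
    sigma[i] = -sigma[i]

print("energy per spin at the local minimum:", energy(sigma) / N)
```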


Speaker: Ankur Moitra, MIT
Title: Learning with Massart Noise
Abstract: In supervised learning, there is often a tension between being robust against increasingly more powerful adversaries and computational complexity. We will be interested in the Massart noise model, where one interpretation is that an adversary can arbitrarily control the labels on a random subset of the data. This seems to be a comfortable middle ground where algorithms cannot overtune to the noise distribution, but it is nevertheless possible to give algorithms with strong provable guarantees.
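As a concrete illustration of the model, here is a short sketch (Python with numpy) that generates halfspace data under Massart noise. The choice of concentrating the flip probability near the decision boundary is one hypothetical adversary for illustration, not anything prescribed by the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Massart (bounded) noise for halfspaces: the clean label is sign(<w*, x>), and
# each label is flipped independently with a point-dependent probability
# eta(x) <= eta_max < 1/2 that the adversary may choose however it likes.
d, n, eta_max = 10, 1000, 0.3

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)              # unit-norm target halfspace

X = rng.standard_normal((n, d))
clean = np.sign(X @ w_star)

# One hypothetical adversary: spend the entire noise budget near the decision
# boundary, where flipped labels are hardest to distinguish from clean ones.
margin = np.abs(X @ w_star)
eta_x = np.where(margin < 0.5, eta_max, 0.0)  # satisfies eta(x) <= eta_max

flips = rng.random(n) < eta_x
y = np.where(flips, -clean, clean)
print(f"fraction of labels flipped: {flips.mean():.3f}")
```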

We give a new game-theoretic framework for learning under Massart noise, and our algorithms are derived from low-regret online algorithms for playing relaxations of this game. Along the way, we give an algorithm for properly learning halfspaces under Massart noise, improving upon the recent work of Diakonikolas, Gouleakis, and Tzamos, which gives an improper learning algorithm, as well as extensions to generalized linear models. Finally, we evaluate our algorithms empirically and find that they exhibit some appealing fairness properties, perhaps as a byproduct of their robustness guarantees.

This is based on joint work with Sitan Chen, Frederic Koehler and Morris Yau.

Speaker: Liza Levina, Michigan
Title: Hierarchical community detection by recursive partitioning
Abstract: Community detection in networks has been extensively studied in the form of finding a single partition into a “correct” number of communities. In large networks, however, a multi-scale hierarchy of communities is much more realistic. We show that a hierarchical tree of communities, obviously more interpretable, is also potentially more accurate and more computationally efficient. We construct this tree with a simple top-down recursive algorithm: at each step, the nodes are split into two communities with a non-iterative spectral algorithm, until a stopping rule suggests there are no more communities. The algorithm is model-free, extremely fast, and requires no tuning other than selecting a stopping rule. We propose a natural model for this setting, a binary tree stochastic block model, and prove that the algorithm correctly recovers the entire community tree under relatively mild assumptions. As a by-product, we obtain explicit and intuitive results for fitting the stochastic block model under model misspecification. We illustrate the algorithm on a dataset of statistics papers, constructing a highly interpretable tree of statistics research communities, and on a network based on gene co-occurrence in research papers on anemia.
This is joint work with Tianxi Li, Lihua Lei, Sharmodeep Bhattacharyya, Koen Van den Berge, Purnamrita Sarkar, and Peter Bickel.
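For intuition, the sketch below (Python with numpy, not the authors' code) mimics the top-down recipe: recursively bisect by the sign pattern of a spectral vector until a stopping rule fires. The Fiedler-vector split and the size-floor stopping rule are generic placeholders standing in for the paper's non-iterative spectral step and its data-driven stopping rule.

```python
import numpy as np

def spectral_bisect(A_sub):
    """Sign split on the Fiedler vector (second-smallest eigenvector of the
    unnormalized Laplacian): a simple, non-iterative spectral bipartition."""
    deg = A_sub.sum(axis=1)
    L = np.diag(deg) - A_sub
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1] >= 0

def recursive_partition(A, nodes, split_size=80):
    """Top-down recursion: bisect, then recurse on each side. The size floor is
    a hypothetical placeholder for the paper's stopping rule."""
    if len(nodes) < split_size:
        return [nodes]
    side = spectral_bisect(A[np.ix_(nodes, nodes)])
    left, right = nodes[side], nodes[~side]
    if len(left) == 0 or len(right) == 0:   # degenerate split: stop here
        return [nodes]
    return recursive_partition(A, left, split_size) + \
           recursive_partition(A, right, split_size)

# Toy usage: a planted two-block network; the recursion should stop after one split.
rng = np.random.default_rng(0)
n = 120
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.20, 0.05)
A = np.triu(rng.random((n, n)) < P, 1).astype(float)
A = A + A.T                                 # symmetric adjacency, no self-loops
parts = recursive_partition(A, np.arange(n))
print([np.bincount(labels[p], minlength=2) for p in parts])
```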