Workshop: Estimation of Network Processes and Information Diffusion

About the Series

The IDEAL workshop series brings in four experts on topics related to the foundations of data science to present their perspectives and research on a common theme. It is aimed at Chicago-area researchers with an interest in the foundations of data science. The series will be remote while our universities and local government advise avoiding non-essential meetings. The virtual format will have two talks before lunch, two talks after lunch, and an early evening panel discussion (where appropriate).

Part of the Special Quarter on Inference and Data Science on Networks.

Synopsis

Many important dynamic processes are determined by an underlying network structure. Examples include the spread of epidemics, the dynamics of public opinion, the diffusion of information about social programs, and biological processes such as neural spike trains. Data about these processes is becoming increasingly available, which has led a number of different research communities to tackle related questions independently. What is the source of a rumor? Will a given disease spread widely? Who is the key player? This workshop will cover some new tools being developed to address such questions. The workshop speakers are Devavrat Shah, Lori Beaman, Rebecca Willett, and Arun Chandrasekhar.

Logistics

Schedule

  • 10:55-11:00: Opening Remarks
  • 11:00-11:40: Devavrat Shah (MIT, EECS)
    Synthetic Interventions and COVID-19
  • 11:45-12:25: Lori Beaman (Northwestern University, Economics)
    Can Network Theory-based Targeting Increase Technology Adoption? 
  • 12:30-1:30: Lunch Break
  • 1:30-2:10: Rebecca Willett (University of Chicago, Statistics & CS)
    Context-dependent self-exciting point processes: models, methods, and risk bounds in high dimensions
  • 2:15-2:55: Arun Chandrasekhar (Stanford, Economics)
    Identifying Latent Space Geometry in Network Models Using Analysis of Curvature
  • 3:00-4:00: Afternoon Break
  • 4:00-5:00: Panel Discussion with the Speakers

Please use this Google form to pose questions for the Panel and the speakers. 

Titles and Abstracts

 
Speaker: Devavrat Shah, MIT
Title: Synthetic Interventions and COVID-19

Abstract: As we reach the apex of the COVID-19 pandemic across the globe, a pressing question faces us all: can we, even partially, reopen the economy without risking a second wave? To answer that, we first need to understand whether shutting down the economy helped. If it did, is it possible to achieve similar gains in the war against the pandemic while partially opening up the economy? And if so, how does that translate into policy?

 
To address such 'what-if' scenario analysis questions, we propose the method of Synthetic Interventions (SI). It enables counterfactual estimates for all interventions of interest using observed data only. SI generalizes the classical Synthetic Controls (SC) method for causal inference using observational studies: it has similar data requirements to SC, but enables counterfactual estimates under multiple interventions, rather than a single intervention as in SC. The SI method comes with a data-driven test to evaluate its applicability. In addition to explaining its utility in answering the above-mentioned questions, time permitting, we shall discuss applications in the context of policy design in developing countries, online A/B testing, and drug discovery.
 
Based on joint work with Anish Agarwal, Abdullah Aalomar, Romain Cosson, Arnab Sarkar and Dennis Shen (all at MIT). 
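
For readers unfamiliar with the classical Synthetic Controls idea that SI generalizes, the following is a minimal sketch, not the SI method presented in the talk. It assumes a simplified variant in which donor weights are fit by unconstrained least squares on pre-intervention outcomes (the classical method additionally constrains the weights to be non-negative and sum to one). All function names and the toy numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_donor_weights(donors_pre, target_pre):
    """Least-squares weights so that a weighted sum of donor series matches the target
    series before the intervention (simplifying assumption: no constraints on weights)."""
    k = len(donors_pre)
    gram = [[sum(x * y for x, y in zip(donors_pre[i], donors_pre[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(x * y for x, y in zip(donors_pre[i], target_pre)) for i in range(k)]
    return solve_linear(gram, rhs)


def synthetic_outcome(donors_post, weights):
    """Counterfactual target series: the same weights applied to post-intervention donors."""
    horizon = len(donors_post[0])
    return [sum(w * d[t] for w, d in zip(weights, donors_post)) for t in range(horizon)]


# Toy data (hypothetical): two donor units observed for four pre-intervention periods.
donors_pre = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
target_pre = [0.5 * a + 0.5 * b for a, b in zip(*donors_pre)]
donors_post = [[5.0, 6.0], [4.0, 7.0]]

weights = fit_donor_weights(donors_pre, target_pre)
counterfactual = synthetic_outcome(donors_post, weights)
```

In this toy setup the target is constructed as an equal mix of the two donors, so the fitted weights recover that mix and the counterfactual is the same mix of the donors' later outcomes.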

Speaker: Lori Beaman, Northwestern University
Title: Can Network Theory-based Targeting Increase Technology Adoption? 

Abstract: In order to induce farmers to adopt a new agricultural technology, we use predictions from the threshold model of diffusion to target information to key individuals within villages in Malawi. We combine social network data and model simulations to ex ante determine who is treated in our field experiment. We observe adoption decisions in 200 villages over 3 years. Our results are consistent with a model in which many farmers need to learn from multiple people before they adopt themselves. This means that without proper targeting of information, the diffusion process can stall and technology adoption remains perpetually low.

 
(Joint work with Ariel BenYishay, Jeremy Magruder and Mushfiq Mobarak)
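
The stalling behavior described in the abstract can be illustrated with a minimal sketch of a threshold model of diffusion. This is not the authors' simulation code: the six-node network, the threshold of two, and the function name are all hypothetical choices made for illustration. A node adopts once at least `threshold` of its neighbors have adopted.

```python
def simulate_threshold_diffusion(neighbors, seeds, threshold, max_rounds=100):
    """Return the set of adopters after running the threshold model to a fixed point.

    neighbors: dict mapping each node to a list of its neighbors (undirected network).
    seeds: nodes that are informed at the start.
    threshold: number of adopting neighbors a node needs before it adopts.
    """
    adopted = set(seeds)
    for _ in range(max_rounds):
        newly_adopted = {
            node
            for node, nbrs in neighbors.items()
            if node not in adopted and sum(n in adopted for n in nbrs) >= threshold
        }
        if not newly_adopted:
            break  # no further adoption is possible
        adopted |= newly_adopted
    return adopted


# Hypothetical six-node village network.
village = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 3, 4],
    3: [1, 2, 4, 5],
    4: [2, 3, 5],
    5: [3, 4],
}

# Two seeds who share neighbors versus two seeds at opposite ends of the network.
clustered = simulate_threshold_diffusion(village, seeds=[0, 1], threshold=2)
scattered = simulate_threshold_diffusion(village, seeds=[0, 5], threshold=2)
```

With the same number of seeds, the clustered pair reaches every node while the scattered pair never spreads beyond the seeds, because no other node ever has two adopting neighbors.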

Speaker: Rebecca Willett, University of Chicago
Title: Context-dependent self-exciting point processes: models, methods, and risk bounds in high dimensions


Abstract: High-dimensional autoregressive point processes model how current events trigger or inhibit future events, such as how activity by one member of a social network can affect the future activity of their neighbors. While past work has focused on estimating the underlying network structure based solely on the times at which events occur on each node of the network, this work examines the more nuanced problem of estimating context-dependent networks that reflect how features associated with an event (such as the content of a social media post) modulate the strength of influences among nodes. Specifically, we leverage ideas from compositional time series and regularization methods in machine learning to conduct network estimation for high-dimensional marked point processes using autoregressive multinomial and logistic-normal models.
 
This is joint work with Lili Zheng, Garvesh Raskutti, and Benjamin Mark.
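
To make "trigger or inhibit" concrete, here is a minimal sketch of a discrete-time autoregressive Bernoulli process on a two-node network. It is a deliberately simplified stand-in, not the context-dependent multinomial or logistic-normal models from the talk: it ignores event marks entirely, and the influence matrix, biases, and function names are hypothetical.

```python
import math
import random


def event_probabilities(influence, bias, previous):
    """Probability that each node has an event now, given the previous time step.

    influence[i][j] > 0 means an event at node j excites node i;
    influence[i][j] < 0 means it inhibits node i.
    """
    probs = []
    for i, row in enumerate(influence):
        drive = bias[i] + sum(a * x for a, x in zip(row, previous))
        probs.append(1.0 / (1.0 + math.exp(-drive)))  # logistic link
    return probs


def simulate(influence, bias, steps, seed=0):
    """Simulate binary event indicators for every node over a number of time steps."""
    rng = random.Random(seed)
    state = [0] * len(bias)
    history = []
    for _ in range(steps):
        probs = event_probabilities(influence, bias, state)
        state = [1 if rng.random() < p else 0 for p in probs]
        history.append(state)
    return history


# Hypothetical network: node 1 excites node 0, and node 0 inhibits node 1.
influence = [[0.0, 2.0], [-2.0, 0.0]]
bias = [-1.0, -1.0]

baseline = event_probabilities(influence, bias, previous=[0, 0])
after_events = event_probabilities(influence, bias, previous=[1, 1])
history = simulate(influence, bias, steps=50)
```

Network estimation in this simplified setting would amount to recovering `influence` from `history`, for example by regularized logistic regression of each node's events on the previous state of all nodes.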

Speaker: Arun Chandrasekhar, Stanford
Title: Identifying Latent Space Geometry in Network Models Using Analysis of Curvature

Abstract: Networks are frequently modeled using latent space (LS) models. Nodes are points on a manifold, and the probability of a link forming between two points is conditionally independent of anything else and declines with distance on the manifold. Typically, researchers select an LS geometry (the manifold class, dimension, and curvature) by assumption and not in a data-driven way. In this work, we present a method to consistently estimate the manifold type, dimension, and curvature out of an empirically relevant class of latent spaces (simply connected, complete Riemannian manifolds) given network data. Our argument may be of more general interest in statistical geometry: we can estimate the underlying geometry in such a context when a researcher observes a noisy estimate of a distance matrix generated by a collection of points on some manifold. We explore the accuracy of our approach with a battery of empirically relevant simulations and then apply it to datasets from economics and sociology as well as neuroscience. These applications document that different geometries are needed to model networks in different contexts.

 
Joint work with Tyler McCormick (Dept. of Statistics & Dept. of Sociology, University of Washington) and Shane Lubold (Dept. of Statistics, University of Washington).
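
As a concrete picture of the latent space models described in the abstract, the following is a minimal sketch of the generative side only, under the assumption of a flat (Euclidean) plane and a link probability of exp(-distance / scale). It does not implement the curvature-based estimator from the talk, and the functional form, positions, and function names are hypothetical.

```python
import math
import random


def link_probability(p, q, scale=1.0):
    """Probability of a link between latent positions p and q.

    Assumed form: exp(-distance / scale), which equals 1 at distance 0
    and declines as the points move apart.
    """
    return math.exp(-math.dist(p, q) / scale)


def sample_network(positions, scale=1.0, seed=0):
    """Draw each possible link independently given the latent positions.

    Returns a set of undirected edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    edges = set()
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if rng.random() < link_probability(positions[i], positions[j], scale):
                edges.add((i, j))
    return edges


# Hypothetical latent positions: two tight clusters that are far apart.
positions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

near = link_probability(positions[0], positions[1])
far = link_probability(positions[0], positions[3])
edges = sample_network(positions)
```

Choosing a different geometry would mean replacing `math.dist` with the distance function of the chosen manifold, such as geodesic distance on a sphere or in hyperbolic space.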