CS + Law Workshop: Definitions for Algorithms

About the Series

The IDEAL workshop series brings in experts on topics related to the foundations of data science to present their perspective and research on a common theme. This virtual workshop will feature four talks and discussion sessions.

Synopsis

The use of algorithms in legal contexts presents both great potential and significant challenges. Computer scientists and legal scholars need to collaborate to determine appropriate definitions for algorithms that capture notions such as bias, fairness, justice, accountability, explainability, verification, certification, auditability, and contestability.  These definitions then need to be integrated into algorithm design and analysis frameworks.  This workshop focuses on developing technical definitions for such concepts and on the technical challenges for algorithm design that they introduce.
The workshop is part of the Spring 2021 Special Quarter on Data Science and Law, co-organized by Northwestern Professors Jason D. Hartline and Daniel W. Linna Jr.
 

Logistics

  • Date: Friday, May 28, 2021 11:00am-3:00pm CDT (Chicago Time)
  • Location: Virtual IDEAL (on gather.town)
  • Free Registration: Attendees must register to receive information for joining. Login information for Gather.Town will be provided via email. 
  • WATCH THE FULL EVENT HERE

Schedule

Abstracts

Speaker: Ran Canetti
Title: Verification Dilemmas, Law, and the Promise of Zero-Knowledge Proofs
Abstract: Individuals expose personally identifying information to access a website or qualify for a loan, undermining privacy and security. Firms share proprietary information in dealmaking negotiations; if the deal fails, the negotiating partner may use that information to compete. Regulators that comply with public transparency and oversight requirements can risk subjecting algorithmic governance tools to gaming that destroys their efficacy. Litigants might have to reveal trade secrets in court proceedings to prove a claim or defense. Such “verification dilemmas,” costly choices between opportunities that require verifying some fact and the risk of exposing sensitive information in order to perform that verification, appear across the legal landscape. Yet existing legal responses to them are imperfect: they often depend on ex post litigation remedies that are prohibitively expensive for those most in need, or that fail to address abuses of information entirely.

Zero-knowledge proofs (ZKPs)—a class of cryptographic protocols that allow one party to verify a fact or characteristic of secret information without revealing the actual secret—can help solve these verification dilemmas. ZKPs have recently demonstrated their mettle, for example, by providing the privacy backbone for the blockchain. Yet they have received scant notice in the legal literature. This Article fills that gap by providing the first deep dive into ZKPs’ broad relevance for law. It explains ZKPs’ conceptual power and technical operation to a legal audience. It then demonstrates how, and that, ZKPs can be applied as a governance tool to transform verification dilemmas in multiple legal contexts. Finally, the Article surfaces, and provides a framework to address, the policy issues implicated by the potential substitution of ZKP governance tools in place of existing law and practice.
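The core idea can be illustrated with a classic toy example: the Schnorr identification protocol, an interactive proof in which a prover demonstrates knowledge of a secret discrete logarithm without revealing it. This sketch uses deliberately tiny, insecure parameters chosen for readability; it conveys the mechanism only, not a deployable construction or anything specific to the Article.

```python
import random

# Toy Schnorr-style zero-knowledge proof of knowledge of a discrete log.
# Parameters are deliberately tiny and insecure -- a sketch of the idea only.
p = 2039          # safe prime: p = 2q + 1
q = 1019          # prime order of the subgroup
g = 4             # generator of the order-q subgroup mod p

secret_x = random.randrange(1, q)     # prover's secret
public_y = pow(g, secret_x, p)        # public value y = g^x mod p

def prove_once():
    """One round: the prover convinces the verifier it knows x with
    y = g^x, without revealing x."""
    r = random.randrange(1, q)        # prover's fresh randomness
    commitment = pow(g, r, p)         # prover sends t = g^r
    challenge = random.randrange(q)   # verifier sends random challenge c
    response = (r + challenge * secret_x) % q   # prover sends s = r + c*x
    # Verifier checks g^s == t * y^c (mod p).  The transcript (t, c, s)
    # can be simulated without knowing x, which is why it leaks nothing.
    return pow(g, response, p) == (commitment * pow(public_y, challenge, p)) % p

assert all(prove_once() for _ in range(100))
```

The verification equation holds because g^s = g^(r + c·x) = g^r · (g^x)^c = t · y^c (mod p); soundness comes from the verifier's random choice of c.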

Speaker: Jonathan Shafer
Title: Interactive Proofs for Verifying Machine Learning
Abstract: We consider the following question: using a source of labeled data and interaction with an untrusted prover, what is the complexity of verifying that a given hypothesis is “approximately correct”? We study interactive proof systems for PAC verification, where a verifier that interacts with a prover is required to accept good hypotheses, and reject bad hypotheses. Both the verifier and the prover are efficient and have access to labeled data samples from an unknown distribution. We are interested in cases where the verifier can use significantly less data than is required for (agnostic) PAC learning, or use a substantially cheaper data source (e.g., using only random samples for verification, even though learning requires membership queries). We believe that today, when data and data-driven algorithms are quickly gaining prominence, the question of verifying purported outcomes of data analyses is very well-motivated.

We show three main results. First, we prove that for a specific hypothesis class, verification is significantly cheaper than learning in terms of sample complexity, even if the verifier engages with the prover only in a single-round (NP-like) protocol. Moreover, for this class we prove that single-round verification is also significantly cheaper than testing closeness to the class. Second, for the broad class of Fourier-sparse Boolean functions, we show a multi-round (IP-like) verification protocol, where the prover uses membership queries, and the verifier is able to assess the result while only using random samples. Third, we show that verification is not always more efficient. Namely, we show a class of functions where verification requires as many samples as learning does, up to a logarithmic factor.
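A toy sketch of the headline intuition, that checking a purported hypothesis can be far cheaper than finding one: a verifier can estimate a hypothesis's error with O(log(1/δ)/ε²) labeled samples via a Hoeffding-style bound, independent of the class the prover searched over. This shows only the simple sample-based check, not the interactive protocols from the talk; the target function, thresholds, and names below are illustrative assumptions.

```python
import math
import random

random.seed(0)

def true_label(x):
    # Unknown target the verifier can only sample: a threshold function.
    return 1 if x >= 0.37 else 0

def verify(hypothesis, eps=0.1, delta=0.05):
    """Accept iff the empirical error on m i.i.d. samples is below eps/2.
    A Hoeffding-style bound says m = O(log(1/delta)/eps^2) samples
    estimate the true error well, independent of any hypothesis class."""
    m = math.ceil(2 * math.log(2 / delta) / eps ** 2)
    mistakes = sum(hypothesis(x) != true_label(x)
                   for x in (random.random() for _ in range(m)))
    return mistakes / m < eps / 2

good = lambda x: 1 if x >= 0.38 else 0   # true error ~ 0.01
bad = lambda x: 1 if x >= 0.80 else 0    # true error ~ 0.43
assert verify(good) and not verify(bad)
```

The point of the contrast: finding a good threshold requires searching the class, while checking one requires only enough samples to estimate a single number.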
 

Speaker: Aloni Cohen
Title: Legal Theorems for Data Privacy
Abstract: There is a significant conceptual gap between legal and mathematical thinking around data privacy. The effect is uncertainty as to which technical offerings meet legal standards. This uncertainty is exacerbated by a litany of successful privacy attacks demonstrating that traditional statistical disclosure limitation techniques often fall short of the privacy envisioned by regulators. In this talk, I will define predicate singling-out attacks and show how they can be used to prove “legal theorems” that directly challenge prevailing legal guidance on data anonymization. 
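The flavor of singling out can be glimpsed in a toy example: with enough binary attributes, a random dataset almost surely contains records matched by a predicate that matches no one else, so releasing “de-identified” rows still lets an adversary isolate individuals. This is only the high-dimensional-uniqueness intuition; the predicate singling-out attacks in the talk are defined formally against the output of an aggregation mechanism, relative to a rigorous baseline. All names and parameters below are illustrative.

```python
import random

# Toy illustration of the singling-out intuition: most records in a
# high-dimensional dataset are uniquely identified by their attributes.
random.seed(0)
n, d = 100, 20                      # 100 people, 20 binary attributes
rows = [tuple(random.randrange(2) for _ in range(d)) for _ in range(n)]

def singles_out(attribute_combo):
    """True if exactly one record matches this attribute combination."""
    return rows.count(attribute_combo) == 1

isolated = sum(singles_out(r) for r in set(rows))
print(f"{isolated} of {n} records are uniquely identified")
```

With 2^20 possible attribute combinations and only 100 records, collisions are rare, so nearly every record is isolated by the predicate “matches exactly this attribute combination.”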

Speaker: Omer Reingold
Title: Outcome Indistinguishability
Abstract: Prediction algorithms assign numbers to individuals that are popularly understood as individual “probabilities”—what is the probability of 5-year survival after cancer diagnosis?—and which increasingly form the basis for life-altering decisions. Drawing on an understanding of computational indistinguishability developed in complexity theory and cryptography, we introduce Outcome Indistinguishability. Predictors that are Outcome Indistinguishable yield a generative model for outcomes that cannot be efficiently refuted on the basis of the real-life observations produced by Nature.

We investigate a hierarchy of Outcome Indistinguishability definitions, whose stringency increases with the degree to which distinguishers may access the predictor in question. Our findings reveal that Outcome Indistinguishability behaves qualitatively differently than previously studied notions of indistinguishability. First, we provide constructions at all levels of the hierarchy. Then, leveraging recently developed machinery for proving average-case fine-grained hardness, we obtain lower bounds on the complexity of the more stringent forms of Outcome Indistinguishability. This hardness result provides the first scientific grounds for the political argument that, when inspecting algorithmic risk prediction instruments, auditors should be granted oracle access to the algorithm, not simply historical predictions.
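A toy sketch of the core viewpoint: treat the predictor as a generative model (sample an outcome from Bernoulli(p(x))) and ask whether a distinguisher can tell the model's outcomes from Nature's. Below, the distinguisher sees the features and compares average real versus modeled outcomes within bins of x. Everything here (Nature's distribution, the two predictors, the binned test) is an illustrative assumption, not a construction from the paper.

```python
import random

# Toy Outcome Indistinguishability check: a distinguisher that sees
# (x, y) pairs compares average real outcomes against outcomes sampled
# from the predictor's own probabilities, within bins of the feature x.
random.seed(1)
nature_p = lambda x: x              # Nature's true outcome probability
good_pred = lambda x: x             # predictor matching Nature
bad_pred = lambda x: 0.5            # constant predictor

def distinguisher_gap(pred, n=50_000):
    """Max over feature bins of |avg real outcome - avg modeled outcome|.
    A small gap means this distinguisher cannot refute the predictor."""
    real = [[] for _ in range(10)]
    model = [[] for _ in range(10)]
    for _ in range(n):
        x = random.random()
        b = min(int(x * 10), 9)                              # bin by x
        real[b].append(1 if random.random() < nature_p(x) else 0)
        model[b].append(1 if random.random() < pred(x) else 0)
    return max(abs(sum(r) / len(r) - sum(m) / len(m))
               for r, m in zip(real, model))

assert distinguisher_gap(good_pred) < 0.05   # survives this distinguisher
assert distinguisher_gap(bad_pred) > 0.1     # refuted by this distinguisher
```

Stronger levels of the hierarchy correspond to distinguishers with greater access to the predictor itself, which is the formal content behind the auditing argument in the abstract.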