I am a machine learning researcher at Apple where I work on the foundations of machine learning. I aspire to do fundamental theoretical research that impacts practical problems. My current focus is on the interplay between multigroup fairness, loss minimization and indistinguishability. I am also interested in questions related to calibration, distribution drift and anomaly detection.
I have broad interests in theoretical computer science, having worked on coding and information theory, pseudorandomness and computational complexity. On the applied side, I have worked on systems for storing and interactively visualizing large datasets.
In the past, I have been a researcher at VMware Research, Microsoft Research in Redmond and Silicon Valley, a graduate student at Georgia Tech and an undergraduate at IIT Bombay. See my C.V. for more details.
Contact: parikg at apple or gmail
For a full list of publications, see my C.V., dblp page or google scholar page.
Loss minimization, fairness and indistinguishability:
This project aims to reconcile three different perspectives on the goal of learning: loss minimization, multigroup fairness, and indistinguishability.
Some papers on this topic which explore connections and tradeoffs between these notions:
This line of work is summarized in a talk on Multigroup fairness and loss minimization given at the Simons workshop on Multigroup fairness in April’23. Here is a longer talk from the IAS TCSDM seminar and a shorter version from the TOC4fairness seminar.
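As a rough illustration of the multigroup-fairness perspective, here is a minimal Python sketch (my own toy example, not code from the papers above; the data, group definitions and function names are hypothetical) that audits a predictor by checking calibration separately on each subgroup in a collection, rather than only on average:

```python
import numpy as np

def group_calibration_errors(p, y, groups, n_bins=10):
    """For each subgroup, measure the average gap between predicted and
    empirical outcome frequencies across prediction bins."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    errors = {}
    for name, mask in groups.items():
        total = mask.sum()
        gap = 0.0
        for b in range(n_bins):
            idx = mask & (bins == b)
            if idx.sum() == 0:
                continue
            # weight of this bin within the group times |E[y] - E[p]| on it
            gap += (idx.sum() / total) * abs(y[idx].mean() - p[idx].mean())
        errors[name] = gap
    return errors

# Hypothetical usage: predictions p in [0, 1], binary outcomes y drawn so the
# predictor is calibrated by construction, and two overlapping subgroups.
rng = np.random.default_rng(0)
p = rng.uniform(size=5000)
y = (rng.uniform(size=5000) < p).astype(float)
groups = {"everyone": np.ones(5000, dtype=bool), "first_half": np.arange(5000) < 2500}
print(group_calibration_errors(p, y, groups))
```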
A paper exploring how these notions relate for Deep Neural Nets, and a talk by Adam Kalai:
I am also interested in efficient and robust notions of calibration, motivated by the fact that most commonly used calibration measures lack at least one of these two properties. Some work proposing better measures of calibration:
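To make the fragility concrete, the toy experiment below (my own illustration, not taken from these papers) estimates the standard binned expected calibration error of a perfectly calibrated predictor and shows that the reported number depends heavily on the arbitrary choice of bin count:

```python
import numpy as np

def binned_ece(p, y, n_bins):
    """Standard binned expected calibration error (ECE)."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        idx = bins == b
        if idx.any():
            ece += idx.mean() * abs(y[idx].mean() - p[idx].mean())
    return ece

# The same perfectly calibrated predictor gets very different scores
# depending only on how finely its predictions are binned.
rng = np.random.default_rng(1)
p = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < p).astype(float)
for n_bins in (10, 100, 1000):
    print(f"{n_bins:4d} bins -> ECE = {binned_ece(p, y, n_bins):.3f}")
```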
Erasure Coding for distributed storage:
Motivated by applications in data storage, we introduce the notion of local recovery for an error-correcting code, which allows quick reconstruction of any single data symbol when one or a few symbols are lost. We show fundamental tradeoffs between the locality, distance and rate of a code, and construct optimal codes (LRCs) that achieve this tradeoff. These codes were implemented in Microsoft Azure storage.
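A minimal sketch of the locality idea (a toy XOR construction for illustration only, not the code actually deployed in Azure): each small group of data symbols gets one local parity, so a single lost symbol is rebuilt by reading only the rest of its group rather than the whole codeword.

```python
import numpy as np

def encode_with_local_parity(data, group_size):
    """Toy locally recoverable encoding: split the data symbols into groups
    and append one XOR parity per group (locality = group_size)."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    parities = [int(np.bitwise_xor.reduce(g)) for g in groups]
    return groups, parities

def recover_symbol(group, parity, lost_index):
    """Rebuild the symbol at lost_index from the surviving symbols of its
    group and the group's local parity."""
    survivors = [s for j, s in enumerate(group) if j != lost_index]
    return int(np.bitwise_xor.reduce(np.array(survivors + [parity])))

# Hypothetical usage with byte-valued symbols.
data = np.array([17, 42, 7, 99, 3, 250, 8, 61], dtype=np.uint8)
groups, parities = encode_with_local_parity(data, group_size=4)
print(recover_symbol(groups[0], parities[0], lost_index=2))  # recovers 7
```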
A two-part tutorial I gave on this topic at a Simons Institute bootcamp: part 1 and part 2.
Interactive data visualization:
Hillview is an open-source tool for fast interactive visualization and exploration of massive data sets using just the click of a mouse. Hillview combines a distributed, parallel computation platform with highly optimized sketching and sampling algorithms for fast renderings. [Overlook](https://research.vmware.com/publications/overlook-differentially-private-exploratory-visualization-for-big-data) adds a differential privacy layer to Hillview.
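The core rendering trick can be sketched in a few lines (a simplified Python illustration of the general mergeable-sketch idea, not Hillview's actual implementation): each worker reduces its partition of a column to a small fixed-bucket summary, and summaries combine by addition, so an interactive chart never needs to ship or rescan the raw rows.

```python
import numpy as np

def partition_histogram(values, edges):
    """Each worker summarizes its partition as fixed-bucket counts (a mergeable sketch)."""
    counts, _ = np.histogram(values, bins=edges)
    return counts

def merge(sketches):
    """Histograms over the same buckets merge by simple addition."""
    return np.sum(sketches, axis=0)

# Hypothetical usage: three partitions of a large column, one shared bucket layout.
rng = np.random.default_rng(2)
partitions = [rng.normal(size=100_000) for _ in range(3)]
edges = np.linspace(-4, 4, 41)          # 40 buckets for the on-screen chart
sketches = [partition_histogram(p, edges) for p in partitions]
print(merge(sketches)[:5])
```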
I have been fortunate to work with several fabulous interns over the years: