Parikshit Gopalan’s homepage

I am a machine learning researcher at Apple where I work on the foundations of machine learning. I aspire to do fundamental theoretical research that impacts practical problems. My current focus is on the interplay between multigroup fairness, loss minimization and indistinguishability. I am also interested in questions related to calibration, distribution drift and anomaly detection.

I have broad interests in theoretical computer science, having worked on coding and information theory, pseudorandomness and computational complexity. On the applied side, I have worked on systems for storing and interactively visualizing large datasets.

In the past, I have been a researcher at VMware Research, Microsoft Research in Redmond and Silicon Valley, a graduate student at Georgia Tech and an undergraduate at IIT Bombay. See my C.V. for more details.

Contact: parikg at apple or gmail

For a full list of publications, see my C.V., dblp page or Google Scholar page.

Selected Research Projects

Loss minimization, fairness and indistinguishability:

This project aims to reconcile three different perspectives on the goal of learning:

  1. Loss minimization where the goal is to find a model that minimizes a certain loss.
  2. Fairness, where the goal is to guarantee statistical validity of the predictions for various subpopulations.
  3. Indistinguishability, where the goal is to produce predictions that are indistinguishable from the ground truth.

Some papers on this topic which explore connections and tradeoffs between these notions:

This line of work is summarized in a talk on Multigroup fairness and loss minimization given at the Simons workshop on Multigroup fairness in April’23. Here is a longer talk from the IAS TCSDM seminar and a shorter version from the TOC4fairness seminar.

A paper exploring how these notions relate for Deep Neural Nets, and a talk by Adam Kalai:

I am also interested in efficient and robust notions of calibration, motivated by the fact that most commonly used calibration measures fail to guarantee at least one of these. Some work proposing better measures of calibration:

Erasure Coding for distributed storage:

Motivated by applications in data storage, we introduce the notion of local recovery for an error correcting code, which allows the quick reconstruction of any single data symbol in the event of a single or a few symbols being lost. We show fundamental tradeoffs between locality, distance and rate in a codeword, and construct optimal codes that achieve this tradeoff (LRCs). These codes were implemented in Microsoft Azure storage.
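
To make the idea of local recovery concrete, here is a minimal sketch in Python. It is a toy illustration, not the construction from the papers or the Azure implementation: it assumes equal-length data blocks, splits them into small local groups, and adds one XOR parity per group, so a single lost block is rebuilt by reading only its own group. The function names (`xor_blocks`, `encode_local_groups`, `recover_group`) are hypothetical. LRC constructions typically combine such local parities with global parities to obtain good distance; those are omitted here.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_groups(data: List[bytes], group_size: int) -> List[List[bytes]]:
    """Split data blocks into local groups and append one XOR parity to each group."""
    groups = []
    for start in range(0, len(data), group_size):
        group = data[start:start + group_size]
        groups.append(group + [xor_blocks(group)])
    return groups


def recover_group(group: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing block (marked None) from the rest of its group."""
    missing = [i for i, block in enumerate(group) if block is None]
    if len(missing) != 1:
        raise ValueError("local recovery in this sketch handles exactly one lost block")
    survivors = [block for block in group if block is not None]
    repaired = list(group)
    # The XOR of all survivors (data and parity) equals the lost block.
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


# Four data blocks, local groups of size two: repairing one block reads
# only the two surviving blocks of its group, not the whole stripe.
data = [b"ab", b"cd", b"ef", b"gh"]
groups = encode_local_groups(data, group_size=2)
damaged = [None, groups[0][1], groups[0][2]]
restored = recover_group(damaged)
```

In this sketch the locality is the group size: a repair touches that many blocks regardless of how many data blocks the stripe contains, at the cost of one extra parity per group.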

  1. The paper introducing the notion of Locality of a codeword symbol with Cheng Huang, Huseyin Simitci and Sergey Yekhanin. This paper won the 2014 Information Theory/Communication Society joint paper prize.
  2. The paper describing the use of LRCs in Azure Storage. This paper won the Best Paper prize at USENIX ATC 2012.
  3. A press article from Microsoft on the use of LRCs in Azure. Another article from Microsoft.
  4. Maximal recoverability is a beyond worst-case notion of reliability tailored towards data storage. A paper constructing maximally recoverable LRCs with Cheng Huang, Bob Jenkins and Sergey Yekhanin. Another paper exploring this notion for grid-like topologies with Guangda Hu, Swastik Kopparty, Shubhangi Saraf, Carol Wang and Sergey Yekhanin.

A two-part tutorial I gave on this topic at a bootcamp at the Simons Institute: part 1 and part 2.

Interactive data visualization

Hillview is an open-source tool for fast interactive visualization and exploration of massive data sets using just the click of a mouse. Hillview combines a distributed, parallel computation platform with highly optimized sketching and sampling algorithms for fast renderings. Overlook adds a differential privacy layer to Hillview.

Other Recent ML Publications

Other selected publications

Interns mentored:

I have been fortunate to work with several fabulous interns over the years: