I am a machine learning researcher at Apple where I work on the foundations of machine learning. I aspire to do fundamental theoretical research that impacts practical problems. My current focus is on the interplay between multigroup fairness, loss minimization and indistinguishability. I am also interested in questions related to calibration, distribution drift and anomaly detection.
I have broad interests in theoretical computer science, having worked on coding and information theory, pseudorandomness and computational complexity. On the applied side, I have worked on systems for storing and interactively visualizing large datasets.
In the past, I have been a researcher at VMware Research, Microsoft Research in Redmond and Silicon Valley, a graduate student at Georgia Tech and an undergraduate at IIT Bombay. See my C.V. for more details.
Contact: parikg at apple or gmail
For a full list of publications, see my C.V., dblp page or google scholar page.
Loss minimization, fairness and indistinguishability:
This project aims to reconcile three different perspectives on the goal of learning: loss minimization, multigroup fairness, and indistinguishability.
Some papers on this topic which explore connections and tradeoffs between these notions:
This line of work is summarized in a talk on Multigroup fairness and loss minimization given at the Simons workshop on Multigroup fairness in April’23. Here is a longer talk from the IAS TCSDM seminar and a shorter version from the TOC4fairness seminar.
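As a rough illustration of the multigroup-fairness perspective, here is a minimal Python sketch (my own toy example, not code from the papers above; the data, group definitions and function names are hypothetical) that audits a predictor by checking calibration separately on each subgroup in a collection, rather than only on average:

```python
import numpy as np

def group_calibration_errors(p, y, groups, n_bins=10):
    """For each subgroup, measure the average gap between predicted and
    empirical outcome frequencies across prediction bins."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    errors = {}
    for name, mask in groups.items():
        total = mask.sum()
        gap = 0.0
        for b in range(n_bins):
            idx = mask & (bins == b)
            if idx.sum() == 0:
                continue
            # weight of this bin within the group times |E[y] - E[p]| on it
            gap += (idx.sum() / total) * abs(y[idx].mean() - p[idx].mean())
        errors[name] = gap
    return errors

# Hypothetical usage: predictions p in [0, 1], binary outcomes y drawn so the
# predictor is calibrated by construction, and two overlapping subgroups.
rng = np.random.default_rng(0)
p = rng.uniform(size=5000)
y = (rng.uniform(size=5000) < p).astype(float)
groups = {"everyone": np.ones(5000, dtype=bool), "first_half": np.arange(5000) < 2500}
print(group_calibration_errors(p, y, groups))
```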
A paper exploring how these notions relate for Deep Neural Nets, and a talk by Adam Kalai:
I am also interested in efficient and robust notions of calibration, motivated by the fact that most commonly used calibration measures lack at least one of these two properties. Some work proposing better measures of calibration:
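To make the fragility concrete, the toy experiment below (my own illustration, not taken from these papers) estimates the standard binned expected calibration error of a perfectly calibrated predictor and shows that the reported number depends heavily on the arbitrary choice of bin count:

```python
import numpy as np

def binned_ece(p, y, n_bins):
    """Standard binned expected calibration error (ECE)."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        idx = bins == b
        if idx.any():
            ece += idx.mean() * abs(y[idx].mean() - p[idx].mean())
    return ece

# The same perfectly calibrated predictor gets very different scores
# depending only on how finely its predictions are binned.
rng = np.random.default_rng(1)
p = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < p).astype(float)
for n_bins in (10, 100, 1000):
    print(f"{n_bins:4d} bins -> ECE = {binned_ece(p, y, n_bins):.3f}")
```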
Erasure Coding for distributed storage:
Motivated by applications in data storage, we introduce the notion of local recovery for an error-correcting code, which allows quick reconstruction of any single data symbol when one or a few symbols are lost. We show fundamental tradeoffs between the locality, distance and rate of a code, and construct optimal codes (LRCs) that achieve this tradeoff. These codes were implemented in Microsoft Azure storage.
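A minimal sketch of the locality idea (a toy XOR construction for illustration only, not the code actually deployed in Azure): each small group of data symbols gets one local parity, so a single lost symbol is rebuilt by reading only the rest of its group rather than the whole codeword.

```python
import numpy as np

def encode_with_local_parity(data, group_size):
    """Toy locally recoverable encoding: split the data symbols into groups
    and append one XOR parity per group (locality = group_size)."""
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    parities = [int(np.bitwise_xor.reduce(g)) for g in groups]
    return groups, parities

def recover_symbol(group, parity, lost_index):
    """Rebuild the symbol at lost_index from the surviving symbols of its
    group and the group's local parity."""
    survivors = [s for j, s in enumerate(group) if j != lost_index]
    return int(np.bitwise_xor.reduce(np.array(survivors + [parity])))

# Hypothetical usage with byte-valued symbols.
data = np.array([17, 42, 7, 99, 3, 250, 8, 61], dtype=np.uint8)
groups, parities = encode_with_local_parity(data, group_size=4)
print(recover_symbol(groups[0], parities[0], lost_index=2))  # recovers 7
```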
A two-part tutorial I gave on this topic at a Simons Institute bootcamp: part 1 and part 2.
Interactive data visualization:
Hillview is an open-source tool for fast interactive visualization and exploration of massive data sets using just the click of a mouse. Hillview combines a distributed, parallel computation platform with highly optimized sketching and sampling algorithms for fast renderings. [Overlook](https://research.vmware.com/publications/overlook-differentially-private-exploratory-visualization-for-big-data) adds a differential privacy layer to Hillview.
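The core rendering trick can be sketched in a few lines (a simplified Python illustration of the general mergeable-sketch idea, not Hillview's actual implementation): each worker reduces its partition of a column to a small fixed-bucket summary, and summaries combine by addition, so an interactive chart never needs to ship or rescan the raw rows.

```python
import numpy as np

def partition_histogram(values, edges):
    """Each worker summarizes its partition as fixed-bucket counts (a mergeable sketch)."""
    counts, _ = np.histogram(values, bins=edges)
    return counts

def merge(sketches):
    """Histograms over the same buckets merge by simple addition."""
    return np.sum(sketches, axis=0)

# Hypothetical usage: three partitions of a large column, one shared bucket layout.
rng = np.random.default_rng(2)
partitions = [rng.normal(size=100_000) for _ in range(3)]
edges = np.linspace(-4, 4, 41)          # 40 buckets for the on-screen chart
sketches = [partition_histogram(p, edges) for p in partitions]
print(merge(sketches)[:5])
```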
I have been fortunate to work with several fabulous interns over the years: