PIRAMID Project
Physics-Informed Research for Ambitious Mechanistic Interpretability Development
We believe that the pursuit of ambitious mechanistic interpretability (AMI), supported by a rigorous science of AI systems, is essential for scalable AI alignment. Starting from the methods and frameworks of statistical physics, our research will simultaneously build up the scientific foundations of three pillars of AI safety.
Ongoing work focuses on how neural networks learn and leverage hierarchical structure in real-world data, grounded in several hypotheses:
- Learned features are organized according to a hierarchical, scale-dependent notion of relevance.
- Faithful interpretability tools leverage this hierarchy.
- A scale-aware framework can place principled, probabilistic bounds on worst-case behaviors of AI systems.
These hypotheses are deeply interconnected: theory underpins applications, applications provide empirical support for theoretical predictions, and principled validation methods enable rapid feedback between the two. Though we take inspiration from physics, we prioritize mission impact over methodological purity, and we remain open to complementary or alternative insights from multiple disciplines.
Our Team
Lauren Greenspan
Technical Director
Dmitry Vaintrob
Research Lead, Learning Theory
Nischal Mainali
Research Affiliate, Learning Theory
Ari Brill
Research Lead, Data Models
Tom I. Carlson
Research Affiliate, Data Models
Andrew Mack
Research Lead, Tools
Jennifer Lin
Research Affiliate, Tools