Fellows & Alumni

Fellows of 2026

Alessandro Veneri

I’m a PhD researcher in Economics at the European University Institute. My main work uses mechanism design to study industrial organisation questions, such as auctions with asymmetric information for digital advertising markets, and screening in two-sided markets. I’m increasingly interested in applying game-theoretic tools to cooperative AI, with a current project aiming to identify conditions for differential progress between cooperative and adversarial capabilities. During my PIBBSS fellowship, I plan on developing a formal framework for how the diffusion of AI as abundant cognitive labour reshapes the space of viable institutional arrangements.

Alessandro Veneri

Ana Simina Stoian

I’ve always been interested in connecting the dots and understanding the bigger picture. My background spans AI evaluation and reinforcement learning, a Master’s in Public Policy from Sciences Po Paris, and co-founding an AI startup. While working on evaluating frontier AI agents, I’ve found myself contemplating questions that transcend technical architectures: what does it mean to deliberately engineer increasingly capable minds whose internal workings we do not fully understand, and what do we owe them if we succeed? My research focuses on the persistent structural failures in how AI systems reason, execute, and relate to the physical world despite increasing capabilities, the codependency risks that arise when humans become entangled with them, and anthropomorphic projection: the tendency to attribute care, intentionality, and selfhood to systems whose actual behaviour is fundamentally alien to these concepts. During PIBBSS, I’ll investigate where these projections break down, what “disorders” they produce, and what a more rigorous, disanalogy-aware approach to alignment would look like.

Ana Simina Stoian

Andrea Ferrari

I am a high-energy physicist with a strong philosophical background, who has been working at the forefront of the intersection between particle Physics (Quantum Fields and Strings) and Mathematics (Algebraic Geometry, Vertex Algebras and Category Theory) for several years. I have recently been exploring, with some success, applications of Artificial Intelligence and Machine Learning to scientific discovery in Physics. Having grown concerned about the rapidity of the development of agentic systems, I have decided to apply for this fellowship to explore the implications and consequences of their deployment, starting from the disciplines that are closest to my work.

Andrea Ennio Vincenzo Ferrari

Augustin Pierre Octave Lafay

I am a researcher in mathematical physics; more precisely, I have been working on exactly solvable models of statistical mechanics and conformal field theory. I recently developed an interest in mechanistic interpretability.

Augustin Pierre Octave Lafay

Damini Kusum

I am a PhD candidate at Carnegie Mellon University, pursuing a degree in Logic, Computation, and Methodology. My work combines the theories of computability and probability to address the following question: what, in principle, can be learned by a machine?

Damini Kusum

Ida Karolina Mattsson

Ida Karolina Mattsson

Idil Karen Şahin

How do minds model other minds? As an undergraduate at Dartmouth studying mathematics and cognitive science (and an incoming CS MS student), I investigate the mechanistic basis that allows agents to represent beliefs and intentions, such as theory-of-mind in large language models or cooperative strategies in iterated games. My research sits at the intersection of mechanistic interpretability, Bayesian cognition, and multiagent systems, focusing on what computational structures make coordination possible.

Idil Karen Şahin

Jaime Ruiz Serra

I am fascinated by the capacity of collectives to achieve outcomes beyond the reach of isolated individuals. My PhD research investigates the interplay between individual agency and collective dynamics, focusing on how incentives shape this relationship. I draw upon models from cognitive science and computational economics, analysing them through a complex systems perspective incorporating information theory, dynamical systems theory, game theory, and Bayesian inference.

Jaime Ruiz Serra

Jeremy Saal

I am a cognitive neuroscientist with several years of experience researching intracranial neural prostheses. I previously completed my PhD at the University of California, San Francisco focusing on steering and probing the subjective experience of chronic pain in humans via implanted electrodes. My long-term research goal is to understand the mechanisms, as well as correct the behavior, of artificial intelligence systems through developing the cognitive and clinical neuroscience of synthetic minds.

Jeremy Saal

Jonathan Elsworth Eicher

My name’s Jonathan and I did my PhD on the thermodynamics of disordered protein supramolecular assemblies at UNC Chapel Hill. I developed an interest in “what is life” early on, and was a bit put out to discover it was less a physical question than a definitional quandary. Still, I found the work and ideas fascinating, and eventually stepped into the world of computational life.

Now my work is focused on quantifying how codebases can exhibit ecosystem-like behaviors and how they influence agents that interact with them, with a particular focus on how info-hazards can cause harm or extinction-vortex-style behavioral collapse in agents.

Jonathan Elsworth Eicher

Joonas Mikael Vättö

I’m interested in understanding the inner workings of complex systems. With a background in mathematical physics, I now work on interpreting how AI models learn and represent concepts.

Joonas Mikael Vättö

Junior Chinomso Okoroafor

Hi, my name is Junior. I am currently finishing up grad school in cognitive science at MIT.

My research focuses on modelling the computational structure of human value representations and investigating how these mental representations shape the processes and outcomes of our decisions. I am also interested in developing normative frameworks for how people ought to solve these problems given their objectives and resource constraints.

I draw on methods at the intersection of cognitive science and analytic philosophy to investigate these questions.

I am also interested in comparative cognition in non-human animals and AI systems, particularly in questions such as: to what extent do such entities possess experiences and values of their own?

Junior Chinomso Okoroafor

Ka Wang Pang

I am a Cambridge University master’s graduate and former ARIA technical creator. My work focuses on giving concepts like cooperation and empathy useful and practical descriptions that apply to systems which obey formal and computational rules. As part of that work, I want to bridge the gap between humanist, philosophic, computational, and physics-oriented ways of viewing the world.

Chris Pang

Laura Teresa Ball

Laura Teresa Ball

Lydia Nottingham

Lydia Nottingham

Matthew James Farr

I am a researcher at Groundless AI, where I investigate the subtle computational contexts that make AI possible, and how these introduce vulnerabilities into interpretability and control. I am currently interested in the problem of self-modification, and believe my background with computational contexts and embedded AI systems, combined with mentorship from Daniel Herrmann and Sahil K, can help clarify this increasingly significant problem.

Matthew Farr

Mikhail Mironov

I am a math PhD (algebraic geometry) with research experience in graph machine learning and knowledge graphs. Previously, I worked as a research manager on the MARS program at the Cambridge AI Safety Hub. My current research focuses on the mathematical foundations of AI agents for safety and alignment.

Mikhail Mironov

Nadav Amir

I am a computational cognitive scientist based at the Fields Institute in Toronto. I am interested in notions of individual and collective agency in natural and artificial cognitive systems. My work draws on non-Western, and in particular Buddhist, models of cognition and subjective experience.

Nadav Amir

Nathaniel Sebastian Cooke

I’m a Senior Policy Adviser on the National Security & Resilience team at the UK’s Government Office for Science, where I work on preparing the Scientific Advisory Group for Emergencies (SAGE) for AI risks.

I have a background in existential risk research, crisis management, activism, and wargaming.

I’m interested in AI supply chains, community organising, AI politics/strategy, and worst-case scenario planning.

Nathaniel Sebastian Cooke

Ohad Avnery

I am an MSc student in theoretical mathematics and have worked in applied cryptography. I am interested in advancing the theory of AI alignment, focusing on the mathematical foundations, and in bringing a security mindset to the field.

Ohad Avnery

Rachel Elizabeth Mary Calcott

I’m a PhD student at Harvard studying how people learn from and coordinate with others, and how shared norms emerge to support that coordination. I’m interested in how these dynamics play out in human–AI systems, and in the cognitive processes underlying coordination success and failure.

Rachel Elizabeth Mary Calcott

Roger Dearnaley

I’ve been interested in AI Alignment for many years, and have in the last year transitioned to working as an independent AI Alignment researcher. My research interests include the intersection of Ethics and Evolutionary Moral Psychology with AI Alignment, Constitutional Alignment, Value Learning, Personas, AI Psychology, Simulator Theory, Alignment Pretraining, and practical alignment applications of Interpretability techniques.

Roger Dearnaley

SJ Beard

I am a philosopher and existential risk researcher working on large-scale global transformations being driven by AI, political instability, societal disruption, and environmental breakdown, and on building existential hope in possibilities for safe, joyous, and inclusive futures for humanity. I have a strong portfolio of work across policy and media engagement, including establishing an All Party Parliamentary Group for Future Generations and making programmes for the BBC. I am currently a Research Affiliate at the Institute for Technology and Humanity in the University of Cambridge and a BBC/AHRC New Generation Thinker. I have a PhD in philosophy from the London School of Economics and am on the editorial board of the journal Futures.

SJ Beard

Stephan Wäldchen

My research focuses on interpretability, steganography, and AI safety via debate. I am currently also doing a project on the training dynamics of tensor networks as a generalisation of deep linear networks. I also wrote and directed “Out of this Box! – The Last Musical (Written by Humans)”, see outofthisbox.show.

Stephan Wäldchen

Affiliate Alumni

Adam Shai

Adam Shai

Founder of Simplex
Ann-Kathrin Dombrowski

[Affiliate Jan-Jun 2024] I’m Annah. I started my AI safety journey with the MATS summer 2023 cohort under the mentorship of Dan Hendrycks. I previously did a PhD in machine learning at TU Berlin. During MATS I worked on concept extraction, activation steering and knowledge removal. I also coauthored a paper on representation engineering with my colleagues at CAIS. I’m excited to explore new research directions with PIBBSS.

Ann-Kathrin Dombrowski

Far.ai
Clem von Stengel

[Affiliate Jan-Jun 2024] I am a researcher in Alignment of Complex Systems, focussing on formal models of phenomena in evolutionary ecology which could shed light on the AI alignment problem. I’ve also made my way through a variety of academic disciplines: I’m currently a PhD student in both Informatics and Macroecology, with a background in Mathematics and Theoretical Physics (Bachelors), and History and Philosophy of Science (Masters).

Clem von Stengel

Shadow AI Startup
Fernando Rosas

[Affiliate Jun 2024-Jul 2025] I’m interested in understanding how abstractions work, and how artificial and biological neural systems use them to process information and do computations. I have a colourful background combining mathematics, physics, computational neuroscience, philosophy, and music composition. My current work focuses on formalising the notion of “emergence”, and applying it to better understand how agents build world models and use them to solve specific tasks. I’m also very interested in how various fundamental ideas from complexity science (e.g. renormalisation, metastability, criticality) can be used to foster AI interpretability.

Fernando Rosas

University of Sussex
Guillaume Corlouer

[Affiliate Jan-Jun 2024] My research aims to understand the development of internal representations and capabilities during the training of deep learning models. I am interested in reducing uncertainty about the emergence of deceptive alignment during training and developing mathematically principled techniques to better detect deceptively aligned goals. In this context, I am looking into the relevance of singular learning theory to understanding the training dynamics of deep learning models, and adapting measures of emergence from multivariate information theory to deep neural networks.

Guillaume Corlouer

Independent Researcher
Nischal Mainali

[Affiliate Jan-Jun 2024] I am Nischal, a PhD student in theoretical neuroscience at the Hebrew University of Jerusalem interested in mathematical theories of brain functions. I’m curious if and how we can use tools and ideas from neuroscience to understand AI.

Nischal Mainali

Academic Researcher

Alumni Fellows of 2025

Aaron Kirtland

I’m a PhD student in applied math at Brown University interested in cognitively-inspired AI, sequential decision making, and probabilistic programming.

Aaron Kirtland

Abhimanyu Pallavi Sudhir

I am a CS PhD student at Warwick working at the intersection of markets and AI, focusing on themes such as (1) formal analogies between market failure and misalignment, and market-based approaches to scalable oversight, (2) market dynamics, friction and bounded rationality, and (3) general market-based approaches to AI.

Abhimanyu Pallavi Sudhir

Abhinav Singh

I specialize in Cybersecurity research with a focus on Data, Cloud and AI.

Abhinav Singh

Alexandre Variengien

I am Alexandre Variengien. I am interested in positive visions for the development of AI fostering healthy ecosystems of machines and humans.

I previously worked as a researcher on mechanistic interpretability, co-founded an AI safety research and advocacy non-profit in France, and most recently worked as a technology specialist at the EU AI Office, in the EU Commission.

Alexandre Variengien

Annie Stephenson

I’m a postdoctoral researcher at Princeton studying human collective behavior, and my PhD is in physics. My recent work has focused on Reddit’s r/place experiment as a way to understand large-scale human coordination. I’m now interested in multi-agent AI systems: how emergent behavior of many agents differs from that of humans, how those differences might impact human social dynamics, and how AI could help us better understand or encourage human cooperation.

Annie Stephenson

Antoine Vigouroux

I study mechanisms that were not deliberately designed by humans. Most of my research has been on bacteria – my PhD was about bacterial morphogenesis, and my postdoc about how populations of bacteria evolve when you select for faster growth rate. After years of studying how sequences of nucleotides can self-organize into complex functional mechanisms, I am now interested in how arrays of numbers can self-organize into something more intelligent than us.

Antoine Vigouroux

Callum Lawson

I love trying to figure out the best strategies to deal with dynamic and unpredictable environments. I’ve worked on these kinds of problems in quantitative ecology, where I studied ecological and evolutionary responses to fluctuating environments, and in industry, where I built models to predict risks in quantitative finance and healthcare. Now I’m looking to apply these ideas to AI research, linking ideas from evolutionary game theory and unsupervised environment design to help understand the robustness of reinforcement learning agents.

Callum Lawson

Franz Nowak

I use tools from the theory of computation to better understand the inner workings, representational capacity, and limitations of neural language models. In the long run, I am interested in bridging the gap between symbolic and data-driven AI to verify and ensure that models do what they say they do.

Franz Nowak

Jacques Thibodeau

I am a co-founder of Coordinal Research, an organization which seeks to leverage AIs to accelerate AI safety progress. Prior to that, I’ve worked as an alignment researcher on a variety of research agendas which focus on interpretability, stable alignment, evaluations, and cyborgism. In a previous life, I worked as a data scientist in a variety of teams within government and studied lasers and material science as a physicist.

Jacques Thibodeau

Jared Moore

I’m a PhD student in Computer Science at Stanford University focusing on social reasoning and alignment. I work on the fundamental social abilities of large language models (LLMs), such as their capacity for theory of mind and tendencies toward deception, as well as how those abilities map onto the real world, such as how LLMs fail to perform various skills of therapy.

Jared Moore

Jasmina Urdshals

I am a theoretical physicist, with a background in high energy particle physics. I quit a postdoc last year to focus on AI Safety research, and I worked on mechanistic and developmental interpretability using Singular Learning Theory tools. I am interested in theoretical approaches to alignment and interpretability in AI systems, and aim to diminish risks from advanced AI systems.

Jasmina Urdshals

Joel Christoph

I research AI safety and governance through the lens of economics and complex systems as a PhD researcher at the European University Institute. My focus is on alignment strategies and mitigating large-scale risks from advanced AI. I’m excited by PIBBSS’s interdisciplinary approach to developing safe and beneficial artificial intelligence.

Joel Christoph

Matthew Shinkle

I’m driven to understand the fundamental components of cognition—how internal representations and computations give rise to intelligent behaviors. In my graduate work in neuroscience, I developed methods for simulating brain responses using DNNs and applying interpretability techniques from computer vision to the human visual system. Going forward, I’m focused on understanding AI models themselves—the features and computations they perform, and how we can use these insights to make AI systems more transparent, steerable, and safe.

Matthew Shinkle

Matthias Georg Mayer

I am a mathematician interested in Agent Foundations to provide formal guarantees for the safety of AI systems. Previously, I worked on structural independence, a generalization of d-separation to structural causal models (also known as Finite Factored Sets). Recently, my interests have shifted towards the Learning Theoretic Agenda by Vanessa Kosoy. In particular, I am excited about Infrabayesian-Physicalism as a means to address embedded agency.

Matthias Georg Mayer

Max Ramsahoye

I’m Max and I’m an interdisciplinary researcher in Philosophy focused on the alignment of complex adaptive systems and hybrid collective intelligences (CI) – emergent agencies composed of artificial and human intelligences (HIxAI). My research project aims to introduce the ‘conceptual engineering’ of ‘The Cybernetic Intelligence Alignment Problem’ and produce the first literature review on this emerging paradigm – exemplified in recent years by the work of the Alignment of Complex Systems Research Group, the AI Objectives Institute, the Collective Intelligence Project, the Tegmark Group as well as many other AI safety organisations and theorists (including PIBBSS!). Ultimately, my goal is to advance a paradigm-shift within AI, from an AI-centric approach defined by technological and political solutionism (‘safety and governance’), to a more critical, holistic approach (‘sociotechnical alignment’, ‘ideal governance’) that orients towards addressing the metacrisis, designing ‘the third attractor’ and aligning civilisational superintelligence as a whole.

Max Ramsahoye

Paul “Lorxus” Rapoport

I’m a mathematician who got a doctorate working in algebraic topology and model theory, with earlier background in molecular biology and linguistics. These days, I find myself using everything from category theory to mechanism design to study the components of agency and the roots of abstraction.

Paul “Lorxus” Rapoport

Dalcy Ku

I am an undergraduate studying math at Harvard. I am interested in convergent structural properties (beyond behavioral isomorphism) of systems selected under some criteria, with applications to agent foundations and interpretability.

Dalcy Ku

Shray Bansal

I am a postdoctoral researcher at Georgia Tech. My research is at the intersection of game theory, multiagent learning, and human-AI coordination. I have focused on developing cooperative AI methods for human-robot interaction, but I am also interested in fundamental problems in the alignment of intelligent agents.

Shray Bansal

Sonja Kraiczy

I am Sonja, a PhD student at the University of Oxford studying fair algorithms for democratic decision making and preference aggregation, as well as problems at the intersection of economics and theoretical computer science more generally. Over the past year, I have been interested in the role these areas can play as part of our bet on strategies to develop safe and cooperative AIs.

Sonja Kraiczy

Tomasz Steifer

I have recently finished my postdoc in Chile, where I worked on computational learning theory, social choice and computational limitations of transformers. Before that I obtained a PhD in computer science (the thesis was on prediction and algorithmic randomness), and even earlier I studied mathematical logic, philosophy and neuroinformatics. I am excited about using mathematical tools to study philosophical questions, such as: what can be known?

Tomasz Steifer

Xavier Poncini

Xavier Poncini

Also completed this year

Sunayana Rane

Alumni Fellows of 2024

Agustín Martinez Suñé

I recently finished a Ph.D. in Computer Science at the University of Buenos Aires, Argentina, where I developed formal methods for analyzing distributed systems. These methods are grounded in logical-mathematical foundations to provide provable guarantees about their output. I’m transitioning to a career in AI safety and AI risk reduction. The main question that currently drives my research is: what role can formal verification techniques play in the field of AI safety?

Final presentation – Neuro-Symbolic Approaches for Safe LLM-Based Agents

Agustín Martinez Suñé

Aron Vallinder

I’m an independent researcher. My primary academic background is in philosophy, with a PhD on Bayesian epistemology from the London School of Economics. I’m currently interested in using lessons from cultural evolution to think about AI safety and development.

Final presentation – Cultural Evolution of Cooperation in LLMs

Aron Vallinder

Baram Sosis

I’m a PhD student in mathematical neuroscience at the University of Pittsburgh. My research focuses on understanding the mechanisms of learning and decision-making in the basal ganglia. I’m currently transitioning to work in AI safety, where I’m interested in exploring a variety of approaches.

Final presentation – Dynamics of LLM beliefs during chain-of-thought reasoning

Baram Sosis

Euan McLean

I have a PhD in theoretical particle physics and have worked in ML engineering, technical comms, and macrostrategy research at the Center on Long-Term Risk. I’m interested in questions regarding phenomenal consciousness and wellbeing in AI systems.

Final presentation – Indicators of phenomenal consciousness in LLMs: Metacognition & higher-order theory

Euan McLean

Jan Bauer

I’m interested in the tension between expressivity and stability in intelligent systems. How can capricious components give rise to reliable cognition? For example, in the brain, synaptic noise and strong connectivity give rise to chaotic dynamics, whereas in artificial systems, adversarial attacks sometimes prevent robust generalization from training data. Yet, both systems are highly capable. As a strong believer in synergies between fields, I approach this question from theoretical neuroscience, biased with a background in statistical physics.

Final presentation – Neuro-Symbolic Approaches for Safe LLM-Based Agents

Jan Bauer

Magdalena Wache

Causality enthusiast trying to become less confused about agency and abstractions. Previously I did my master’s in machine learning with a minor in mathematics and I have worked on interpretability in the course of the Machine Learning Alignment Theory Scholars program.

Final presentation – Factored Space Models: Causality Between Levels of Abstraction

Magdalena Wache

Mateusz Bagiński

What are the unifying principles behind phenomena such as cognition, agency, and goal-directedness? Do we hold some assumptions that prevent us from understanding these principles? I’m especially interested in getting traction on the formation and mechanics of goal-directed cognition, consequentialist reasoning, and endogenously driven value change.

Final presentation – Fixing our concepts to understand minds and agency

Mateusz Bagiński

Matthew Clarke

I am interested in how networks make decisions, both in machines and in biology. My work as a postdoctoral researcher has focused on understanding the networks that underlie decision making in human cells. Specifically, I research how these decisions go wrong in cancer or are hijacked in viral disease, and how we can best perturb them to treat disease. I am now interested in applying the lessons from this work to the mechanistic understanding of neural networks, as well as bringing methods for interpreting synthetic networks back to biology.

Final presentation – Examining Co-occurrence of SAE Features

Matthew Clarke

Nadine Spychala

I’m a doctoral researcher in computational neuroscience & complex systems at Sussex University as well as a research software engineer at King’s College London. During the PIBBSS fellowship, I aim to bring together various strands of research (philosophical, formal/mathematical and empirical) on the concept of emergence to inform and advance research on AI capabilities. I ultimately want to explore whether gained insights can be channelled into evals-type work to produce a deployable “emergence-assessment pipeline” for assessing AIs with respect to their emergent capabilities.

Final presentation – The potential of formal approaches to emergence for AI safety

Nadine Spychala

Shaun Raviv

I’m a freelance print and audio journalist based in Atlanta. I’ve written features for Wired, Smithsonian, The Intercept, The Ringer, and The Washington Post, as well as several podcast series. Topics I’ve covered include the free energy principle, the history of facial recognition technology, phone hacking in 1980s Sweden, and ethics and hereditary disease.

Shaun Raviv

Wesley Erickson

I have a PhD in physics, with a specialization in stochastic processes, computational physics, and laser-cooled atoms. My research has involved investigating universal aspects of rare but extreme events, with models that can be applied to systems ranging from atomic motion of cold atoms to optimal animal foraging strategies. I am interested in exploring similar universal behavior in machine learning algorithms, especially to better understand how to detect signatures of “insight” in the learning process.

Final presentation – Heavy-tailed Noise & Stochastic Gradient Descent

Wesley Erickson

Yevgeniy Liokumovich

I am a mathematician interested in using methods from geometry and topology to contribute to the AI safety and alignment problem.

Final presentation – Minimum Description Length for singular models

Yevgeniy Liokumovich

Alumni Fellows of 2023

Aysja Johnson

My academic background is in neuro and cognitive science; now, I’m learning about biology in search of a better understanding of entities which can cause reality to warp to their goals. Things I like to think about: how life manages to robustly hit narrow targets (such as making a human being starting from one cell), what exactly “levels of abstraction” are and how life uses them, what the dial is that causes “agency” to vary across different systems (e.g., skin cells seem much less “agentic” than immune cells—why?)

Final presentation – Searching For a Science of Abstraction

Aysja Johnson

Brady Pelkey

I am an independent student with a background in math and philosophy. I’m currently exploring ways to formalize embedded agents and goal-directed subsystems. Other topics I like to think about include maps between causal models, and interactive preference construction.

Brady Pelkey

Cecilia Wood

I’m a PhD student in Economics at the London School of Economics. My research focuses on applying techniques from economic theory, especially mechanism design, to AI safety.

Final presentation – Beyond vNM: Self-modification and Reflective Stability

Cecilia Wood

Eleni Angelou

Eleni is a PhD student in the philosophy program at the CUNY Graduate Center. She is currently a visiting researcher at the Center for Science, Technology, Medicine, and Society at UC Berkeley. Her research focuses on scientific cognition in both human and artificial agents. Eleni is also interested in questions related to technological progress, innovation, and the metascience of AI Safety.

Final presentation – Overview of Problems in the Study of Language Model Behavior

Eleni Angelou

Erin Cooper

I am a PhD candidate in Philosophy at Stanford. I specialize in Political Philosophy and Ethics and am completing a dissertation on trust in political philosophy. For the fellowship, I will be doing a project summarizing philosophical approaches to distinguishing between manipulation and non-manipulation.

Erin Cooper

Gabriel Weil

I am an Assistant Professor at Touro University Law Center. Prior to joining the Touro faculty, I was a research manager at the Climate Leadership Council. My primary research focus is climate governance, but I am interested in applying the tools and methods I have developed in that domain to AI safety.

Final presentation – Tort law as a tool for mitigating catastrophic risk from AI

Gabriel Weil

George Deane

I am a philosopher, currently a postdoctoral researcher on artificial consciousness on the Digital Minds project, a collaboration between philosophers and computer scientists (Yoshua Bengio and his group at MILA) based at the University of Montreal and the University of Oxford. I received my PhD from the University of Edinburgh in 2021, on consciousness, the self, and altered sense of self in the active inference framework. At the moment I am very interested in the possibility of a sense of self and agency emerging in AI systems.

George Deane

Giles Howdle

My research background is primarily in the philosophy of action. I am particularly interested in the nature and emergence of agency (and normativity) in humans, social entities, and artificial intelligence. I am also working on the relationship between instrumental rationality and the adoption of values and policies, particularly in the context of cognitively, computationally, and/or temporally bounded agents. I am also keen to investigate the AI risk and ethical implications of these issues.

Final presentation – Auto-Intentional Agency and AI Risk

Giles Howdle

Guillaume Corlouer

My research aims to understand the development of internal representations and capabilities during the training of deep learning models. I am interested in reducing uncertainty about the emergence of deceptive alignment during training and developing mathematically principled techniques to better detect deceptively aligned goals. In this context, I am looking into the relevance of singular learning theory to understanding the training dynamics of deep learning models, and adapting measures of emergence from multivariate information theory to deep neural networks.

Final presentation – The role of model degeneracy in the dynamics of SGD

Guillaume Corlouer

Jason Hoelscher-Obermaier

I am an ML research engineer with a Ph.D. in experimental quantum physics and a background in philosophy. I am interested in robust evaluations of AI systems and how to use AI to improve rather than damage our collective epistemics and decision-making.

Final presentation – How LLM Evaluations Influence AI Risks

Jason Hoelscher-Obermaier

Martín Soto

I am a Mathematical Logic grad student from Barcelona, working towards understanding intelligence in order to reduce future disvalue. I’m working with Vivek Hebbar (Researcher, MIRI) on theoretical threat models and interpretable architectures. While finishing my studies, I’m also exploring different directions in agent foundations with Abram Demski (Researcher, MIRI), and collaborating with the Center on Long-Term Risk for the reduction of suffering-risks.

Final presentation – Constructing Logically Updateless Decision Theory

Martín Soto

Matthew Lutz

I am a behavioral ecologist and architect with a PhD in Ecology and Evolutionary Biology from Princeton, where I studied self-assembled structures built by army ants from their own bodies. My current work as a postdoc at the University of Roehampton seeks to understand the evolution of building behavior in termites by comparing nest morphologies among related species. At PIBBSS, I will apply insights drawn from mathematical modeling of these complex insect societies to alignment and coordination problems in multi-agent systems, with the aim of avoiding the evolution of novel predatory AI superorganisms.

Final presentation – Detecting emergent capabilities in multi-agent AI Systems

Matthew Lutz

Ninell Oldenburg

I just graduated from a Master’s program in IT and Cognition at the University of Copenhagen and have a background in linguistics and computational linguistics. I am broadly interested in cooperation among humans, among computers, and between the two, currently with a focus on social norms.

Final presentation – Learning and Sustaining Social Norms as Normative Equilibria

Ninell Oldenburg

Nischal Mainali

I am Nischal, a PhD student in theoretical neuroscience at the Hebrew University of Jerusalem interested in mathematical theories of brain functions. I’m curious if and how we can use tools and ideas from neuroscience to understand AI.

Final presentation – A Geometry Viewpoint for Interpretability

Nischal Mainali

Sambita Modak

I have a PhD in Behavioral Ecology from the Indian Institute of Science, Bangalore, and I am currently working as a researcher at the National Centre for Biological Sciences in Bangalore. While my research background is rooted in examining determinants of animal behavior in an evolutionary biology framework, I am deeply motivated by transdisciplinary approaches to research and problem solving. My current interest is to explore how concepts and skills from my doctoral research in animal behavior and evolution can be applied to other cause areas like AI alignment.

Sambita Modak

Sammy Martin

I’m currently working with CLR on a project that investigates AI misuse scenarios. I’m also involved with running the Modelling Transformative AI Risk (MTAIR) forecasting project and conducting technical research in cooperative AI (benchmarking cooperative intelligence). I’m currently most interested in AI strategy and forecasting, with a strong inclination towards incorporating expertise from diverse fields such as politics, international relations, and other disciplines to address AI strategy questions. I’m also keen to explore methods to aggregate knowledge from various sources and reason better under deep uncertainty.

Final presentation – An overview of AI misuse risks and what to do about them

Sammy Martin

Tom Ringstrom

I am a Computer Scientist who is interested in the foundations of reward-free compositional planning and intrinsic motivation. I develop theory for constructing compositional representations that agents can use to rapidly stitch together plans. My theory allows advanced agents to plan in dynamic hierarchical environments and also evaluate why achieving some state of the world is good or bad, without succumbing to objectives that accumulate “reward signals”, as is common in AI.

Final presentation – A Mathematical Model of Deceptive Policy Optimization

Tom Ringstrom

Urte Laukaityte

I am a late-stage PhD candidate in the Philosophy Department at UC Berkeley, focusing on cognitive science, biology, and psychiatry. I am interested in exploring the issues around building artificial systems with respect to some of the recent developments within the life and mind sciences – particularly basal cognition, soft robotics, and the biogenic approach more generally.

Urte Laukaityte

Alumni Fellows of 2022

Adam Prada

I am a PhD student at the Yusuf Hamied Department of Chemistry, University of Cambridge working on quantum chemical dynamics. During my PIBBSS fellowship, I will be working on the problem of agency and hierarchical agents.

Adam Prada

Anand Siththaranjan

I’m a PhD student at UC Berkeley advised by Stuart Russell and Claire Tomlin. I’m interested in leveraging ideas from control theory, learning, and economics as a means of creating principled, beneficial intelligent systems.

Anand Siththaranjan

Anson Ho

I’m a researcher at Epoch, investigating and forecasting the development of advanced AI to help inform AI governance. I’m particularly interested in neural network interpretability, AI forecasting, and theoretical AI alignment research.

Anson Ho

Jan Hendrik Kirchner

I am a researcher of minds – artificial and biological – with a background in cognitive science and computational neuroscience. After researching the early development of the brain in my PhD, I am now working towards aligning artificial intelligence with human values at OpenAI. I write blog posts “On Brains, Minds, And Their Possible Uses” and care about doing good, better.

Jan Hendrik Kirchner

Daniel Herrmann

I am a PhD candidate in the department of Logic and Philosophy of Science at the University of California, Irvine. My primary research areas are decision/game theory and formal epistemology, in which I develop models of agents who reason about the ways in which they might be embedded in their world. I also have work clarifying the connection between computational learning theory and Occam’s razor, modeling the invention and evolution of conventions and language, and applying prediction aggregation methods to social epistemology and policy making.

Daniel Herrmann

Holly Elmore

I have a PhD in Evolutionary Biology from Harvard, where I also did EA community organizing. Now I work as a researcher at Rethink Priorities on wild animal welfare and am interested in applying my evolutionary background to other important cause areas.

Holly Elmore

Lux Miranda

I would describe myself as a social scientist of intelligent agents such as humans and AI. My research draws from complexity science, anthropology, cognitive science, and (inverse) generative computational modeling. At Uppsala, I will study ethics and alignment surrounding human-like identity cues in social robots and other AI. I do my best to be a source of light. Find me at https://luxmiranda.com/

Lux Miranda

Martin Stoffel

I’m an Evolutionary Geneticist at the University of Edinburgh, trying to work out how genetic variants spread and disappear and contribute to traits and fitness in wild animal populations. With a background in Psychology and Molecular Ecology, I’m curious how ideas connect across disciplines, and what we can learn about AI alignment from biological systems.

Martin Stoffel

Zachary Peck

I am a PhD student in Philosophy of Science at the University of Cincinnati. Within academic philosophy, my research lies at the intersection of cognitive science, artificial intelligence, social and political philosophy, and the life sciences. Generally speaking, my AI-alignment research interests fall into two categories: agency and abstraction. In particular, I’m interested in how the capacity for acting agentially and thinking abstractly emerges in complex systems (both biological and artificial).

Zachary Peck

Other Fellows

Aanjaneya Kumar

Abra Ganz

Andrea Luppi

Blake Elias

Ivo Andrews

Jeffery Andrade

Josiah Lopez-Wild

Kai Sandbrink

Mel Andrews

Orowa Sikder

Simon McGregor