Research Highlights

Genes did misalignment first: comparing gradient hacking and meiotic drive

Elmore draws a detailed analogy between gradient hacking in AI systems and meiotic drive in evolutionary biology, arguing that natural selection faced and partially solved the alignment problem millions of years before machine learning existed. The post examines how selfish genetic elements (alleles that increase their own transmission at the expense of organismal fitness) parallel gradient-hacking scenarios in which model components resist training updates or manipulate gradients. Meiotic drive occurs when an allele exploits the mechanics of sexual reproduction to appear in more than 50% of a heterozygous carrier's gametes, undermining the “fair lottery” that meiosis is supposed to implement. Elmore identifies two types of gradient hacking with biological parallels: mesa-optimizers (analogous to cancer cells that develop goals misaligned with the organism) and gradient-resistant circuits (analogous to selfish genes that persist simply by being hard to remove). The key insight is that meiosis functions as a governance mechanism: because alleles cannot predict their future genomic context, their evolutionary incentives are aligned with building good organisms rather than gaming the reproductive system. This suggests that alignment solutions might require similar “randomization” or “unpredictability” mechanisms to prevent model components from optimizing for their own persistence rather than for task performance. The post, inspired by Elmore’s PIBBSS fellowship work, done with mentorship from Beth Barnes, offers a concrete biological case study for alignment problems.
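
To make the “fair lottery” point concrete, here is a minimal sketch of a textbook-style single-locus model of meiotic drive. It is not taken from Elmore's post: the function names, the parameters k (share of a heterozygote's gametes carrying the driving allele) and s (fitness cost to homozygous carriers), and the assumption of random mating with a fully recessive cost are all illustrative choices.

```python
def next_frequency(p, k, s):
    """Advance the frequency p of a driving allele D by one generation.

    Illustrative assumptions (not from the original post):
      k -- fraction of a D/d heterozygote's gametes that carry D
           (0.5 corresponds to fair meiosis)
      s -- fitness cost paid only by D/D homozygotes
    """
    q = 1.0 - p
    # Genotype frequencies under random mating, weighted by fitness.
    homozygous_drive = p * p * (1.0 - s)  # D/D, pays the cost s
    heterozygous = 2.0 * p * q            # D/d, fitness 1
    homozygous_wild = q * q               # d/d, fitness 1
    mean_fitness = homozygous_drive + heterozygous + homozygous_wild
    # D/D parents pass on only D; D/d parents pass on D with probability k.
    return (homozygous_drive + heterozygous * k) / mean_fitness


def simulate(p0, k, s, generations):
    """Iterate next_frequency and return the final frequency of D."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, k, s)
    return p


if __name__ == "__main__":
    # Same costly allele, with and without a transmission advantage.
    print("fair meiosis:", simulate(0.01, k=0.5, s=0.3, generations=200))
    print("with drive:  ", simulate(0.01, k=0.9, s=0.3, generations=200))
```

Under these assumptions, the costly allele slowly declines when meiosis is fair (k = 0.5) but spreads toward fixation when it biases transmission (k = 0.9), even though every homozygous carrier is less fit. That is the sense in which a fair lottery ties an allele's success to the organism's.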