Last week we reported on a neuroscience experiment where living human neurons in a dish learned to play the 1993 video game Doom. In the readme file associated with the project, researcher Sean Cole had documented a problem: his decoder, the conventional software that translates neuron activity into game actions, “tends to start becoming a policy head.” In plain language, the software might be learning to play Doom on its own, routing around the biology entirely. But Cole built a kill switch into his own code: three ablation modes that let anyone swap the neuron signal for random noise, or silence it completely, and see whether learning collapses.
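The ablation idea is simple enough to sketch in a few lines. This is not the repository's actual code; it is a minimal illustration of the three modes described above, with hypothetical names (`ablate`, the mode strings) chosen for clarity:

```python
import numpy as np

def ablate(signal: np.ndarray, mode: str, rng: np.random.Generator) -> np.ndarray:
    """Swap out the neuron-derived signal before it reaches the decoder.

    'none'   -> pass the recorded activity through untouched
    'random' -> replace it with noise of the same shape
    'zero'   -> silence it completely
    """
    if mode == "none":
        return signal
    if mode == "random":
        return rng.standard_normal(signal.shape).astype(signal.dtype)
    if mode == "zero":
        return np.zeros_like(signal)
    raise ValueError(f"unknown ablation mode: {mode}")
```

If the decoder alone were doing the learning, swapping in `random` or `zero` should barely dent performance; if the structured signal matters, those modes should collapse.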
We wanted to see for ourselves. So we cloned the repository and ran the experiment 451 times across 15 random seeds on an NVIDIA H100 GPU through Google Colab (we had no extra neurons lying around to spare), with an additional 150 runs on a MacBook to confirm the pattern held across hardware. If you want to modify the code, you can take it for a spin here.
One caveat. This was a software-only test. Our runs isolate the software pipeline rather than a live neuron dish. But that is why the ablation result matters. If the decoder were secretly doing all the work, random or zeroed inputs should perform similarly. Instead, performance drops when structured signal is removed, which suggests the system is not simply a software policy masquerading as wetware.
We defined a breakthrough as any episode scoring above −600 reward, where −1,000 is the floor and positive scores are possible. The agent never killed anything in any mode; learning here means surviving longer and accumulating reward rather than combat. That pattern is expected in early reinforcement learning and in wetware-style setups: survival gives dense, immediate feedback, while a kill requires a long, brittle action chain (find, aim, fire) with sparse reward. In that regime, policies often settle into a local minimum of "stay alive longer" (hiding in a corner, say), which raises reward without ever developing lethal combat behavior.
With the neuron signal intact, the agent crossed that threshold in 27% of H100 episodes. Replace the signal with noise: 7%. Silence it entirely: 8%. The ceiling told the same story. The best intact-signal episode scored −100; the best random episode managed −330, and zero topped out at −380. On the MacBook, where the effect was even sharper, the intact signal produced the only positive score in the entire dataset: +160, while random never broke −600. The decoder alone wasn’t enough.
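The breakthrough rates above reduce to a simple fraction over episode rewards. A sketch of that bookkeeping (the −600 threshold is from our definition; the sample rewards are made up for illustration):

```python
THRESHOLD = -600  # an episode scoring above this counts as a breakthrough

def breakthrough_rate(rewards) -> float:
    """Fraction of episodes whose total reward exceeds the threshold."""
    rewards = list(rewards)
    return sum(r > THRESHOLD for r in rewards) / len(rewards)

# Illustrative rewards, not real data: two of four episodes clear -600.
print(breakthrough_rate([-1000, -850, -550, -420]))  # -> 0.5
```

The same calculation, run per ablation mode over the real episode logs, produces the 27% / 7% / 8% split reported above.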
Across 451 Doom ablation episodes on an H100 GPU (15 seeds), the intact signal (ablation mode "none") showed better upside than the random and zero-signal controls, while all three modes remained highly seed-sensitive.
To be clear, ablation can rule out only the simplest version of the concern Cole flagged: that the decoder is a standalone policy ignoring its input. The structured signal matters. But a software-only run cannot confirm that living neurons are the ones doing the learning. That question requires the full biological loop: a live dish, real-time feedback, and the same ablation controls applied to wetware.
The results were also highly seed-dependent. Some random seeds produced rapid improvement; others flatlined across all ten episodes regardless of mode. That volatility is typical of reinforcement learning in its earliest stages, where a few lucky initial conditions can cascade into meaningful exploration while most runs never escape a local minimum. A longer training horizon or more parallel environments would likely smooth the curve, but with ten episodes per seed, the signal is real but noisy.
Cortical Labs isn’t alone in this space. Multiple independent groups have been training living neurons to control simulated and physical systems for years. Steve Potter’s lab at Georgia Tech pioneered the field around 1999, building hybrid robots controlled by cultured rat neurons. In 2004, Thomas DeMarse at the University of Florida grew 25,000 rat neurons that learned to fly a simulated F-22. More recently, open-source projects and YouTubers have taken up similar work. What makes the Doom repository unusual isn’t the concept but the transparency: a public codebase with built-in ablation testing that anyone can run.



