The other day I met a guy at Carnegie Mellon who is working on his PhD in neuroscience. He also has a background in data science, so his research leans heavily into the computational side of things. We got into a conversation about how you can actually evaluate what is happening inside AI systems in ways that are surprisingly similar to how neuroscientists and psychologists study people. Just as a neuroscientist might flash images in front of a subject and watch how the brain responds in a scanner, researchers can poke at an AI system with carefully designed inputs and then watch what happens under the hood.
This raises a fascinating question: what does it mean to give an AI a “brain scan”?
Stimulus and Response
The most basic method is behavioral. You give an AI a stimulus and observe the output. This looks a lot like cognitive psychology experiments with humans. For example, you can test memory in a language model by giving it a long paragraph and then asking questions about the first few sentences. If it fails, you have found something about its working memory. Similarly, adversarial images test whether a vision model can be tricked, just as optical illusions reveal quirks in human perception.
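To make that concrete, here is a minimal sketch of such a behavioral probe. It assumes a hypothetical query_model(prompt) helper that sends text to whichever model you are testing and returns its reply; any chat API or local model wrapper would do.

```python
# A minimal behavioral probe of "working memory" in a language model.
# Assumption: `query_model(prompt)` is a hypothetical helper that sends a
# prompt to whichever model you are testing and returns its text reply.

def recall_probe(query_model, fact, filler_sentences):
    """Bury a fact at the start of a long prompt, then ask about it."""
    prompt = (
        fact + " " + " ".join(filler_sentences)
        + " Question: what was the very first fact stated above?"
    )
    answer = query_model(prompt)
    return fact.lower() in answer.lower()  # crude check: did the fact survive?

# Example usage with your own model wrapper and enough filler to strain the context:
# ok = recall_probe(my_model, "The key is under the blue flowerpot.",
#                   ["The weather was unremarkable that day."] * 500)
```

The probe itself is trivial; the interesting part is varying how much filler you add before recall starts to fail.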
But just as psychology only tells you about behavior and not mechanisms, stimulus-response testing with AI only scratches the surface. To get deeper, you need to look inside.
Inside the Black Box
Neuroscience has fMRI, EEG, and single-cell recordings. AI researchers have something even more direct: complete access to every activation in every layer of the network. When an AI processes a sentence or an image, each “neuron” (in reality a mathematical unit) produces a number. All those numbers together form a vector that represents the internal state.
Linear algebra is what lets us make sense of these states. Imagine each layer of a neural network as a high-dimensional vector space. Every input projects onto this space, producing a pattern of activations. By comparing vectors with tools like cosine similarity or Euclidean distance, we can measure how similar the network’s representations are across different inputs. This is the mathematical backbone of representational similarity analysis, a method borrowed from neuroscience that compares how patterns of brain activity cluster around concepts.
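As a toy illustration, here is what that comparison looks like in code. The activation matrix below is random stand-in data; in practice you would record real hidden-layer activations, one row per input.

```python
import numpy as np

# Sketch of representational similarity analysis on artificial activations.
# Assumption: `activations` is a (n_inputs, n_units) matrix of hidden-layer
# activations you have already extracted, one row per stimulus.
rng = np.random.default_rng(0)
activations = rng.normal(size=(6, 512))   # stand-in for real recorded activations

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pairwise similarity matrix: how alike are the network's internal states
# for each pair of inputs? Inputs the model treats as related should show
# up as blocks of high similarity.
n = activations.shape[0]
rsm = np.array([[cosine_similarity(activations[i], activations[j])
                 for j in range(n)] for i in range(n)])

print(np.round(rsm, 2))
```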
Tools of the Trade
Neuron selectivity mapping: Present many stimuli and track which artificial neurons fire. This is analogous to finding cells in the visual cortex that respond to edges or faces. Sometimes you even get “grandmother neurons” in AI that activate strongly for a specific concept. (Short code sketches of each method in this list follow below.)
Feature visualization: Use optimization to generate the input that makes a neuron fire maximally. This is like reverse-engineering a neuron’s tuning curve, except we can literally compute the preferred stimulus.
Dimensionality reduction: Apply PCA or t-SNE to project the high-dimensional activations into two dimensions. This shows clusters and trajectories of representations, much like plotting brain activity in reduced spaces to see how neural populations encode movement or memory.
Probing classifiers: Train a simple linear model to decode whether a property is encoded in a network’s hidden state. For example, can we predict verb tense from a language model’s activations? This mirrors decoding models in neuroscience that predict what image a person is viewing from fMRI scans.
Lesions and interventions: Remove or modify parts of the network and see what breaks. This is the AI version of lesion studies or brain stimulation, testing causality rather than just correlation.
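To ground these in code, here are a few hedged sketches, one per method, each built on synthetic data so it runs on its own; with a real model you would swap in recorded activations. First, neuron selectivity mapping: record activations for labeled stimuli and look for units whose response is dominated by one category. The "planted" face-selective unit here is an artificial stand-in for what you would hope to discover.

```python
import numpy as np

# Neuron selectivity mapping on synthetic data. Assumption: you can record a
# (n_stimuli, n_units) activation matrix and each stimulus has a label.
rng = np.random.default_rng(1)
labels = np.array(["face", "edge", "text"] * 50)
activations = rng.normal(size=(150, 64))
activations[labels == "face", 7] += 3.0   # plant a "face-selective" unit 7

# For each unit, compare its mean response per category; a unit whose response
# is dominated by one category is selective for it.
for unit in range(activations.shape[1]):
    means = {c: activations[labels == c, unit].mean() for c in np.unique(labels)}
    best = max(means, key=means.get)
    if means[best] > 1.0:   # crude selectivity threshold for the sketch
        print(f"unit {unit} responds most to '{best}' (mean {means[best]:.2f})")
```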
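Next, feature visualization by gradient ascent on the input. The tiny random convolutional net below is only a placeholder for a real trained vision model; the point is the procedure of optimizing the image rather than the weights.

```python
import torch

# A tiny random "vision model" stands in for a trained network (assumption:
# any differentiable model with accessible intermediate activations works).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=5),
    torch.nn.ReLU(),
    torch.nn.Conv2d(8, 16, kernel_size=5),
    torch.nn.ReLU(),
)

target_unit = 3  # which channel in the last layer we want to excite

# Start from noise and run gradient ascent on the input itself.
image = torch.randn(1, 3, 64, 64, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    activation = model(image)[0, target_unit].mean()  # mean activation of the unit
    (-activation).backward()                          # ascend by minimizing the negative
    optimizer.step()

print("final activation:", activation.item())
# `image` is now an approximation of the unit's preferred stimulus.
```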
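Dimensionality reduction follows the same pattern: project a large activation matrix down to two axes and look for structure. The data here is again random; with real activations the resulting 2-D scatter tends to show clusters of related inputs.

```python
import numpy as np
from sklearn.decomposition import PCA

# Dimensionality reduction on hidden states, sketched with synthetic data.
# Assumption: `hidden` is a (n_inputs, n_units) matrix of activations.
rng = np.random.default_rng(2)
hidden = rng.normal(size=(200, 768))

pca = PCA(n_components=2)
coords = pca.fit_transform(hidden)          # each input becomes an (x, y) point

print(coords.shape)                          # (200, 2)
print(pca.explained_variance_ratio_)         # how much structure two axes capture
# Plotting `coords` (e.g. with matplotlib) reveals clusters of related inputs.
```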
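A probing classifier is just an ordinary linear model fit on hidden states. The synthetic "tense" signal planted below mimics a property that is linearly decodable; with a real model, the interesting question is whether such a probe beats chance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A linear probe, sketched on synthetic data. Assumption: `hidden` holds one
# activation vector per sentence and `is_past_tense` labels each sentence.
rng = np.random.default_rng(3)
hidden = rng.normal(size=(500, 256))
is_past_tense = rng.integers(0, 2, size=500)
hidden[is_past_tense == 1, :10] += 1.0       # plant a decodable tense signal

X_train, X_test, y_train, y_test = train_test_split(
    hidden, is_past_tense, test_size=0.25, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
# Accuracy well above chance suggests the property is linearly decodable
# from the hidden state; chance-level accuracy suggests it is not.
```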
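Finally, a lesion study on a toy two-layer network, in plain numpy so the whole mechanism stays visible. With a trained model you would knock out real units (or attention heads) and measure how task performance, not just raw output, degrades.

```python
import numpy as np

# A lesion study on a toy two-layer network. Assumption: the weights would
# normally come from a trained model; random weights are used here only to
# show the procedure.
rng = np.random.default_rng(4)
W1, W2 = rng.normal(size=(32, 16)), rng.normal(size=(16, 4))

def forward(x, lesioned_unit=None):
    h = np.maximum(0, x @ W1)           # hidden layer with ReLU
    if lesioned_unit is not None:
        h[:, lesioned_unit] = 0.0        # "lesion": silence one hidden unit
    return h @ W2

x = rng.normal(size=(100, 32))
baseline = forward(x)
for unit in range(16):
    damage = np.abs(forward(x, lesioned_unit=unit) - baseline).mean()
    print(f"unit {unit}: mean output change {damage:.3f}")
# Units whose removal changes the output most are causally most important.
```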
Each of these methods is deeply tied to linear algebra. Whether it is computing singular vectors in PCA, projecting activity onto a subspace, or measuring vector similarity, the math is the same language that underlies both data science and computational neuroscience.
Why This Matters
Thinking this way reframes AI not just as engineering but as a scientific object of study. If a language model can be tested like a brain and scanned like a brain, then interpretability becomes less about debugging code and more about building a cognitive science of artificial systems. It also creates a two-way street. Neuroscience has long inspired AI architectures. Now, AI models give neuroscientists new hypotheses about how biological brains might encode, cluster, and process information.
The tools we use—linear algebra, data visualization, regression, clustering—are not just abstract math tricks. They are the shared microscope that lets us peer into both silicon and biological minds.