
Brain activity measured in an fMRI machine can be used to train a brain decoder to interpret a person’s thoughts. UT Austin researchers have developed a method that adapts the decoder to new users much more quickly, even for those with language comprehension difficulties. Credit: Jerry Tang/University of Texas at Austin.
Researchers Jerry Tang and Alex Huth at UT Austin have developed an AI tool that decodes brain activity into continuous text, even for people who struggle with language comprehension. Unlike previous methods, which required many hours of training and worked only for specific individuals, the technique can be adapted to new users in about an hour, using fMRI recorded while they watch silent videos, such as Pixar shorts.
The research holds particular promise for people with aphasia, a neurological disorder affecting about one million people in the U.S. The condition, which can result from strokes, brain injuries or neurodegenerative diseases, impairs the ability to produce and comprehend language, posing significant challenges to personal, social and professional communication.
This novel research at UT Austin occupies a new niche in the BCI landscape by balancing the benefits of high-fidelity decoding with a non-invasive approach. Invasive BCI implants can deliver faster, more precise results but require neurosurgery, extensive calibration, and ongoing clinical management. Conversely, other non-invasive methods like EEG remain more portable but lack the resolution to produce rich, continuous text. By tapping fMRI data and a converter algorithm that adapts the model to new individuals in about one hour—even for those who cannot comprehend spoken language—UT Austin’s system demonstrates that meaningful, free-form semantic decoding is feasible without surgery or months of training.

Jerry Tang (left) and Alex Huth (right) demonstrate an AI tool that converts thoughts into continuous text without requiring spoken word comprehension. Here, they prepare the fMRI scanner for a brain activity recording. Credit: Nolan Zunk/University of Texas at Austin.
How it works
One of the most notable features of the brain decoder is its reliance on functional magnetic resonance imaging (fMRI) coupled with a transformer-based AI model, akin to GPT-style language architectures. In essence, the system captures a person’s brain activity patterns while they engage with certain stimuli—originally spoken stories, but now more flexibly with silent videos. The AI analyzes these patterns to extract semantic meaning, producing an ongoing stream of text that describes the user’s thoughts, as Physics World noted last year.
Because fMRI records slow blood-oxygen-level-dependent (BOLD) signals, the decoder often paraphrases rather than delivering literal word-for-word transcripts. Despite this, the resulting text captures the “gist” of what a person is mentally processing, as an NIH article noted. Earlier versions required up to 16 hours of fMRI data from a single user, but the latest improvements cut training time to roughly one hour by using visual, language-free stimuli. This shift to non-verbal training helps those who cannot comprehend or produce speech.
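To make that pipeline a bit more concrete, here is a minimal illustrative sketch, in Python, of one way such a decoder can be organized: a language model proposes candidate continuations, an encoding model predicts the BOLD response each candidate would evoke, and the candidates whose predictions best match the recorded signal are kept. This is a simplified assumption about the approach, not the authors’ implementation; `propose` and `featurize` stand in for a GPT-style language model and a semantic feature extractor.

```python
import numpy as np

def encoding_model_predict(text_features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Predict BOLD responses from semantic text features (simplified linear encoding model)."""
    return text_features @ weights  # shape: (timepoints, voxels)

def score_candidate(predicted_bold: np.ndarray, measured_bold: np.ndarray) -> float:
    """Score a candidate by the correlation between predicted and measured responses."""
    p, m = predicted_bold.ravel(), measured_bold.ravel()
    p = (p - p.mean()) / (p.std() + 1e-8)
    m = (m - m.mean()) / (m.std() + 1e-8)
    return float(np.mean(p * m))

def decode_step(beam, measured_bold, weights, propose, featurize, beam_width=5):
    """One beam-search step: extend each candidate text, keep the best-scoring extensions.

    propose(text)   -> list of next-word candidates (placeholder for a language model)
    featurize(text) -> semantic feature matrix, shape (timepoints, features)
    """
    scored = []
    for text in beam:
        for word in propose(text):
            candidate = f"{text} {word}".strip()
            predicted = encoding_model_predict(featurize(candidate), weights)
            scored.append((score_candidate(predicted, measured_bold), candidate))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [candidate for _, candidate in scored[:beam_width]]
```

Because the BOLD signal is slow and noisy, scoring in a setup like this works at the level of whole candidate sequences rather than individual words, which is consistent with the output reading as a paraphrase of the gist rather than a transcript.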
Adapting to new users
Another unique aspect of this research is the converter algorithm that speeds adaptation to new individuals. While a reference model might still require longer sessions with several participants (often listening to narratives or stories), a new user only needs around one hour of silent video viewing. During this time, the algorithm aligns the new user’s brain activity with the pre-trained model’s patterns. This approach leverages “cross-modal semantics”—the idea that meaning exists independent of how it’s conveyed (audio, language, or visuals)—so the system can detect and decode that underlying meaning.
Technically, the alignment is achieved by comparing fMRI signals recorded while the new user watches the silent videos with signals from the reference subjects watching the same material. The converter then “translates” the new user’s signals into a format that the already-trained language model can recognize, producing coherent text from a minimal dataset.
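As a rough sketch of what such a converter could look like, suppose it is a regularized linear map fit on responses to the shared silent videos; the actual algorithm may differ, and the names below (`fit_converter`, `ridge_lambda`) are hypothetical.

```python
import numpy as np

def fit_converter(new_user_bold: np.ndarray,
                  reference_bold: np.ndarray,
                  ridge_lambda: float = 1.0) -> np.ndarray:
    """Fit a linear map from the new user's voxel space to a reference subject's
    voxel space, using responses recorded while both watched the same silent videos.

    new_user_bold:  (timepoints, new_voxels)
    reference_bold: (timepoints, ref_voxels)
    """
    X, Y = new_user_bold, reference_bold
    # Closed-form ridge regression: W = (X^T X + lambda * I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + ridge_lambda * np.eye(X.shape[1]), X.T @ Y)
    return W  # shape: (new_voxels, ref_voxels)

def convert(new_user_bold: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project the new user's activity into the reference space, so a decoder
    trained on the reference subject can be reused without retraining."""
    return new_user_bold @ W
```

In this simplified picture, roughly an hour of shared video viewing supplies the paired responses needed to fit the map, after which the pre-trained decoder operates on the converted signals.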
Potential applications
Because the decoder can glean meaning from non-verbal, internal brain signals, it holds tremendous promise for therapeutic and assistive technologies. In addition to aphasia, the technology could help individuals with paralysis and ALS. Beyond medical uses, a non-invasive decoder could someday act as a hands-free keyboard for note-taking or device control purely via mental activity.
Other brain-computer interfaces use invasive implants (such as electrocorticography grids) to achieve faster and more precise communication rates, sometimes reaching up to 62 words per minute, as Brown News has noted. Yet the surgical risks, technical complexities, and long-term maintenance requirements pose barriers to widespread adoption.
A core concern in any “mind-reading” technology is misuse or unauthorized data acquisition. The UT Austin brain decoder works only with cooperative participants who undergo proper training sessions. If a person is unwilling or actively thinking of something else, the model’s output degrades into incoherence. Furthermore, if the decoder is trained on one individual’s brain signals but used on another, it produces nonsensical text. These safeguards help ensure that unauthorized mind-reading isn’t a near-term concern.

Brain activity from two individuals watching the same silent film. A converter algorithm developed at UT Austin maps one person’s brain activity (left) onto another’s (right), enabling faster adaptation of the brain decoder. Credit: Jerry Tang/University of Texas at Austin.
The UT Austin neurotech research is something of a microcosm of a quickly advancing field. For instance, in a recent BrainGate2 clinical trial at Stanford University, a 69-year-old man with C4 AIS C spinal cord injury successfully piloted a virtual quadcopter using only his thoughts. Two 96-channel microelectrode arrays were implanted in the “hand knob” region of his left precentral gyrus, enabling real-time decoding of neural signals as he attempted to move individual fingers. The participant achieved a mean target acquisition time of about two seconds and navigated through 18 virtual rings in under three minutes, over six times faster than a comparable EEG-controlled system. Researchers suggest that this finger-level control could translate to more natural, multi-degree-of-freedom tasks, potentially facilitating improved independence and leisure activities for people living with paralysis.