Deciphering the neural mechanisms that transform sensory experiences into meaningful semantic representations is a fundamental challenge in cognitive neuroscience. While neuroimaging has mapped a distributed semantic network, the format and neural code of semantic content remain elusive, particularly for complex, naturalistic stimuli. Traditional brain decoding, focused on visual reconstruction, primarily captures low-level perceptual features, missing the deeper semantic essence guiding human cognition. Here, we introduce a paradigm shift by directly decoding fMRI signals into textual descriptions of viewed natural images. Our novel deep learning model, trained without visual input, achieves state-of-the-art semantic decoding performance, generating meaningful captions that capture the core semantic content of complex scenes. Neuroanatomical analysis reveals the critical role of higher-level visual regions, including MT+, ventral stream visual cortex, and inferior parietal cortex, in this semantic transformation. Category-specific decoding further demonstrates nuanced neural representations for semantic dimensions like animacy and motion. This text-based decoding approach provides a more direct and interpretable window into the brain's semantic encoding than visual reconstruction, offering a powerful new methodology for probing the neural basis of complex semantic processing, refining our understanding of the distributed semantic network, and potentially informing the design of brain-inspired language models.
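The abstract does not specify the model architecture. Purely as an illustrative sketch (the voxel count, layer sizes, embedding space, training objective, and retrieval step below are all assumptions, not the authors' method), fMRI-to-text decoding can be framed as learning a mapping from voxel response patterns into a caption-embedding space, from which a description of the viewed image is then generated or retrieved:

```python
# Minimal, hypothetical sketch of an fMRI-to-text ("brain2text") decoding pipeline.
# All shapes, layer sizes, and the retrieval step are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_VOXELS = 8000   # assumed number of voxels from higher-level visual regions
EMBED_DIM = 512   # assumed dimension of a frozen sentence-embedding space


class BrainToTextEncoder(nn.Module):
    """Maps an fMRI response pattern to a caption-embedding vector."""

    def __init__(self, n_voxels: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, 2048),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(2048, embed_dim),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so predictions live on the same sphere as caption embeddings.
        return F.normalize(self.net(voxels), dim=-1)


def train_step(model, optimizer, voxels, caption_embeddings):
    """One alignment step: pull predicted embeddings toward the embeddings of the
    captions describing the viewed images (cosine-similarity objective)."""
    optimizer.zero_grad()
    pred = model(voxels)
    target = F.normalize(caption_embeddings, dim=-1)
    loss = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = BrainToTextEncoder(N_VOXELS, EMBED_DIM)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Synthetic stand-ins for fMRI patterns and ground-truth caption embeddings.
    voxels = torch.randn(32, N_VOXELS)
    caption_embeddings = torch.randn(32, EMBED_DIM)
    for step in range(5):
        loss = train_step(model, optimizer, voxels, caption_embeddings)
        print(f"step {step}: loss={loss:.4f}")

    # At inference time, the predicted embedding could condition a text decoder or,
    # as shown here, retrieve the nearest candidate caption in embedding space.
    candidate_captions = ["a dog running on grass", "a person riding a bicycle"]
    candidate_embeds = F.normalize(torch.randn(2, EMBED_DIM), dim=-1)  # placeholder
    pred = model(voxels[:1])
    best = torch.argmax(pred @ candidate_embeds.T, dim=-1)
    print("decoded caption:", candidate_captions[best.item()])
```

The retrieval step is only one option; conditioning a pretrained language decoder on the predicted embedding would yield free-form captions, which is closer to what the abstract describes.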
@article{feng2025_2503.22697,
  title={From Eye to Mind: brain2text Decoding Reveals the Neural Mechanisms of Visual Semantic Processing},
  author={Feihan Feng and Jingxin Nie},
  journal={arXiv preprint arXiv:2503.22697},
  year={2025}
}