Protein Large Language Models: A Comprehensive Survey

Protein-specific large language models (Protein LLMs) are revolutionizing protein science by enabling more efficient protein structure prediction, function annotation, and design. While existing surveys focus on specific aspects or applications, this work provides the first comprehensive overview of Protein LLMs, covering their architectures, training datasets, evaluation metrics, and diverse applications. Through a systematic analysis of over 100 articles, we propose a structured taxonomy of state-of-the-art Protein LLMs, analyze how they leverage large-scale protein sequence data for improved accuracy, and explore their potential in advancing protein engineering and biomedical research. Additionally, we discuss key challenges and future directions, positioning Protein LLMs as essential tools for scientific discovery in protein science. Resources are maintained at this https URL.
@article{xiao2025_2502.17504,
  title={Protein Large Language Models: A Comprehensive Survey},
  author={Yijia Xiao and Wanjia Zhao and Junkai Zhang and Yiqiao Jin and Han Zhang and Zhicheng Ren and Renliang Sun and Haixin Wang and Guancheng Wan and Pan Lu and Xiao Luo and Yu Zhang and James Zou and Yizhou Sun and Wei Wang},
  journal={arXiv preprint arXiv:2502.17504},
  year={2025}
}