Ace-CEFR -- A Dataset for Automated Evaluation of the Linguistic Difficulty of Conversational Texts for LLM Applications

8 pages main text, 1 page bibliography, 3 pages appendix; 2 figures, 5 tables
Abstract
There is an unmet need to evaluate the linguistic difficulty of short, conversational passages of text, particularly for training and filtering Large Language Models (LLMs). We introduce Ace-CEFR, a dataset of English conversational text passages expert-annotated with their corresponding text difficulty levels. We experiment with several models on Ace-CEFR, including Transformer-based models and LLMs. We show that models trained on Ace-CEFR can measure text difficulty more accurately than human experts, with latency suitable for production environments. Finally, we release the Ace-CEFR dataset to the public for research and development.
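To make the modeling setup concrete, below is a minimal sketch of one approach the abstract describes: fine-tuning a Transformer-based regressor to predict a difficulty score for short passages. The dataset fields ("text", "level"), the toy examples, and the numeric CEFR-to-score mapping are illustrative assumptions, not the released Ace-CEFR schema; the paper's actual models and training details may differ.

```python
# Sketch: fine-tune a Transformer regression head on CEFR-labeled passages.
# Assumes CEFR bands map to a 1-6 scale (A1=1 ... C2=6); this mapping is an
# illustrative choice, not taken from the paper.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CEFR_TO_SCORE = {"A1": 1.0, "A2": 2.0, "B1": 3.0, "B2": 4.0, "C1": 5.0, "C2": 6.0}

# Toy stand-in examples; real training would load the Ace-CEFR dataset.
examples = [
    {"text": "I like dogs. Do you like dogs?", "level": "A1"},
    {"text": "Could you recommend a good book about local history?", "level": "B1"},
    {"text": "The ramifications of the policy remain hotly contested.", "level": "C1"},
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"
)

def collate(batch):
    # Tokenize a batch of passages and attach continuous difficulty labels.
    enc = tokenizer([b["text"] for b in batch], padding=True,
                    truncation=True, return_tensors="pt")
    enc["labels"] = torch.tensor([[CEFR_TO_SCORE[b["level"]]] for b in batch])
    return enc

loader = DataLoader(examples, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        loss = model(**batch).loss  # MSE loss for a single-output regression head
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Inference: the model emits a continuous score that can be thresholded
# back into CEFR bands, which is what makes it cheap enough for
# production filtering compared with per-passage human rating.
model.eval()
with torch.no_grad():
    enc = tokenizer("Where is the train station?", return_tensors="pt")
    print(model(**enc).logits.item())
```

Treating the CEFR scale as an ordered numeric target rather than six unordered classes is one common design choice for difficulty estimation, since it penalizes predictions by how far they miss rather than treating all confusions equally.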
Citation:

@article{kogan2025_2506.14046,
  title={Ace-CEFR -- A Dataset for Automated Evaluation of the Linguistic Difficulty of Conversational Texts for LLM Applications},
  author={David Kogan and Max Schumacher and Sam Nguyen and Masanori Suzuki and Melissa Smith and Chloe Sophia Bellows and Jared Bernstein},
  journal={arXiv preprint arXiv:2506.14046},
  year={2025}
}