
ModernGBERT: German-only 1B Encoder Model Trained from Scratch

Abstract

Despite the prominence of decoder-only language models, encoders remain crucial for resource-constrained applications. We introduce ModernGBERT (134M, 1B), a fully transparent family of German encoder models trained from scratch, incorporating architectural innovations from ModernBERT. To evaluate the practical trade-offs of training encoders from scratch, we also present LLäMmlein2Vec (120M, 1B, 7B), a family of encoders derived from German decoder-only models via LLM2Vec. We benchmark all models on natural language understanding, text embedding, and long-context reasoning tasks, enabling a controlled comparison between dedicated encoders and converted decoders. Our results show that ModernGBERT 1B surpasses prior state-of-the-art German encoders as well as encoders adapted via LLM2Vec in both performance and parameter efficiency. All models, training data, checkpoints, and code are publicly available, advancing the German NLP ecosystem with transparent, high-performance encoder models.
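Since the abstract highlights text-embedding use cases for the released encoders, the sketch below illustrates how such a model could be used to produce sentence embeddings with the Hugging Face transformers library and simple mean pooling. The model identifier is a placeholder, not a checkpoint name confirmed by the paper; substitute the official release once available.

```python
# Minimal sketch: sentence embeddings from a German encoder via mean pooling.
# The model ID below is hypothetical; replace it with the published checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "example-org/ModernGBERT-1B"  # placeholder, not the official ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["Berlin ist die Hauptstadt von Deutschland."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # shape: (batch, seq_len, dim)

# Average over non-padding tokens to obtain one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (1, hidden_dim)
```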

@article{ehrmanntraut2025_2505.13136,
  title={ModernGBERT: German-only 1B Encoder Model Trained from Scratch},
  author={Anton Ehrmanntraut and Julia Wunderle and Jan Pfister and Fotis Jannidis and Andreas Hotho},
  journal={arXiv preprint arXiv:2505.13136},
  year={2025}
}