Multilingual Pretraining for Pixel Language Models

27 May 2025
Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott
MLLM · VLM
Main: 7 pages · Appendix: 6 pages · Bibliography: 4 pages · 20 figures · 7 tables
Abstract

Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual pretraining substantially enhances the capability of pixel language models to effectively support a diverse set of languages.
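
For intuition, the sketch below illustrates the kind of input pipeline the abstract describes: a sentence is rendered to a grayscale image and cut into fixed-size patches, so the encoder consumes pixels rather than token ids from a fixed vocabulary. This is a minimal illustration under stated assumptions, not the PIXEL-M4 codebase; the patch size, render height, image width, and helper names (render_text, patchify) are all hypothetical choices for the example.

    # Minimal sketch (not the authors' code): render text to pixels and split it
    # into ViT-style patches. Patch size, render height, and width are assumptions.
    import numpy as np
    from PIL import Image, ImageDraw

    PATCH = 16    # assumed square patch size
    HEIGHT = 16   # render height equal to one patch row

    def render_text(text: str, width: int = 512) -> np.ndarray:
        """Render a sentence to a grayscale image of shape (HEIGHT, width)."""
        img = Image.new("L", (width, HEIGHT), color=255)   # white background
        ImageDraw.Draw(img).text((0, 0), text, fill=0)     # default bitmap font
        return np.asarray(img, dtype=np.float32) / 255.0

    def patchify(pixels: np.ndarray) -> np.ndarray:
        """Split the rendered image into a sequence of flattened PATCH x PATCH patches."""
        h, w = pixels.shape
        n = w // PATCH
        blocks = pixels[:, : n * PATCH].reshape(h // PATCH, PATCH, n, PATCH)
        return blocks.transpose(2, 0, 1, 3).reshape(-1, PATCH * PATCH)

    # Any script works as long as the chosen font can render it; the encoder
    # never sees a tokenizer or a fixed vocabulary.
    seq = patchify(render_text("Pixel language models read rendered text."))
    print(seq.shape)  # (32, 256): 32 patches, each 16x16 pixels flattened

In a full model, each flattened patch would be linearly projected and fed to a transformer encoder; the point of the sketch is only that the same rendering pipeline applies unchanged across scripts such as Latin, Devanagari, Cyrillic, and Simplified Chinese.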

@article{kesen2025_2505.21265,
  title={Multilingual Pretraining for Pixel Language Models},
  author={Ilker Kesen and Jonas F. Lotz and Ingo Ziegler and Phillip Rust and Desmond Elliott},
  journal={arXiv preprint arXiv:2505.21265},
  year={2025}
}