You Can Have Your Data and Balance It Too: Towards Balanced and
Efficient Multilingual Models


13 October 2022

Tomasz Limisiewicz

Gabriel Stanovsky

Papers citing "You Can Have Your Data and Balance It Too: Towards Balanced and Efficient Multilingual Models"


Title | Authors | Date
Optimal word order for non-causal text generation with Large Language Models: the Spanish case | Andrea Busto-Castiñeira, Silvia García-Méndez, Francisco de Arriba-Pérez, Francisco J. González Castaño | 21 Feb 2025
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization | Orevaoghene Ahia, Sachin Kumar, Hila Gonen, Valentin Hofmann, Tomasz Limisiewicz, Yulia Tsvetkov, Noah A. Smith | 11 Jul 2024
A Representative Study on Human Detection of Artificially Generated Media Across Countries | Joel Frank, Franziska Herbert, Jonas Ricker, Lea Schönherr, Thorsten Eisenhofer, Asja Fischer, Markus Dürmuth, Thorsten Holz | 10 Dec 2023
PuoBERTa: Training and evaluation of a curated language model for Setswana | Vukosi Marivate, Moseli Motsóehli, Valencia Wagner, Richard Lastrucci, Isheanesu Dzingirai | 13 Oct 2023
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | 24 Feb 2021
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models | Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych | 31 Dec 2020
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models | Benjamin Muller, Antonis Anastasopoulos, Benoît Sagot, Djamé Seddah | 24 Oct 2020
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | 26 Sep 2016