2204.08110
Cited By
Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models
17 April 2022
Terra Blevins
Luke Zettlemoyer
Papers citing
"Language Contamination Helps Explain the Cross-lingual Capabilities of English Pretrained Models"
50 / 67 papers shown
Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-training
Linjuan Wu
Haoran Wei
Huan Lin
Tianhao Li
Baosong Yang
Weiming Lu
38
0
0
29 Apr 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
34
1
0
21 Apr 2025
Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models
Jiyue Jiang
Alfred Kar Yin Truong
Yuxiao Chen
Qinghang Bao
Sheng Wang
Pengan Chen
J. T. Wang
Lingpeng Kong
Yu Li
Chuan Wu
ALM
61
0
0
05 Mar 2025
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs
Jonathan Rystrøm
Hannah Rose Kirk
Scott A. Hale
46
2
0
23 Feb 2025
Lessons From Red Teaming 100 Generative AI Products
Blake Bullwinkel
Amanda Minnich
Shiven Chawla
Gary Lopez
Martin Pouliot
...
Pete Bryan
Ram Shankar Siva Kumar
Yonatan Zunger
Chang Kawaguchi
Mark Russinovich
AAML
VLM
37
5
0
13 Jan 2025
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
113
6
0
29 Nov 2024
A Practical Guide to Fine-tuning Language Models with Limited Data
Márton Szép
Daniel Rueckert
Rüdiger von Eisenhart-Rothe
Florian Hinterwimmer
SyDa
ALM
49
2
0
14 Nov 2024
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages
Amir Hossein Kargaran
François Yvon
Hinrich Schütze
VLM
36
5
0
31 Oct 2024
Towards Robust Knowledge Representations in Multilingual LLMs for Equivalence and Inheritance based Consistent Reasoning
Gaurav Arora
Srujana Merugu
Shreya Jain
Vaibhav Saxena
LRM
32
0
0
18 Oct 2024
Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models
Hongchuan Zeng
Senyu Han
Lu Chen
Kai Yu
62
6
0
15 Oct 2024
Unsupervised Data Validation Methods for Efficient Model Training
Yurii Paniv
37
1
0
10 Oct 2024
Goldfish: Monolingual Language Models for 350 Languages
Tyler A. Chang
Catherine Arnett
Zhuowen Tu
Benjamin Bergen
LRM
44
4
0
19 Aug 2024
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
Anqi Zhang
Chaofeng Wu
36
5
0
30 Jul 2024
Why do LLaVA Vision-Language Models Reply to Images in English?
Musashi Hinck
Carolin Holtermann
M. L. Olson
Florian Schneider
Sungduk Yu
Anahita Bhiwandiwalla
Anne Lauscher
Shaoyen Tseng
Vasudev Lal
VLM
40
5
0
02 Jul 2024
Understanding and Mitigating Language Confusion in LLMs
Kelly Marchisio
Wei-Yin Ko
Alexandre Berard
Théo Dehaze
Sebastian Ruder
58
23
0
28 Jun 2024
modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models
Nathan A. Chi
Teodor Malchev
Riley Kong
Ryan A. Chi
Lucas Huang
Ethan A. Chi
R. Thomas McCoy
Dragomir R. Radev
LRM
43
8
0
24 Jun 2024
Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation
Chunyuan Deng
Yilun Zhao
Yuzhao Heng
Yitong Li
Jiannan Cao
Xiangru Tang
Arman Cohan
35
13
0
20 Jun 2024
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes
He Cao
Yanjun Shao
Zhiyuan Liu
Zijing Liu
Xiangru Tang
Yuan Yao
Yu Li
AI4CE
35
5
0
19 Jun 2024
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages
Fabian David Schmidt
Philipp Borchert
Ivan Vulić
Goran Glavaš
42
5
0
18 Jun 2024
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram
Elias Stengel-Eskin
Carl Vondrick
Joey Tianyi Zhou
VLM
42
0
0
17 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
39
0
06 Jun 2024
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights
Wenhao Zhu
Shujian Huang
Fei Yuan
Cheng Chen
Jiajun Chen
Alexandra Birch
LRM
49
5
0
02 May 2024
TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya
Hailay Teklehaymanot
Dren Fazlija
Niloy Ganguly
Gourab K. Patro
Wolfgang Nejdl
34
0
0
26 Apr 2024
CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment
Geyu Lin
Bin Wang
Zhengyuan Liu
Nancy F. Chen
37
7
0
18 Apr 2024
Measuring Cross-lingual Transfer in Bytes
Leandro Rodrigues de Souza
Thales Sales Almeida
R.A. Lotufo
Rodrigo Nogueira
CLL
32
3
0
12 Apr 2024
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments
Anton Schäfer
Shauli Ravfogel
Thomas Hofmann
Tiago Pimentel
Imanol Schlag
63
3
0
11 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
55
36
0
07 Apr 2024
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
Tomasz Limisiewicz
Terra Blevins
Hila Gonen
Orevaoghene Ahia
Luke Zettlemoyer
30
13
0
15 Mar 2024
Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
Martin Riddell
Ansong Ni
Arman Cohan
ELM
42
29
0
06 Mar 2024
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
Carolin Holtermann
Paul Röttger
Timm Dill
Anne Lauscher
ELM
LRM
40
22
0
06 Mar 2024
Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?
E. Razumovskaia
Ivan Vulić
Anna Korhonen
46
6
0
04 Mar 2024
Zero-shot cross-lingual transfer in instruction tuning of large language models
Nadezhda Chirkova
Vassilina Nikoulina
LRM
43
3
0
22 Feb 2024
A Note on Bias to Complete
Jia Xu
Mona Diab
49
2
0
18 Feb 2024
Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching
Kurt Micallef
Nizar Habash
Claudia Borg
Fadhl Eryani
Houda Bouamor
31
2
0
30 Jan 2024
Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
Terra Blevins
Tomasz Limisiewicz
Suchin Gururangan
Margaret Li
Hila Gonen
Noah A. Smith
Luke Zettlemoyer
50
22
0
19 Jan 2024
Question Translation Training for Better Multilingual Reasoning
Wenhao Zhu
Shujian Huang
Fei Yuan
Shuaijie She
Jiajun Chen
Alexandra Birch
LRM
23
29
0
15 Jan 2024
Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?
Tannon Kew
Florian Schottmann
Rico Sennrich
LRM
28
36
0
20 Dec 2023
Investigating Data Contamination in Modern Benchmarks for Large Language Models
Chunyuan Deng
Yilun Zhao
Xiangru Tang
Mark B. Gerstein
Arman Cohan
AAML
ELM
27
52
0
16 Nov 2023
Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models
J. Michaelov
Catherine Arnett
Tyler A. Chang
Benjamin Bergen
36
12
0
15 Nov 2023
Data Similarity is Not Enough to Explain Language Model Performance
Gregory Yauney
Emily Reif
David M. Mimno
47
6
0
15 Nov 2023
Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts
Leonardo Ranaldi
Giulia Pucci
Federico Ranaldi
Elena Sofia Ruzzetti
Fabio Massimo Zanzotto
LRM
29
12
0
14 Nov 2023
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra
Eve Fleisig
Kyunghyun Cho
Adam Lopez
LRM
30
8
0
08 Nov 2023
Continual Learning Under Language Shift
Evangelia Gogoulou
Timothée Lesort
Magnus Boman
Joakim Nivre
KELM
CLL
32
3
0
02 Nov 2023
GlotLID: Language Identification for Low-Resource Languages
Amir Hossein Kargaran
Ayyoob Imani
François Yvon
Hinrich Schütze
27
10
0
24 Oct 2023
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Libo Qin
Qiguang Chen
Fuxuan Wei
Shijue Huang
Wanxiang Che
LRM
32
75
0
23 Oct 2023
Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models
Jirui Qi
Raquel Fernández
Arianna Bisazza
KELM
HILM
27
60
0
16 Oct 2023
Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models
Catherine Arnett
Tyler A. Chang
J. Michaelov
Benjamin Bergen
19
0
0
11 Oct 2023
Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca
Pinzhen Chen
Shaoxiong Ji
Nikolay Bogoychev
Andrey Kutuzov
Barry Haddow
Kenneth Heafield
31
45
0
16 Sep 2023
Empowering Cross-lingual Abilities of Instruction-tuned Large Language Models by Translation-following demonstrations
Leonardo Ranaldi
Giulia Pucci
André Freitas
35
33
0
27 Aug 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
53
19
0
14 Aug 2023