WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

10 July 2019
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
    CVBM

Papers citing "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"

50 / 225 papers shown
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur
Jianmo Ni
Gustavo Hernández Ábrego
John Wieting
Jimmy J. Lin
Daniel Cer
RALM
49
12
0
10 Nov 2023
Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation
Verna Dankers
Ivan Titov
Dieuwke Hupkes
43
5
0
09 Nov 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Dea Adhista
Emmanuel Dave
...
Genta Indra Winata
David Moeljadi
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
54
7
0
19 Sep 2023
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Thuat Nguyen
Chien Van Nguyen
Viet Dac Lai
Hieu Man
Nghia Trung Ngo
Franck Dernoncourt
Ryan A. Rossi
Thien Huu Nguyen
45
97
0
17 Sep 2023
X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
Juan Diego Rodriguez
Katrin Erk
Greg Durrett
48
4
0
16 Sep 2023
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Christopher A. Choquette-Choo
...
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
73
120
0
09 Sep 2023
Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models
Phillip Rust
Anders Søgaard
33
3
0
17 Aug 2023
Extrapolating Large Language Models to Non-English by Aligning Languages
Wenhao Zhu
Yunzhe Lv
Qingxiu Dong
Fei Yuan
Jingjing Xu
Shujian Huang
Lingpeng Kong
Jiajun Chen
Lei Li
45
66
0
09 Aug 2023
Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages
Yasmine Karoui
R. Lebret
Negar Foroutan
Karl Aberer
MLLM
VLM
40
1
0
29 Jun 2023
xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages
Mingda Chen
Kevin Heffernan
Onur Çelebi
Alexandre Mourachko
Holger Schwenk
34
3
0
22 Jun 2023
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
Gregor Geigle
Radu Timofte
Goran Glavaš
VLM
MLLM
36
5
0
14 Jun 2023
Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization
Pengzhi Gao
Liwen Zhang
Zhongjun He
Hua Wu
Haifeng Wang
35
6
0
12 Jun 2023
Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions
Jiahuan Li
Hao Zhou
Shujian Huang
Shan Chen
Jiajun Chen
LRM
41
55
0
24 May 2023
LIMIT: Language Identification, Misidentification, and Translation using Hierarchical Models in 350+ Languages
M. Agarwal
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
38
5
0
23 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
49
96
0
20 May 2023
Pseudo-Label Training and Model Inertia in Neural Machine Translation
B. Hsu
Anna Currey
Xing Niu
Maria Nădejde
Georgiana Dinu
ODL
58
2
0
19 May 2023
ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages
Sourojit Ghosh
Aylin Caliskan
41
69
0
17 May 2023
RC3: Regularized Contrastive Cross-lingual Cross-modal Pre-training
Chulun Zhou
Yunlong Liang
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
VLM
23
4
0
13 May 2023
A General-Purpose Multilingual Document Encoder
Onur Galoglu
Robert Litschko
Goran Glavaš
37
2
0
11 May 2023
Escaping the sentence-level paradigm in machine translation
Matt Post
Marcin Junczys-Dowmunt
33
26
0
25 Apr 2023
Low-resource Bilingual Dialect Lexicon Induction with Large Language Models
Ekaterina Artemova
Barbara Plank
34
1
0
19 Apr 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects
Verena Blaschke
Hinrich Schütze
Barbara Plank
27
13
0
19 Apr 2023
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation
Alex Jones
Isaac Caswell
Ishan Saxena
Orhan Firat
23
9
0
27 Mar 2023
GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation
Can Qin
Ning Yu
Chen Xing
Shu Zhen Zhang
Zeyuan Chen
Stefano Ermon
Yun Fu
Caiming Xiong
Ran Xu
DiffM
50
20
0
17 Mar 2023
LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation
Zhuoyuan Mao
Tetsuji Nakagawa
FedML
19
19
0
16 Feb 2023
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
Weijia Xu
Sweta Agrawal
Eleftheria Briakou
Marianna J. Martindale
Marine Carpuat
HILM
27
47
0
18 Jan 2023
Prompting Large Language Model for Machine Translation: A Case Study
Biao Zhang
Barry Haddow
Alexandra Birch
LRM
32
278
0
17 Jan 2023
Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage
Álvaro Huertas-García
Alejandro Martín
Javier Huertas-Tato
David Camacho
34
9
0
27 Dec 2022
SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes
Wenda Xu
Xian Qian
Mingxuan Wang
Lei Li
William Yang Wang
23
10
0
19 Dec 2022
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Yekun Chai
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua Wu
38
36
0
13 Dec 2022
Towards a general purpose machine translation system for Sranantongo
Just Zwennicker
David Stap
30
4
0
13 Dec 2022
T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics
Yiwei Qin
Weizhe Yuan
Graham Neubig
Pengfei Liu
17
23
0
12 Dec 2022
Improving Simultaneous Machine Translation with Monolingual Data
Hexuan Deng
Liang Ding
Xuebo Liu
Meishan Zhang
Dacheng Tao
Min Zhang
40
12
0
02 Dec 2022
CUNI Systems for the WMT22 Czech-Ukrainian Translation Task
Martin Popel
Jindřich Libovický
Jindřich Helcl
27
4
0
01 Dec 2022
Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources
Xinyan Velocity Yu
Akari Asai
Trina Chatterjee
Junjie Hu
Eunsol Choi
29
21
0
28 Nov 2022
Frustratingly Easy Label Projection for Cross-lingual Transfer
Yang Chen
Chao Jiang
Alan Ritter
Wei Xu
27
31
0
28 Nov 2022
TSMind: Alibaba and Soochow University's Submission to the WMT22 Translation Suggestion Task
Xin Ge
Ke Min Wang
Jiayi Wang
Nini Xiao
Xiangyu Duan
Yu Zhao
Yuqi Zhang
35
2
0
16 Nov 2022
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
MLLM
VLM
24
7
0
09 Nov 2022
Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation
Danni Liu
Jan Niehues
21
5
0
02 Nov 2022
Very Low Resource Sentence Alignment: Luhya and Swahili
E. Chimoto
Bruce A. Bassett
CVBM
21
10
0
31 Oct 2022
Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings
Shuhao Gu
Yang Feng
27
11
0
28 Oct 2022
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Barun Patra
Saksham Singhal
Shaohan Huang
Zewen Chi
Li Dong
Furu Wei
Vishrav Chaudhary
Xia Song
71
23
0
26 Oct 2022
Leveraging Affirmative Interpretations from Negation Improves Natural Language Understanding
Md Mosharaf Hossain
Eduardo Blanco
50
4
0
26 Oct 2022
RuCoLA: Russian Corpus of Linguistic Acceptability
Vladislav Mikhailov
T. Shamardina
Max Ryabinin
A. Pestova
I. Smurov
Ekaterina Artemova
32
28
0
23 Oct 2022
AfroLID: A Neural Language Identification Tool for African Languages
Ife Adebara
AbdelRahim Elmadany
Muhammad Abdul-Mageed
Alcides Alcoba Inciarte
36
30
0
21 Oct 2022
The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)
Faheem Kirefu
Vivek Iyer
Pinzhen Chen
Laurie Burchell
MoE
28
1
0
20 Oct 2022
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin
Michael Beukman
Jesujoba Oluwadara Alabi
Chris C. Emezue
Everlyn Asiko
...
Shamsuddeen Hassan Muhammad
Mofetoluwa Adeyemi
Oreen Yousuf
Sahib Singh
T. Gwadabe
34
8
0
19 Oct 2022
Language Agnostic Multilingual Information Retrieval with Contrastive Learning
Xiyang Hu
Xinchi Chen
Peng Qi
Deguang Kong
Kun Liu
William Yang Wang
Zhiheng Huang
20
8
0
12 Oct 2022
Measuring Fine-Grained Semantic Equivalence with Abstract Meaning Representation
Shira Wein
Zhuxin Wang
Nathan Schneider
19
2
0
06 Oct 2022
Relative representations enable zero-shot latent space communication
Luca Moschella
Valentino Maiorca
Marco Fumero
Antonio Norelli
Francesco Locatello
Emanuele Rodolà
29
97
0
30 Sep 2022