WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

10 July 2019
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
    CVBM

Papers citing "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"

50 / 225 papers shown
Title
A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space
Alex Jones
Wenjie Wang
Kyle Mahowald
31
8
0
13 Sep 2021
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
31
28
0
13 Sep 2021
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain
Mandy Guo
Krishna Srinivasan
Ting-Li Chen
Sneha Kudugunta
Chao Jia
Yinfei Yang
Jason Baldridge
VLM
115
52
0
10 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindřich Helcl
Alexandra Birch
AIMat
45
150
0
01 Sep 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM
LM&MA
31
261
0
12 Aug 2021
Machine Translation of Low-Resource Indo-European Languages
Wei-Rui Chen
Muhammad Abdul-Mageed
22
3
0
08 Aug 2021
The USYD-JD Speech Translation System for IWSLT 2021
Liang Ding
Di Wu
Dacheng Tao
37
16
0
24 Jul 2021
Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages
Dana Ruiter
Dietrich Klakow
Josef van Genabith
C. España-Bonet
31
9
0
19 Jul 2021
As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation
Jun Wang
Chang Xu
Francisco Guzman
Ahmed El-Kishky
Benjamin I. P. Rubinstein
Trevor Cohn
32
10
0
18 Jul 2021
Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning
Jun Wang
Chang Xu
Francisco Guzman
Ahmed El-Kishky
Yuqing Tang
Benjamin I. P. Rubinstein
Trevor Cohn
AAML
SILM
19
33
0
12 Jul 2021
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
40
58
0
09 Jul 2021
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
Zewen Chi
Shaohan Huang
Li Dong
Shuming Ma
Bo Zheng
...
Payal Bajaj
Xia Song
Xian-Ling Mao
Heyan Huang
Furu Wei
56
119
0
30 Jun 2021
IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov
Manuel Mager
Ngoc Thang Vu
37
6
0
30 Jun 2021
Neural Machine Translation for Low-Resource Languages: A Survey
Surangika Ranathunga
E. Lee
Marjana Prifti Skenduli
Ravi Shekhar
Mehreen Alam
Rishemjit Kaur
42
237
0
29 Jun 2021
Machine Translation into Low-resource Language Varieties
Sachin Kumar
Antonios Anastasopoulos
S. Wintner
Yulia Tsvetkov
11
29
0
12 Jun 2021
Exploiting Parallel Corpora to Improve Multilingual Embedding based Document and Sentence Alignment
Dilan Sachintha
Lakmali Piyarathna
Charith Rajitha
Surangika Ranathunga
27
3
0
12 Jun 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
Zewen Chi
Li Dong
Bo Zheng
Shaohan Huang
Xian-Ling Mao
Heyan Huang
Furu Wei
45
67
0
11 Jun 2021
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjan Krishnan
Marc'Aurelio Ranzato
Francisco Guzman
Angela Fan
15
564
0
06 Jun 2021
Language Scaling for Universal Suggested Replies Model
Qianlan Ying
Payal Bajaj
Budhaditya Deb
Yu Yang
Wei Wang
Bojia Lin
Milad Shokouhi
Xia Song
Yang Yang
Daxin Jiang
LRM
21
2
0
04 Jun 2021
Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation
Eleftheria Briakou
Marine Carpuat
16
13
0
31 May 2021
Paraphrastic Representations at Scale
John Wieting
Kevin Gimpel
Graham Neubig
Taylor Berg-Kirkpatrick
24
19
0
30 Apr 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
43
430
0
18 Apr 2021
MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
Zewen Chi
Li Dong
Shuming Ma
Shaohan Huang
Xian-Ling Mao
Heyan Huang
Furu Wei
LRM
53
72
0
18 Apr 2021
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment
Ahmed El-Kishky
Adithya Renduchintala
James Cross
Francisco Guzmán
Philipp Koehn
29
17
0
17 Apr 2021
"Wikily" Supervised Neural Translation Tailored to Cross-Lingual Tasks
Mohammad Sadegh Rasooli
Chris Callison-Burch
Derry Wijaya
CLIP
29
5
0
16 Apr 2021
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders
Fangyu Liu
Ivan Vulić
Anna Korhonen
Nigel Collier
VLM
OffRL
27
117
0
16 Apr 2021
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
Gowtham Ramesh
Sumanth Doddapaneni
Aravinth Bheemaraj
Mayank Jobanputra
AK Raghavan
...
K. Deepak
Vivek Raghavan
Anoop Kunchukuttan
Pratyush Kumar
Mitesh Khapra
LRM
37
231
0
12 Apr 2021
Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages
Garry Kuwanto
Afra Feyza Akyürek
Isidora Chara Tourni
Siyang Li
Alex Jones
Derry Wijaya
27
5
0
24 Mar 2021
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer
Isaac Caswell
Lisa Wang
Ahsan Wahab
D. Esch
...
Duygu Ataman
Orevaoghene Ahia
Oghenefego Ahia
Sweta Agrawal
Mofetoluwa Adeyemi
20
269
0
22 Mar 2021
Congolese Swahili Machine Translation for Humanitarian Response
A. Oktem
Eric DeLuca
Rodrigue Bashizi
Eric Paquin
G. Tang
11
5
0
19 Mar 2021
The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation
David Ifeoluwa Adelani
Dana Ruiter
Jesujoba Oluwadara Alabi
Damilola Adebonojo
Adesina Ayeni
Mofetoluwa Adeyemi
Ayodele Awokoya
C. España-Bonet
27
40
0
15 Mar 2021
Majority Voting with Bidirectional Pre-translation For Bitext Retrieval
Alex Jones
Derry Wijaya
22
6
0
10 Mar 2021
Unbiased Sentence Encoder For Large-Scale Multi-lingual Search Engines
Mahdi Hajiaghayi
Monir Hajiaghayi
Mark R. Bolin
23
0
0
01 Mar 2021
Towards More Fine-grained and Reliable NLP Performance Prediction
Zihuiwen Ye
Pengfei Liu
Jinlan Fu
Graham Neubig
16
33
0
10 Feb 2021
Quality Estimation without Human-labeled Data
Yi-Lin Tuan
Ahmed El-Kishky
Adithya Renduchintala
Vishrav Chaudhary
Francisco Guzmán
Lucia Specia
16
25
0
08 Feb 2021
Cross-Lingual Named Entity Recognition Using Parallel Corpus: A New Approach Using XLM-RoBERTa Alignment
Bing Li
Yujie He
Wenjin Xu
28
23
0
26 Jan 2021
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment
Freda Shi
Luke Zettlemoyer
Sida I. Wang
SSL
32
33
0
01 Jan 2021
ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora
Ouyang Xuan
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
62
100
0
31 Dec 2020
A Targeted Attack on Black-Box Neural Machine Translation with Parallel Data Poisoning
Chang Xu
Jun Wang
Yuqing Tang
Francisco Guzman
Benjamin I. P. Rubinstein
Trevor Cohn
AAML
28
7
0
02 Nov 2020
Tilde at WMT 2020: News Task Systems
Rihards Krišlauks
Mārcis Pinnis
VLM
31
3
0
29 Oct 2020
Beyond English-Centric Multilingual Machine Translation
Angela Fan
Shruti Bhosale
Holger Schwenk
Zhiyi Ma
Ahmed El-Kishky
...
Vitaliy Liptchinsky
Sergey Edunov
Edouard Grave
Michael Auli
Armand Joulin
LRM
41
832
0
21 Oct 2020
Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings
Phillip Keung
Julian Salazar
Y. Lu
Noah A. Smith
SSL
27
25
0
15 Oct 2020
Asking Crowdworkers to Write Entailment Examples: The Best of Bad Options
Clara Vania
Ruijie Chen
Samuel R. Bowman
20
10
0
13 Oct 2020
MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset
M. Fomicheva
Shuo Sun
E. Fonseca
Chrysoula Zerva
Frédéric Blain
Vishrav Chaudhary
Francisco Guzmán
Nina Lopatina
Lucia Specia
André F. T. Martins
29
67
0
09 Oct 2020
Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank
Eleftheria Briakou
Marine Carpuat
18
25
0
07 Oct 2020
Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages
Xavier Garcia
Aditya Siddhant
Orhan Firat
Ankur P. Parikh
30
31
0
23 Sep 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
23
72
0
20 Sep 2020
Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity
Brian Thompson
Matt Post
31
57
0
11 Aug 2020
Revisiting Low Resource Status of Indian Languages in Machine Translation
Jerin Philip
Shashank Siripragada
Vinay P. Namboodiri
C. V. Jawahar
15
27
0
11 Aug 2020
A Multilingual Parallel Corpora Collection Effort for Indian Languages
Shashank Siripragada
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
VLM
32
47
0
15 Jul 2020