1907.05791
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
10 July 2019
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
CVBM
Papers citing
"WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"
25 / 225 papers shown
Title
Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords
Tom Kocmi
Martin Popel
Ondrej Bojar
11
38
0
06 Jul 2020
TICO-19: the Translation Initiative for Covid-19
Antonios Anastasopoulos
A. Cattelan
Zi-Yi Dou
Marcello Federico
C. Federman
...
Mengmeng Niu
A. Oktem
Eric Paquin
G. Tang
Sylwia Tur
24
90
0
03 Jul 2020
Unsupervised Quality Estimation for Neural Machine Translation
M. Fomicheva
Shuo Sun
Lisa Yankovskaya
Frédéric Blain
Francisco Guzmán
Mark Fishel
Nikolaos Aletras
Vishrav Chaudhary
Lucia Specia
UQLM
20
184
0
21 May 2020
Parallel Corpus Filtering via Pre-trained Language Models
Boliang Zhang
Ajay Nagesh
Kevin Knight
30
31
0
13 May 2020
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction
C. España-Bonet
Alberto Barrón-Cedeño
Lluís Marquez
11
9
0
03 May 2020
Predicting Performance for Natural Language Processing Tasks
Mengzhou Xia
Antonios Anastasopoulos
Ruochen Xu
Yiming Yang
Graham Neubig
25
59
0
02 May 2020
MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
Louis Martin
Angela Fan
Eric Villemonte de la Clergerie
Antoine Bordes
Benoît Sagot
28
36
0
01 May 2020
A Call for More Rigor in Unsupervised Cross-lingual Learning
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
Gorka Labaka
Eneko Agirre
18
72
0
30 Apr 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
Brian Thompson
Matt Post
LRM
19
188
0
30 Apr 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Nils Reimers
Iryna Gurevych
42
1,000
0
21 Apr 2020
SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
Masoud Jalili Sabet
Philipp Dufter
François Yvon
Hinrich Schütze
23
228
0
18 Apr 2020
Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe
Gorka Labaka
Eneko Agirre
27
115
0
09 Apr 2020
Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation
Dana Ruiter
Josef van Genabith
C. España-Bonet
SSL
26
3
0
07 Apr 2020
Detecting and Understanding Generalization Barriers for Neural Machine Translation
Guanlin Li
Lemao Liu
Conghui Zhu
Tiejun Zhao
Shuming Shi
28
0
0
05 Apr 2020
Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech
Mihir Kale
Scott Roy
14
14
0
05 Apr 2020
PMIndia -- A Collection of Parallel Corpora of Languages of India
Barry Haddow
Faheem Kirefu
19
102
0
27 Jan 2020
A Comprehensive Survey of Multilingual Neural Machine Translation
Raj Dabre
Chenhui Chu
Anoop Kunchukuttan
LRM
36
33
0
04 Jan 2020
Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering
C. Carrino
Marta R. Costa-jussá
José A. R. Fonollosa
6
88
0
11 Dec 2019
GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies
Marta R. Costa-jussá
P. Lin
C. España-Bonet
SyDa
31
24
0
10 Dec 2019
JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita
Jun Suzuki
Masaaki Nagata
LRM
38
64
0
25 Nov 2019
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
33
256
0
10 Nov 2019
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzmán
Philipp Koehn
28
198
0
10 Nov 2019
Should All Cross-Lingual Embeddings Speak English?
Antonios Anastasopoulos
Graham Neubig
19
31
0
08 Nov 2019
LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition
Benjamin Beilharz
Xin Sun
Sariya Karimova
Stefan Riezler
8
28
0
17 Oct 2019
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito
William N. Havard
Mahault Garnerin
Éric Le Ferrand
Laurent Besacier
32
47
0
30 Jul 2019