
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

10 July 2019
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
    CVBM
arXiv:1907.05791

Papers citing "WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"

25 / 225 papers shown
Announcing CzEng 2.0 Parallel Corpus with over 2 Gigawords
Tom Kocmi
Martin Popel
Ondrej Bojar
11
38
0
06 Jul 2020
TICO-19: the Translation Initiative for Covid-19
Antonios Anastasopoulos
A. Cattelan
Zi-Yi Dou
Marcello Federico
C. Federman
...
Mengmeng Niu
A. Oktem
Eric Paquin
G. Tang
Sylwia Tur
24
90
0
03 Jul 2020
Unsupervised Quality Estimation for Neural Machine Translation
M. Fomicheva
Shuo Sun
Lisa Yankovskaya
Frédéric Blain
Francisco Guzmán
Mark Fishel
Nikolaos Aletras
Vishrav Chaudhary
Lucia Specia
UQLM
20
184
0
21 May 2020
Parallel Corpus Filtering via Pre-trained Language Models
Boliang Zhang
Ajay Nagesh
Kevin Knight
30
31
0
13 May 2020
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction
C. España-Bonet
Alberto Barrón-Cedeño
Lluís Màrquez
11
9
0
03 May 2020
Predicting Performance for Natural Language Processing Tasks
Mengzhou Xia
Antonios Anastasopoulos
Ruochen Xu
Yiming Yang
Graham Neubig
25
59
0
02 May 2020
MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
Louis Martin
Angela Fan
Eric Villemonte de la Clergerie
Antoine Bordes
Benoît Sagot
28
36
0
01 May 2020
A Call for More Rigor in Unsupervised Cross-lingual Learning
Mikel Artetxe
Sebastian Ruder
Dani Yogatama
Gorka Labaka
Eneko Agirre
18
72
0
30 Apr 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
Brian Thompson
Matt Post
LRM
19
188
0
30 Apr 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Nils Reimers
Iryna Gurevych
42
1,000
0
21 Apr 2020
SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
Masoud Jalili Sabet
Philipp Dufter
François Yvon
Hinrich Schütze
23
228
0
18 Apr 2020
Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe
Gorka Labaka
Eneko Agirre
27
115
0
09 Apr 2020
Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation
Dana Ruiter
Josef van Genabith
C. España-Bonet
SSL
26
3
0
07 Apr 2020
Detecting and Understanding Generalization Barriers for Neural Machine Translation
Guanlin Li
Lemao Liu
Conghui Zhu
Tiejun Zhao
Shuming Shi
28
0
0
05 Apr 2020
Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech
Mihir Kale
Scott Roy
14
14
0
05 Apr 2020
PMIndia -- A Collection of Parallel Corpora of Languages of India
Barry Haddow
Faheem Kirefu
19
102
0
27 Jan 2020
A Comprehensive Survey of Multilingual Neural Machine Translation
Raj Dabre
Chenhui Chu
Anoop Kunchukuttan
LRM
36
33
0
04 Jan 2020
Automatic Spanish Translation of the SQuAD Dataset for Multilingual Question Answering
C. Carrino
Marta R. Costa-jussà
José A. R. Fonollosa
6
88
0
11 Dec 2019
GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies
Marta R. Costa-jussà
P. Lin
C. España-Bonet
SyDa
31
24
0
10 Dec 2019
JParaCrawl: A Large Scale Web-Based English-Japanese Parallel Corpus
Makoto Morishita
Jun Suzuki
Masaaki Nagata
LRM
38
64
0
25 Nov 2019
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
33
256
0
10 Nov 2019
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzmán
Philipp Koehn
28
198
0
10 Nov 2019
Should All Cross-Lingual Embeddings Speak English?
Antonios Anastasopoulos
Graham Neubig
19
31
0
08 Nov 2019
LibriVoxDeEn: A Corpus for German-to-English Speech Translation and German Speech Recognition
Benjamin Beilharz
Xin Sun
Sariya Karimova
Stefan Riezler
8
28
0
17 Oct 2019
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito
William N. Havard
Mahault Garnerin
Éric Le Ferrand
Laurent Besacier
32
47
0
30 Jul 2019