1911.06154
Cited By
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
10 November 2019
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzmán
Philipp Koehn
Papers citing
"CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs"
21 / 121 papers shown
Title
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
43
58
0
09 Jul 2021
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
Zewen Chi
Shaohan Huang
Li Dong
Shuming Ma
Bo Zheng
...
Payal Bajaj
Xia Song
Xian-Ling Mao
Heyan Huang
Furu Wei
56
119
0
30 Jun 2021
IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task
Pavel Denisov
Manuel Mager
Ngoc Thang Vu
37
6
0
30 Jun 2021
Machine Translation into Low-resource Language Varieties
Sachin Kumar
Antonios Anastasopoulos
S. Wintner
Yulia Tsvetkov
11
29
0
12 Jun 2021
LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models
Hongyu Gong
Vishrav Chaudhary
Yuqing Tang
Francisco Guzmán
32
3
0
07 Jun 2021
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjana Krishnan
Marc'Aurelio Ranzato
Francisco Guzmán
Angela Fan
19
564
0
06 Jun 2021
Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
Wei-Jen Ko
Ahmed El-Kishky
Adithya Renduchintala
Vishrav Chaudhary
Naman Goyal
Francisco Guzmán
Pascale Fung
Philipp Koehn
Mona T. Diab
32
41
0
31 May 2021
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
43
430
0
18 Apr 2021
XLEnt: Mining a Large Cross-lingual Entity Dataset with Lexical-Semantic-Phonetic Word Alignment
Ahmed El-Kishky
Adithya Renduchintala
James Cross
Francisco Guzmán
Philipp Koehn
29
17
0
17 Apr 2021
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages
Gowtham Ramesh
Sumanth Doddapaneni
Aravinth Bheemaraj
Mayank Jobanputra
AK Raghavan
...
K. Deepak
Vivek Raghavan
Anoop Kunchukuttan
Pratyush Kumar
Mitesh Khapra
LRM
37
231
0
12 Apr 2021
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer
Isaac Caswell
Lisa Wang
Ahsan Wahab
D. Esch
...
Duygu Ataman
Orevaoghene Ahia
Oghenefego Ahia
Sweta Agrawal
Mofetoluwa Adeyemi
20
269
0
22 Mar 2021
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani
Jade Z. Abbott
Graham Neubig
Daniel D'souza
Julia Kreutzer
...
T. Diop
A. Diallo
Adewale Akinfaderin
T. Marengereke
Salomey Osei
30
186
0
22 Mar 2021
Quality Estimation without Human-labeled Data
Yi-Lin Tuan
Ahmed El-Kishky
Adithya Renduchintala
Vishrav Chaudhary
Francisco Guzmán
Lucia Specia
16
25
0
08 Feb 2021
The Multilingual TEDx Corpus for Speech Recognition and Translation
Elizabeth Salesky
Sanjeev Khudanpur
Jacob Bremerman
R. Cattoni
Matteo Negri
Marco Turchi
Douglas W. Oard
Matt Post
22
119
0
02 Feb 2021
DeepRepair: Style-Guided Repairing for DNNs in the Real-world Operational Environment
Bing Yu
Hua Qi
Qing Guo
Felix Juefei Xu
Xiaofei Xie
Lei Ma
Jianjun Zhao
17
5
0
19 Nov 2020
Facebook AI's WMT20 News Translation Task Submission
Peng-Jen Chen
Ann Lee
Changhan Wang
Naman Goyal
Angela Fan
Mary Williamson
Jiatao Gu
VLM
25
37
0
16 Nov 2020
Beyond English-Centric Multilingual Machine Translation
Angela Fan
Shruti Bhosale
Holger Schwenk
Zhiyi Ma
Ahmed El-Kishky
...
Vitaliy Liptchinsky
Sergey Edunov
Edouard Grave
Michael Auli
Armand Joulin
LRM
41
832
0
21 Oct 2020
A Bayesian Multilingual Document Model for Zero-shot Topic Identification and Discovery
Santosh Kesiraju
Sangeet Sagar
Ondřej Glembek
Lukáš Burget
Ján Černocký
S. Gangashetty
29
0
0
02 Jul 2020
Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance
Ahmed El-Kishky
Francisco Guzmán
19
15
0
31 Jan 2020
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
33
256
0
10 Nov 2019
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
CVBM
51
401
0
10 Jul 2019