1911.06154
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
10 November 2019
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzman
Philipp Koehn
Papers citing
"CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs"
50 / 121 papers shown
Title
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Yaoming Zhu
Zewei Sun
Shanbo Cheng
Yuyang Huang
Liwei Wu
Mingxuan Wang
33
10
0
20 Dec 2022
GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
Jian Yang
Shuming Ma
Li Dong
Shaohan Huang
Haoyang Huang
Yuwei Yin
Dongdong Zhang
Liqun Yang
Furu Wei
Zhoujun Li
SyDa
AI4CE
37
25
0
20 Dec 2022
Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models
Hongyuan Lu
Haoyang Huang
Shuming Ma
Dongdong Zhang
W. Lam
Furu Wei
39
4
0
15 Dec 2022
TyDiP: A Dataset for Politeness Classification in Nine Typologically Diverse Languages
A. Srinivasan
Eunsol Choi
40
15
0
29 Nov 2022
Frustratingly Easy Label Projection for Cross-lingual Transfer
Yang Chen
Chao Jiang
Alan Ritter
Wei Xu
32
31
0
28 Nov 2022
TorchScale: Transformers at Scale
Shuming Ma
Hongyu Wang
Shaohan Huang
Wenhui Wang
Zewen Chi
...
Alon Benhaim
Barun Patra
Vishrav Chaudhary
Xia Song
Furu Wei
AI4CE
30
10
0
23 Nov 2022
Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation
Danni Liu
Jan Niehues
23
5
0
02 Nov 2022
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Barun Patra
Saksham Singhal
Shaohan Huang
Zewen Chi
Li Dong
Furu Wei
Vishrav Chaudhary
Xia Song
71
23
0
26 Oct 2022
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings
Iker García-Ferrero
Rodrigo Agerri
German Rigau
69
21
0
23 Oct 2022
AfroLID: A Neural Language Identification Tool for African Languages
Ife Adebara
AbdelRahim Elmadany
Muhammad Abdul-Mageed
Alcides Alcoba Inciarte
36
30
0
21 Oct 2022
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
VLM
MoE
LRM
29
20
0
20 Oct 2022
Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages
Idris Abdulmumin
Michael Beukman
Jesujoba Oluwadara Alabi
Chris C. Emezue
Everlyn Asiko
...
Shamsuddeen Hassan Muhammad
Mofetoluwa Adeyemi
Oreen Yousuf
Sahib Singh
T. Gwadabe
44
8
0
19 Oct 2022
CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation
Jian Yang
Shaohan Huang
Shuming Ma
Yuwei Yin
Li Dong
Dongdong Zhang
Hongcheng Guo
Zhoujun Li
Furu Wei
52
24
0
13 Oct 2022
MTet: Multi-domain Translation for English and Vietnamese
C. Ngo
Trieu H. Trinh
Long Phan
H. Tran
Tai Dang
Hieu Duy Nguyen
Minh Le Nguyen
Minh-Thang Luong
VLM
42
8
0
11 Oct 2022
Language Varieties of Italy: Technology Challenges and Opportunities
Alan Ramponi
32
7
0
20 Sep 2022
CJaFr-v3 : A Freely Available Filtered Japanese-French Aligned Corpus
Raoul Blin
Fabien Cromierès
20
1
0
28 Aug 2022
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
Pengcheng He
Baolin Peng
Liyang Lu
Song Wang
Jie Mei
...
Chenguang Zhu
Wayne Xiong
Michael Zeng
Jianfeng Gao
Xuedong Huang
28
47
0
21 Aug 2022
Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation
Muhammad N. ElNokrashy
Amr Hendy
Mohamed Maher
Mohamed Afify
Hany Awadalla
25
2
0
11 Aug 2022
esCorpius: A Massive Spanish Crawling Corpus
Asier Gutiérrez-Fandiño
David Pérez-Fernández
Jordi Armengol-Estapé
D. Griol
Z. Callejas
51
2
0
30 Jun 2022
What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
AI4CE
49
9
0
22 May 2022
OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval
Tong Niu
Kazuma Hashimoto
Yingbo Zhou
Caiming Xiong
VLM
29
5
0
17 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhehuai Chen
Yonghui Wu
Macduff Hughes
56
99
0
09 May 2022
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani
Jesujoba Oluwadara Alabi
Angela Fan
Julia Kreutzer
Xiaoyu Shen
...
Ayodele Awokoya
Happy Buzaaba
Blessing K. Sibanda
Andiswa Bukula
Sam Manthalu
34
111
0
04 May 2022
A Survey on Cross-Lingual Summarization
Jiaan Wang
Fandong Meng
Duo Zheng
Yunlong Liang
Zhixu Li
Jianfeng Qu
Jie Zhou
AILaw
28
61
0
23 Mar 2022
Pre-Trained Multilingual Sequence-to-Sequence Models: A Hope for Low-Resource Language Translation?
E. Lee
Sarubi Thillainathan
Shravan Nayak
Surangika Ranathunga
David Ifeoluwa Adelani
Ruisi Su
Arya D. McCarthy
VLM
23
43
0
16 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
42
157
0
01 Mar 2022
Sequence-to-Sequence Resources for Catalan
Ona de Gibert
Ksenia Kharitonova
B. Figueras
Jordi Armengol-Estapé
Maite Melero
19
0
0
14 Feb 2022
Data Scaling Laws in NMT: The Effect of Noise and Architecture
Yamini Bansal
Behrooz Ghorbani
Ankush Garg
Biao Zhang
M. Krikun
Colin Cherry
Behnam Neyshabur
Orhan Firat
42
47
0
04 Feb 2022
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
Julien Abadji
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
CLL
45
153
0
17 Jan 2022
Multilingual Open Text Release 1: Public Domain News in 44 Languages
Chester Palen-Michel
June-Woo Kim
Constantine Lignos
VLM
29
12
0
14 Jan 2022
DOCmT5: Document-Level Pretraining of Multilingual Language Models
Chia-Hsuan Lee
Aditya Siddhant
Viresh Ratnakar
Melvin Johnson
LRM
25
9
0
16 Dec 2021
Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21
Lintang Sutawika
Jan Christian Blaise Cruz
16
3
0
20 Nov 2021
BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation
Eleftheria Briakou
Sida Wang
Luke Zettlemoyer
Marjan Ghazvininejad
34
5
0
12 Nov 2021
Improving Large-scale Language Models and Resources for Filipino
Jan Christian Blaise Cruz
C. Cheng
AI4CE
35
27
0
11 Nov 2021
Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
Jian Yang
Shuming Ma
Haoyang Huang
Dongdong Zhang
Li Dong
...
Alexandre Muzio
Saksham Singhal
Hany Awadalla
Xia Song
Furu Wei
35
45
0
03 Nov 2021
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation
Long Doan
L. T. Nguyen
Nguyen Luong Tran
T. Hoang
Dat Quoc Nguyen
33
22
0
23 Oct 2021
Alternative Input Signals Ease Transfer in Multilingual Machine Translation
Simeng Sun
Angela Fan
James Cross
Vishrav Chaudhary
C. Tran
Philipp Koehn
Francisco Guzman
32
16
0
15 Oct 2021
We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing
Fredrik Olsson
Magnus Sahlgren
26
1
0
11 Oct 2021
EdinSaar@WMT21: North-Germanic Low-Resource Multilingual NMT
Svetlana Tchistiakova
Jesujoba Oluwadara Alabi
Koel Dutta Chowdhury
Sourav Dutta
Dana Ruiter
VLM
36
6
0
29 Sep 2021
Improving Arabic Diacritization by Learning to Diacritize and Translate
Brian Thompson
A. Alshehri
45
10
0
29 Sep 2021
Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering
Fahim Faisal
Antonios Anastasopoulos
37
4
0
24 Sep 2021
Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents
Biao Zhang
Ankur Bapna
Melvin Johnson
A. Dabirmoghaddam
N. Arivazhagan
Orhan Firat
34
12
0
21 Sep 2021
Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications
Shuo Sun
Ahmed El-Kishky
Vishrav Chaudhary
James Cross
Francisco Guzmán
Lucia Specia
26
1
0
17 Sep 2021
Evaluating Multiway Multilingual NMT in the Turkic Languages
Jamshidbek Mirzakhalov
A. Babu
Aigiz Kunafin
Ahsan Wahab
Behzodbek Moydinboyev
...
Julia Kreutzer
Francis M. Tyers
Orhan Firat
John Licato
Sriram Chellappan
ELM
33
9
0
13 Sep 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM
LM&MA
31
261
0
12 Aug 2021
Facebook AI WMT21 News Translation Task Submission
C. Tran
Shruti Bhosale
James Cross
Philipp Koehn
Sergey Edunov
Angela Fan
VLM
134
81
0
06 Aug 2021
PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining
Machel Reid
Mikel Artetxe
VLM
52
26
0
04 Aug 2021
The USYD-JD Speech Translation System for IWSLT 2021
Liang Ding
Di Wu
Dacheng Tao
44
16
0
24 Jul 2021
Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages
Dana Ruiter
Dietrich Klakow
Josef van Genabith
C. España-Bonet
33
9
0
19 Jul 2021
Putting words into the system's mouth: A targeted attack on neural machine translation using monolingual data poisoning
Jun Wang
Chang Xu
Francisco Guzman
Ahmed El-Kishky
Yuqing Tang
Benjamin I. P. Rubinstein
Trevor Cohn
AAML
SILM
30
33
0
12 Jul 2021