1911.00359
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
1 November 2019
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave
Papers citing
"CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data"
50 / 171 papers shown
Title
Distilling a Pretrained Language Model to a Multilingual ASR Model
Kwanghee Choi
Hyung-Min Park
VLM
33
11
0
25 Jun 2022
DIALOG-22 RuATD Generated Text Detection
Narek Maloyan
Bulat Nutfullin
Eugene Ilyushin
DeLMO
30
8
0
16 Jun 2022
A computational psycholinguistic evaluation of the syntactic abilities of Galician BERT models at the interface of dependency resolution and training time
Iria de-Dios-Flores
Marcos Garcia
25
2
0
06 Jun 2022
Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation
Virginia Adams
Sandeep Subramanian
Mike Chrzanowski
Oleksii Hrinchuk
Oleksii Kuchaiev
33
2
0
02 Jun 2022
Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers
R. Liu
Young Jin Kim
Alexandre Muzio
Hany Awadalla
MoE
55
22
0
28 May 2022
On the Role of Bidirectionality in Language Model Pre-Training
Mikel Artetxe
Jingfei Du
Naman Goyal
Luke Zettlemoyer
Ves Stoyanov
30
16
0
24 May 2022
Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog
Chia-Chien Hung
Anne Lauscher
Ivan Vulić
Simone Paolo Ponzetto
Goran Glavaš
33
34
0
20 May 2022
Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters
Yang Xiang
Zhihua Wu
Weibao Gong
Siyu Ding
Xianjie Mo
...
Yue Yu
Ge Li
Yu Sun
Yanjun Ma
Dianhai Yu
24
5
0
19 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhehuai Chen
Yonghui Wu
Macduff Hughes
56
99
0
09 May 2022
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding
Harish Tayyar Madabushi
Edward Gow-Smith
Marcos García
Carolina Scarton
M. Idiart
Aline Villavicencio
20
48
0
21 Apr 2022
IndicXNLI: Evaluating Multilingual Inference for Indian Languages
Divyanshu Aggarwal
V. Gupta
Anoop Kunchukuttan
31
27
0
19 Apr 2022
mGPT: Few-Shot Learners Go Multilingual
Oleh Shliazhko
Alena Fenogenova
Maria Tikhonova
Vladislav Mikhailov
Anastasia Kozlova
Tatiana Shavrina
51
149
0
15 Apr 2022
Combining Static and Contextualised Multilingual Embeddings
Katharina Hämmerl
Jindrich Libovický
Alexander Fraser
27
10
0
17 Mar 2022
Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure
Yuan Chai
Yaobo Liang
Nan Duan
LRM
27
21
0
16 Mar 2022
OCR Improves Machine Translation for Low-Resource Languages
Oana Ignat
Jean Maillard
Vishrav Chaudhary
Francisco Guzmán
45
10
0
27 Feb 2022
A Survey on Artificial Intelligence for Source Code: A Dialogue Systems Perspective
Erfan Al-Hossami
Samira Shaikh
32
6
0
10 Feb 2022
Cedille: A large autoregressive French language model
Martin Müller
Florian Laurent
36
19
0
07 Feb 2022
Negativity Spreads Faster: A Large-Scale Multilingual Twitter Analysis on the Role of Sentiment in Political Communication
Dimosthenis Antypas
Alun D. Preece
Jose Camacho-Collados
27
26
0
01 Feb 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Suchin Gururangan
Dallas Card
Sarah K. Dreier
E. K. Gade
Leroy Z. Wang
Zeyu Wang
Luke Zettlemoyer
Noah A. Smith
175
74
0
25 Jan 2022
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
Julien Abadji
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
CLL
45
153
0
17 Jan 2022
Evaluation of HTR models without Ground Truth Material
Phillip Benjamin Strobel
Simon Clematide
M. Volk
R. Schwitter
Tobias Hodel
David Schoch
19
11
0
17 Jan 2022
Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
...
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
MoE
61
188
0
20 Dec 2021
Few-shot Learning with Multilingual Language Models
Xi Lin
Todor Mihaylov
Mikel Artetxe
Tianlu Wang
Shuohui Chen
...
Luke Zettlemoyer
Zornitsa Kozareva
Mona T. Diab
Ves Stoyanov
Xian Li
BDL
ELM
LRM
64
287
0
20 Dec 2021
Unsupervised Dense Information Retrieval with Contrastive Learning
Gautier Izacard
Mathilde Caron
Lucas Hosseini
Sebastian Riedel
Piotr Bojanowski
Armand Joulin
Edouard Grave
RALM
43
825
0
16 Dec 2021
Large Language Models are not Models of Natural Language: they are Corpus Models
Csaba Veres
17
18
0
13 Dec 2021
Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples
Linlin Liu
Xin Li
Ruidan He
Lidong Bing
Chenyu You
Luo Si
KELM
42
18
0
22 Nov 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
32
665
0
17 Nov 2021
Diverse Distributions of Self-Supervised Tasks for Meta-Learning in NLP
Trapit Bansal
K. Gunasekaran
Tong Wang
Tsendsuren Munkhdalai
Andrew McCallum
SSL
OOD
53
19
0
02 Nov 2021
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Ankur Bapna
Yu-An Chung
Na Wu
Anmol Gulati
Ye Jia
J. Clark
Melvin Johnson
Jason Riesa
Alexis Conneau
Yu Zhang
VLM
64
94
0
20 Oct 2021
Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
Guanhua Chen
Shuming Ma
Yun-Nung Chen
Dongdong Zhang
Jia Pan
Wenping Wang
Furu Wei
LRM
31
14
0
16 Oct 2021
DS-TOD: Efficient Domain Specialization for Task Oriented Dialog
Chia-Chien Hung
Anne Lauscher
Simone Paolo Ponzetto
Goran Glavaš
41
31
0
15 Oct 2021
We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing
Fredrik Olsson
Magnus Sahlgren
26
1
0
11 Oct 2021
Unsupervised Neural Machine Translation with Generative Language Models Only
Jesse Michael Han
Igor Babuschkin
Harrison Edwards
Arvind Neelakantan
Tao Xu
...
Alex Ray
Pranav Shyam
Aditya A. Ramesh
Alec Radford
Ilya Sutskever
52
36
0
11 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
34
276
0
06 Oct 2021
Sequential Reptile: Inter-Task Gradient Alignment for Multilingual Learning
Seanie Lee
Haebeom Lee
Juho Lee
Sung Ju Hwang
MoMe
CLL
53
17
0
06 Oct 2021
Is the Number of Trainable Parameters All That Actually Matters?
A. Chatelain
Amine Djeghri
Daniel Hesslow
Julien Launay
Iacopo Poli
56
7
0
24 Sep 2021
DuRecDial 2.0: A Bilingual Parallel Corpus for Conversational Recommendation
Zeming Liu
Haifeng Wang
Zheng-Yu Niu
Hua Wu
Wanxiang Che
26
56
0
18 Sep 2021
AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages
Machel Reid
Junjie Hu
Graham Neubig
Y. Matsuo
77
31
0
10 Sep 2021
An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages
Xinyu Lu
Jipeng Qiang
Yun Li
Yunhao Yuan
Yi Zhu
33
19
0
01 Sep 2021
Facebook AI WMT21 News Translation Task Submission
C. Tran
Shruti Bhosale
James Cross
Philipp Koehn
Sergey Edunov
Angela Fan
VLM
134
81
0
06 Aug 2021
A Survey on Low-Resource Neural Machine Translation
Rui Wang
Xu Tan
Renqian Luo
Tao Qin
Tie-Yan Liu
3DV
43
58
0
09 Jul 2021
Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment
Zewen Chi
Li Dong
Bo Zheng
Shaohan Huang
Xian-Ling Mao
Heyan Huang
Furu Wei
45
67
0
11 Jun 2021
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjan Krishnan
Marc'Aurelio Ranzato
Francisco Guzmán
Angela Fan
17
564
0
06 Jun 2021
HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
Robert Mroczkowski
Piotr Rybak
Alina Wróblewska
Ireneusz Gawlik
36
81
0
04 May 2021
Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model
Honai Ueoka
Yugo Murawaki
Sadao Kurohashi
19
41
0
20 Apr 2021
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
Yiheng Xu
Tengchao Lv
Lei Cui
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Furu Wei
MLLM
VLM
38
128
0
18 Apr 2021
Large-Scale Self- and Semi-Supervised Learning for Speech Translation
Changhan Wang
Anne Wu
J. Pino
Alexei Baevski
Michael Auli
Alexis Conneau
SSL
33
44
0
14 Apr 2021
Bertinho: Galician BERT Representations
David Vilares
Marcos Garcia
Carlos Gómez-Rodríguez
65
22
0
25 Mar 2021
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures
Sushant Singh
A. Mahmood
AI4TS
60
94
0
23 Mar 2021
Experimental Evaluation of Deep Learning models for Marathi Text Classification
Atharva Kulkarni
Meet Mandhane
Manali Likhitkar
G. Kshirsagar
J. Jagdale
Raviraj Joshi
46
28
0
13 Jan 2021