CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data


1 November 2019
Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Edouard Grave

Papers citing "CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data"

21 / 171 papers shown
XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
Shuming Ma
Jian Yang
Haoyang Huang
Zewen Chi
Li Dong
...
Akiko Eriguchi
Saksham Singhal
Xia Song
Arul Menezes
Furu Wei
LRM
26
33
0
31 Dec 2020
Neural Machine Translation: A Review of Methods, Resources, and Tools
Zhixing Tan
Shuo Wang
Zonghan Yang
Gang Chen
Xuancheng Huang
Maosong Sun
Yang Liu
3DV
AI4TS
35
106
0
31 Dec 2020
Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling
Shruti Bhosale
Kyra Yee
Sergey Edunov
Michael Auli
50
7
0
13 Nov 2020
FLERT: Document-Level Features for Named Entity Recognition
Stefan Schweter
Alan Akbik
32
111
0
13 Nov 2020
Multilingual AMR-to-Text Generation
Angela Fan
Claire Gardent
17
32
0
10 Nov 2020
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Tatiana Likhomanenko
Qiantong Xu
Vineel Pratap
Paden Tomasello
Jacob Kahn
Gilad Avidov
R. Collobert
Gabriel Synnaeve
39
98
0
22 Oct 2020
German's Next Language Model
Branden Chan
Stefan Schweter
Timo Möller
36
265
0
21 Oct 2020
Multi-task Learning for Multilingual Neural Machine Translation
Yiren Wang
Chengxiang Zhai
Hany Awadalla
37
68
0
06 Oct 2020
STIL -- Simultaneous Slot Filling, Translation, Intent Classification, and Language Identification: Initial Results using mBART on MultiATIS++
Jack G. M. FitzGerald
29
13
0
02 Oct 2020
Nearest Neighbor Machine Translation
Urvashi Khandelwal
Angela Fan
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
18
282
0
01 Oct 2020
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
70
755
0
24 Jun 2020
Cross-lingual Retrieval for Iterative Self-Supervised Training
C. Tran
Y. Tang
Xian Li
Jiatao Gu
RALM
30
73
0
16 Jun 2020
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
Pedro Ortiz Suarez
Laurent Romary
Benoît Sagot
28
227
0
11 Jun 2020
From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
Anne Lauscher
Vinit Ravishankar
Ivan Vulić
Goran Glavaš
34
56
0
01 May 2020
MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases
Louis Martin
Angela Fan
Eric Villemonte de la Clergerie
Antoine Bordes
Benoît Sagot
33
36
0
01 May 2020
SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings
Masoud Jalili Sabet
Philipp Dufter
François Yvon
Hinrich Schütze
23
228
0
18 Apr 2020
On the Language Neutrality of Pre-trained Multilingual Representations
Jindrich Libovický
Rudolf Rosa
Alexander Fraser
25
101
0
09 Apr 2020
XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
Yaobo Liang
Nan Duan
Yeyun Gong
Ning Wu
Fenfei Guo
...
Shuguang Liu
Fan Yang
Daniel Fernando Campos
Rangan Majumder
Ming Zhou
ELM
VLM
63
343
0
03 Apr 2020
Multilingual Denoising Pre-training for Neural Machine Translation
Yinhan Liu
Jiatao Gu
Naman Goyal
Xian Li
Sergey Edunov
Marjan Ghazvininejad
M. Lewis
Luke Zettlemoyer
AI4CE
AIMat
70
1,777
0
22 Jan 2020
CamemBERT: a Tasty French Language Model
Louis Martin
Benjamin Muller
Pedro Ortiz Suarez
Yoann Dupont
Laurent Romary
Eric Villemonte de la Clergerie
Djamé Seddah
Benoît Sagot
42
956
0
10 Nov 2019
Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency
Won Ik Cho
Hyeon Seung Lee
J. Yoon
Seokhwan Kim
N. Kim
41
5
0
10 Nov 2018