ResearchTrend.AI
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

11 January 2024
Brian Thompson
Mehak Preet Dhaliwal
Peter Frisch
Tobias Domhan
Marcello Federico
arXiv: 2401.05749

Papers citing "A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism"

17 / 17 papers shown
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
69
3
0
12 Oct 2024
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Kaden Uhlig
Joern Wuebker
Raphael Reinauer
John DeNero
80
0
0
26 Sep 2024
Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing
William Brannon
Yogesh Virkar
Brian Thompson
52
22
0
23 Dec 2022
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
265
107
0
27 Oct 2022
No Language Left Behind: Scaling Human-Centered Machine Translation
NLLB Team
Marta R. Costa-jussà
James Cross
Onur Çelebi
Maha Elbayad
...
Alexandre Mourachko
C. Ropers
Safiyyah Saleem
Holger Schwenk
Jeff Wang
MoE
215
1,258
0
11 Jul 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
459
6,231
0
05 Apr 2022
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge
Maarten Sap
Ana Marasović
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Margaret Mitchell
Matt Gardner
AILaw
110
446
0
18 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
436
2,091
0
31 Dec 2020
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Isaac Caswell
Theresa Breiner
D. Esch
Ankur Bapna
65
89
0
27 Oct 2020
Language-agnostic BERT Sentence Embedding
Fangxiaoyu Feng
Yinfei Yang
Daniel Cer
N. Arivazhagan
Wei Wang
159
904
0
03 Jul 2020
Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing
Brian Thompson
Matt Post
LRM
53
190
0
30 Apr 2020
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
79
260
0
10 Nov 2019
Low-Resource Corpus Filtering using Multilingual Sentence Embeddings
Vishrav Chaudhary
Y. Tang
Francisco Guzmán
Holger Schwenk
Philipp Koehn
62
79
0
20 Jun 2019
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
Marcin Junczys-Dowmunt
45
135
0
01 Sep 2018
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
Antonio Toral
Sheila Castilho
Ke Hu
Andy Way
48
190
0
30 Aug 2018
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
Samuel Läubli
Rico Sennrich
M. Volk
41
258
0
21 Aug 2018
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
257
3,720
0
28 Feb 2017