1906.08885
Cited By
Low-Resource Corpus Filtering using Multilingual Sentence Embeddings
20 June 2019
Vishrav Chaudhary
Y. Tang
Francisco Guzmán
Holger Schwenk
Philipp Koehn
Papers citing
"Low-Resource Corpus Filtering using Multilingual Sentence Embeddings"
15 / 15 papers shown
Title
Catch Me if You Search: When Contextual Web Search Results Affect the Detection of Hallucinations
Mahjabin Nahar
Eun-Ju Lee
Jin Won Park
Dongwon Lee
HILM
75
0
0
01 Apr 2025
A comparison of data filtering techniques for English-Polish LLM-based machine translation in the biomedical domain
Jorge del Pozo Lérida
Kamil Kojs
János Máté
Mikołaj Antoni Barański
Christian Hardmeier
45
0
0
27 Jan 2025
Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han
Akiko Eriguchi
Haoran Xu
Hieu T. Hoang
Marine Carpuat
Huda Khayrallah
VLM
43
2
0
12 Oct 2024
Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell
Lisa Wang
Isabel Papadimitriou
28
0
0
11 Nov 2023
There's no Data Like Better Data: Using QE Metrics for MT Data Filtering
Jan-Thorsten Peter
David Vilar
Daniel Deutsch
Mara Finkelstein
Juraj Juraska
Markus Freitag
22
17
0
09 Nov 2023
A Commonsense-Infused Language-Agnostic Learning Framework for Enhancing Prediction of Political Polarity in Multilingual News Headlines
Swati Swati
Adrian Mladenic Grobelnik
Dunja Mladenić
M. Grobelnik
32
3
0
01 Dec 2022
Improve Sentence Alignment by Divide-and-conquer
Wu Zhang
16
0
0
18 Jan 2022
Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer
Wenda Xu
Michael Stephen Saxon
Misha Sra
Wei Wang
MedIm
19
13
0
06 Oct 2021
Exploiting Parallel Corpora to Improve Multilingual Embedding based Document and Sentence Alignment
Dilan Sachintha
Lakmali Piyarathna
Charith Rajitha
Surangika Ranathunga
24
3
0
12 Jun 2021
Detecting Hallucinated Content in Conditional Neural Sequence Generation
Chunting Zhou
Graham Neubig
Jiatao Gu
Mona T. Diab
P. Guzmán
Luke Zettlemoyer
Marjan Ghazvininejad
HILM
39
195
0
05 Nov 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
23
72
0
20 Sep 2020
Cross-lingual Retrieval for Iterative Self-Supervised Training
C. Tran
Y. Tang
Xian Li
Jiatao Gu
RALM
28
73
0
16 Jun 2020
Exploiting Sentence Order in Document Alignment
Brian Thompson
Philipp Koehn
27
19
0
30 Apr 2020
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzmán
Philipp Koehn
20
198
0
10 Nov 2019
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
CVBM
26
401
0
10 Jul 2019