1907.05791
Cited By
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
10 July 2019
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
CVBM
Papers citing
"WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia"
50 / 225 papers shown
Title
Language Varieties of Italy: Technology Challenges and Opportunities
Alan Ramponi
27
7
0
20 Sep 2022
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
Daniil Larionov
Jens Grunwald
Christoph Leiter
Steffen Eger
28
5
0
20 Sep 2022
Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching
Kunbo Ding
Weijie Liu
Yuejian Fang
Zhe Zhao
Qi Ju
Xuefeng Yang
23
1
0
13 Sep 2022
Multilingual Bidirectional Unsupervised Translation Through Multilingual Finetuning and Back-Translation
Bryan Li
Mohammad Sadegh Rasooli
Ajay Patel
Chris Callison-Burch
50
4
0
06 Sep 2022
CJaFr-v3 : A Freely Available Filtered Japanese-French Aligned Corpus
Raoul Blin
Fabien Cromierès
20
1
0
28 Aug 2022
Benchmarking Azerbaijani Neural Machine Translation
Chih-Chen Chen
William Chen
29
0
0
29 Jul 2022
Finetuning a Kalaallisut-English machine translation system using web-crawled data
Alex Jones
25
2
0
05 Jun 2022
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian
T. Shamardina
Vladislav Mikhailov
Daniil Chernianskii
Alena Fenogenova
Marat Saidov
A. Valeeva
Tatiana Shavrina
I. Smurov
E. Tutubalina
Ekaterina Artemova
DeLMO
16
30
0
03 Jun 2022
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training
Yan Zeng
Wangchunshu Zhou
Ao Luo
Ziming Cheng
Xinsong Zhang
VLM
29
30
0
01 Jun 2022
Exploring Diversity in Back Translation for Low-Resource Machine Translation
Laurie Burchell
Alexandra Birch
Kenneth Heafield
31
15
0
01 Jun 2022
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Genta Indra Winata
Alham Fikri Aji
Samuel Cahyawijaya
Rahmad Mahendra
Fajri Koto
...
Pascale Fung
Timothy Baldwin
Jey Han Lau
Rico Sennrich
Sebastian Ruder
42
78
0
31 May 2022
EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning
Zhuoyuan Mao
Chenhui Chu
Sadao Kurohashi
43
1
0
31 May 2022
Understanding and Mitigating the Uncertainty in Zero-Shot Translation
Wenxuan Wang
Wenxiang Jiao
Shuo Wang
Zhaopeng Tu
Michael R. Lyu
UQLM
42
9
0
20 May 2022
Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters
Yang Xiang
Zhihua Wu
Weibao Gong
Siyu Ding
Xianjie Mo
...
Yue Yu
Ge Li
Yu Sun
Yanjun Ma
Dianhai Yu
24
5
0
19 May 2022
OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval
Tong Niu
Kazuma Hashimoto
Yingbo Zhou
Caiming Xiong
VLM
29
5
0
17 May 2022
SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation
Sameer Khurana
Antoine Laurent
James R. Glass
27
36
0
17 May 2022
Controlling Translation Formality Using Pre-trained Multilingual Language Models
Elijah Matthew Rippeth
Sweta Agrawal
Marine Carpuat
AI4CE
52
15
0
13 May 2022
Controlling Formality in Low-Resource NMT with Domain Adaptation and Re-Ranking: SLT-CDT-UoS at IWSLT2022
S. Vincent
Loïc Barrault
Carolina Scarton
21
6
0
12 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhehuai Chen
Yonghui Wu
Macduff Hughes
56
98
0
09 May 2022
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani
Jesujoba Oluwadara Alabi
Angela Fan
Julia Kreutzer
Xiaoyu Shen
...
Ayodele Awokoya
Happy Buzaaba
Blessing K. Sibanda
Andiswa Bukula
Sam Manthalu
29
111
0
04 May 2022
Non-Autoregressive Machine Translation: It's Not as Fast as it Seems
Jindřich Helcl
Barry Haddow
Alexandra Birch
27
20
0
04 May 2022
How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language
Shiyue Zhang
B. Frey
Joey Tianyi Zhou
24
36
0
25 Apr 2022
The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer
Pavel Efimov
Leonid Boytsov
E. Arslanova
Pavel Braslavski
30
7
0
13 Apr 2022
Efficient Cluster-Based k-Nearest-Neighbor Machine Translation
Dexin Wang
Kai Fan
Boxing Chen
Deyi Xiong
29
31
0
13 Apr 2022
Considerations for Multilingual Wikipedia Research
Isaac Johnson
Emily A. Lescak
27
3
0
05 Apr 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
42
100
0
24 Mar 2022
A Survey on Cross-Lingual Summarization
Jiaan Wang
Fandong Meng
Duo Zheng
Yunlong Liang
Zhixu Li
Jianfeng Qu
Jie Zhou
AILaw
28
60
0
23 Mar 2022
Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
25
32
0
16 Mar 2022
Can Synthetic Translations Improve Bitext Quality?
Eleftheria Briakou
Marine Carpuat
25
5
0
15 Mar 2022
OCR Improves Machine Translation for Low-Resource Languages
Oana Ignat
Jean Maillard
Vishrav Chaudhary
Francisco Guzmán
45
10
0
27 Feb 2022
JParaCrawl v3.0: A Large-scale English-Japanese Parallel Corpus
Makoto Morishita
Katsuki Chousa
Jun Suzuki
Masaaki Nagata
25
27
0
25 Feb 2022
USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
Jonas Belouadi
Steffen Eger
33
20
0
21 Feb 2022
Sequence-to-Sequence Resources for Catalan
Ona de Gibert
Ksenia Kharitonova
B. Figueras
Jordi Armengol-Estapé
Maite Melero
19
0
0
14 Feb 2022
Human Interpretation of Saliency-based Explanation Over Text
Hendrik Schuff
Alon Jacovi
Heike Adel
Yoav Goldberg
Ngoc Thang Vu
MILM
XAI
FAtt
148
39
0
27 Jan 2022
Multilingual Open Text Release 1: Public Domain News in 44 Languages
Chester Palen-Michel
June-Woo Kim
Constantine Lignos
VLM
29
12
0
14 Jan 2022
DOCmT5: Document-Level Pretraining of Multilingual Language Models
Chia-Hsuan Lee
Aditya Siddhant
Viresh Ratnakar
Melvin Johnson
LRM
25
9
0
16 Dec 2021
Dataset Geography: Mapping Language Data to Language Users
Fahim Faisal
Yinkai Wang
Antonios Anastasopoulos
72
23
0
07 Dec 2021
Data Processing Matters: SRPH-Konvergen AI's Machine Translation System for WMT'21
Lintang Sutawika
Jan Christian Blaise Cruz
11
3
0
20 Nov 2021
BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation
Eleftheria Briakou
Sida Wang
Luke Zettlemoyer
Marjan Ghazvininejad
34
5
0
12 Nov 2021
Improving Large-scale Language Models and Resources for Filipino
Jan Christian Blaise Cruz
C. Cheng
AI4CE
32
27
0
11 Nov 2021
Analyzing Architectures for Neural Machine Translation Using Low Computational Resources
Aditya Mandke
Onkar Litake
Dipali M. Kadam
28
1
0
06 Nov 2021
FacTeR-Check: Semi-automated fact-checking through Semantic Similarity and Natural Language Inference
Alejandro Martín
Javier Huertas-Tato
Álvaro Huertas-García
Guillermo Villar-Rodríguez
David Camacho
HILM
27
31
0
27 Oct 2021
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation
Long Doan
L. T. Nguyen
Nguyen Luong Tran
T. Hoang
Dat Quoc Nguyen
33
22
0
23 Oct 2021
Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction
Shubhanshu Mishra
A. Haghighi
VLM
29
4
0
20 Oct 2021
Substructure Distribution Projection for Zero-Shot Cross-Lingual Dependency Parsing
Freda Shi
Kevin Gimpel
Karen Livescu
147
7
0
16 Oct 2021
EdinSaar@WMT21: North-Germanic Low-Resource Multilingual NMT
Svetlana Tchistiakova
Jesujoba Oluwadara Alabi
Koel Dutta Chowdhury
Sourav Dutta
Dana Ruiter
VLM
36
6
0
29 Sep 2021
Improving Arabic Diacritization by Learning to Diacritize and Translate
Brian Thompson
A. Alshehri
42
10
0
29 Sep 2021
Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering
Fahim Faisal
Antonios Anastasopoulos
37
4
0
24 Sep 2021
Everything Is All It Takes: A Multipronged Strategy for Zero-Shot Cross-Lingual Information Extraction
M. Yarmohammadi
Shijie Wu
Marc Marone
Haoran Xu
Seth Ebner
...
Craig Harman
Kenton W. Murray
Aaron Steven White
Mark Dredze
Benjamin Van Durme
31
28
0
14 Sep 2021
Fine Grained Human Evaluation for English-to-Chinese Machine Translation: A Case Study on Scientific Text
Ming Liu
Heng Zhang
Guanhao Wu
34
1
0
13 Sep 2021