Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization

12 June 2023

Papers citing "Learning Multilingual Sentence Representations with Cross-lingual Consistency Regularization"

18 / 18 papers shown

Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study Menglong Cui Pengzhi Gao Wei Liu Jian Luan Bin Wang LRM 92 5 0 04 Feb 2025
LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation Zhuoyuan Mao Tetsuji Nakagawa FedML 48 20 0 16 Feb 2023
No Language Left Behind: Scaling Human-Centered Machine Translation NLLB Team Marta R. Costa-jussà James Cross Onur Çelebi Maha Elbayad ... Alexandre Mourachko C. Ropers Safiyyah Saleem Holger Schwenk Jeff Wang MoE 226 1,266 0 11 Jul 2022
Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages Kevin Heffernan Onur Çelebi Holger Schwenk 126 55 0 25 May 2022
Language-agnostic BERT Sentence Embedding Fangxiaoyu Feng Yinfei Yang Daniel Cer N. Arivazhagan Wei Wang 165 913 0 03 Jul 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation Nils Reimers Iryna Gurevych 104 1,029 0 21 Apr 2020
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB Holger Schwenk Guillaume Wenzek Sergey Edunov Edouard Grave Armand Joulin 89 261 0 10 Nov 2019
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs Ahmed El-Kishky Vishrav Chaudhary Francisco Guzmán Philipp Koehn 97 199 0 10 Nov 2019
Simple, Scalable Adaptation for Neural Machine Translation Ankur Bapna N. Arivazhagan Orhan Firat AI4CE 113 417 0 18 Sep 2019
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia Holger Schwenk Vishrav Chaudhary Shuo Sun Hongyu Gong Francisco Guzmán CVBM 110 407 0 10 Jul 2019
Multilingual Universal Sentence Encoder for Semantic Retrieval Yinfei Yang Daniel Cer Amin Ahmad Mandy Guo Jax Law ... Steve Yuan Chris Tar Yun-hsuan Sung B. Strope R. Kurzweil 3DV 78 479 0 09 Jul 2019
Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax Yinfei Yang Gustavo Hernández Ábrego Steve Yuan Mandy Guo Qinlan Shen Daniel Cer Yun-hsuan Sung B. Strope R. Kurzweil 73 117 0 22 Feb 2019
Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings Mikel Artetxe Holger Schwenk 65 202 0 03 Nov 2018
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora Marcin Junczys-Dowmunt 53 135 0 01 Sep 2018
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing Taku Kudo John Richardson 201 3,526 0 19 Aug 2018
FastText.zip: Compressing text classification models Armand Joulin Edouard Grave Piotr Bojanowski Matthijs Douze Hervé Jégou Tomas Mikolov MQ 86 1,216 0 12 Dec 2016
Bag of Tricks for Efficient Text Classification Armand Joulin Edouard Grave Piotr Bojanowski Tomas Mikolov VLM 177 4,630 0 06 Jul 2016
Neural Machine Translation of Rare Words with Subword Units Rico Sennrich Barry Haddow Alexandra Birch 224 7,755 0 31 Aug 2015