ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.05596
  4. Cited By
Samanantar: The Largest Publicly Available Parallel Corpora Collection
  for 11 Indic Languages
v1v2v3v4 (latest)

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

12 April 2021
Gowtham Ramesh
Sumanth Doddapaneni
Aravinth Bheemaraj
Mayank Jobanputra
AK Raghavan
Ajit Sharma
Sujit Sahoo
Harshita Diddee
J. Mahalakshmi
Divyanshu Kakwani
Navneet Kumar
Aswin Pradeep
Srihari Nagaraj
K. Deepak
Vivek Raghavan
Anoop Kunchukuttan
Pratyush Kumar
Mitesh Khapra
    LRM
ArXiv (abs)PDFHTML

Papers citing "Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages"

47 / 47 papers shown
Title
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
104
0
0
13 Apr 2025
A kinetic-based regularization method for data science applications
Abhisek Ganguly
Alessandro Gabbana
Vybhav Rao
Sauro Succi
Santosh Ansumali
128
0
0
06 Mar 2025
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
Hieu Man
Nghia Trung Ngo
Viet Dac Lai
Ryan Rossi
Franck Dernoncourt
T. Nguyen
583
1
0
01 Jan 2025
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual
  Machine Translation
The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal
Cynthia Gao
Vishrav Chaudhary
Peng-Jen Chen
Guillaume Wenzek
Da Ju
Sanjan Krishnan
MarcÁurelio Ranzato
Francisco Guzman
Angela Fan
107
587
0
06 Jun 2021
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer
Isaac Caswell
Lisa Wang
Ahsan Wahab
D. Esch
...
Duygu Ataman
Orevaoghene Ahia
Oghenefego Ahia
Sweta Agrawal
Mofetoluwa Adeyemi
56
278
0
22 Mar 2021
The GEM Benchmark: Natural Language Generation, its Evaluation and
  Metrics
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann
Tosin Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Aremu Anuoluwapo
...
Nishant Subramani
Wei Xu
Diyi Yang
Akhila Yerukola
Jiawei Zhou
VLM
311
285
0
02 Feb 2021
Data Augmentation and Terminology Integration for Domain-Specific
  Sinhala-English-Tamil Statistical Machine Translation
Data Augmentation and Terminology Integration for Domain-Specific Sinhala-English-Tamil Statistical Machine Translation
Aloka Fernando
Surangika Ranathunga
G. Dias
37
24
0
05 Nov 2020
Subword Segmentation and a Single Bridge Language Affect Zero-Shot
  Neural Machine Translation
Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation
Annette Rios Gonzales
Mathias Müller
Rico Sennrich
44
19
0
03 Nov 2020
mT5: A massively multilingual pre-trained text-to-text transformer
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
142
2,559
0
22 Oct 2020
Complete Multilingual Neural Machine Translation
Complete Multilingual Neural Machine Translation
Markus Freitag
Orhan Firat
61
44
0
20 Oct 2020
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New
  Datasets for Bengali-English Machine Translation
Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Tahmid Hasan
Abhik Bhattacharjee
Kazi Samin Mubasshir
Masum Hasan
Madhusudan Basak
M. Rahman
Rifat Shahriyar
VLM
57
77
0
20 Sep 2020
Revisiting Low Resource Status of Indian Languages in Machine
  Translation
Revisiting Low Resource Status of Indian Languages in Machine Translation
Jerin Philip
Shashank Siripragada
Vinay P. Namboodiri
C. V. Jawahar
61
28
0
11 Aug 2020
Multilingual Translation with Extensible Multilingual Pretraining and
  Finetuning
Multilingual Translation with Extensible Multilingual Pretraining and Finetuning
Y. Tang
C. Tran
Xian Li
Peng-Jen Chen
Naman Goyal
Vishrav Chaudhary
Jiatao Gu
Angela Fan
CLL
131
462
0
02 Aug 2020
A Multilingual Parallel Corpora Collection Effort for Indian Languages
A Multilingual Parallel Corpora Collection Effort for Indian Languages
Shashank Siripragrada
Jerin Philip
Vinay P. Namboodiri
C. V. Jawahar
VLM
59
47
0
15 Jul 2020
Language-agnostic BERT Sentence Embedding
Language-agnostic BERT Sentence Embedding
Fangxiaoyu Feng
Yinfei Yang
Daniel Cer
N. Arivazhagan
Wei Wang
168
913
0
03 Jul 2020
TICO-19: the Translation Initiative for Covid-19
TICO-19: the Translation Initiative for Covid-19
Antonios Anastasopoulos
A. Cattelan
Zi-Yi Dou
Marcello Federico
C. Federman
...
Mengmeng Niu
A. Oktem
Eric Paquin
G. Tang
Sylwia Tur
59
92
0
03 Jul 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge
  Distillation
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Nils Reimers
Iryna Gurevych
104
1,030
0
21 Apr 2020
Neural Machine Translation System of Indic Languages -- An Attention
  based Approach
Neural Machine Translation System of Indic Languages -- An Attention based Approach
Parth Shah
Vishvajit Bakrola
53
21
0
02 Feb 2020
PMIndia -- A Collection of Parallel Corpora of Languages of India
PMIndia -- A Collection of Parallel Corpora of Languages of India
Barry Haddow
Faheem Kirefu
51
102
0
27 Jan 2020
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Edouard Grave
Armand Joulin
91
261
0
10 Nov 2019
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
CCAligned: A Massive Collection of Cross-Lingual Web-Document Pairs
Ahmed El-Kishky
Vishrav Chaudhary
Francisco Guzman
Philipp Koehn
100
199
0
10 Nov 2019
Multilingual Neural Machine Translation with Language Clustering
Multilingual Neural Machine Translation with Language Clustering
Xu Tan
Jiale Chen
Di He
Yingce Xia
Tao Qin
Tie-Yan Liu
220
112
0
25 Aug 2019
Massively Multilingual Neural Machine Translation in the Wild: Findings
  and Challenges
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges
N. Arivazhagan
Ankur Bapna
Orhan Firat
Dmitry Lepikhin
Melvin Johnson
...
George F. Foster
Colin Cherry
Wolfgang Macherey
Zhiwen Chen
Yonghui Wu
90
428
0
11 Jul 2019
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from
  Wikipedia
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Holger Schwenk
Vishrav Chaudhary
Shuo Sun
Hongyu Gong
Francisco Guzmán
CVBM
110
407
0
10 Jul 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language
  Understanding Systems
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
274
2,323
0
02 May 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLMFaML
114
3,156
0
01 Apr 2019
The Missing Ingredient in Zero-Shot Neural Machine Translation
The Missing Ingredient in Zero-Shot Neural Machine Translation
N. Arivazhagan
Ankur Bapna
Orhan Firat
Roee Aharoni
Melvin Johnson
Wolfgang Macherey
81
117
0
17 Mar 2019
Massively Multilingual Neural Machine Translation
Massively Multilingual Neural Machine Translation
Roee Aharoni
Melvin Johnson
Orhan Firat
LRMAI4CE
85
489
0
28 Feb 2019
The FLoRes Evaluation Datasets for Low-Resource Machine Translation:
  Nepali-English and Sinhala-English
The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English
Francisco Guzmán
Peng-Jen Chen
Myle Ott
J. Pino
Guillaume Lample
Philipp Koehn
Vishrav Chaudhary
MarcÁurelio Ranzato
73
145
0
04 Feb 2019
Cross-lingual Language Model Pretraining
Cross-lingual Language Model Pretraining
Guillaume Lample
Alexis Conneau
104
2,748
0
22 Jan 2019
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual
  Transfer and Beyond
Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
Mikel Artetxe
Holger Schwenk
3DV
154
1,016
0
26 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,175
0
11 Oct 2018
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Trivial Transfer Learning for Low-Resource Neural Machine Translation
Tom Kocmi
Ondrej Bojar
86
172
0
02 Sep 2018
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora
Marcin Junczys-Dowmunt
53
135
0
01 Sep 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,196
0
20 Apr 2018
The IIT Bombay English-Hindi Parallel Corpus
The IIT Bombay English-Hindi Parallel Corpus
Anoop Kunchukuttan
Pratik Mehta
P. Bhattacharyya
AIMat
89
251
0
08 Oct 2017
Transfer Learning across Low-Resource, Related Languages for Neural
  Machine Translation
Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation
Toan Q. Nguyen
David Chiang
72
210
0
31 Aug 2017
Six Challenges for Neural Machine Translation
Six Challenges for Neural Machine Translation
Philipp Koehn
Rebecca Knowles
AAMLAIMat
373
1,225
0
12 Jun 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
732
132,363
0
12 Jun 2017
Billion-scale similarity search with GPUs
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
257
3,737
0
28 Feb 2017
Google's Multilingual Neural Machine Translation System: Enabling
  Zero-Shot Translation
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Melvin Johnson
M. Schuster
Quoc V. Le
M. Krikun
Yonghui Wu
...
F. Viégas
Martin Wattenberg
Gregory S. Corrado
Macduff Hughes
Jeffrey Dean
124
2,096
0
14 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
911
6,796
0
26 Sep 2016
Multi-Way, Multilingual Neural Machine Translation with a Shared
  Attention Mechanism
Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism
Orhan Firat
Kyunghyun Cho
Yoshua Bengio
LRMAIMat
273
627
0
06 Jan 2016
Improving Neural Machine Translation Models with Monolingual Data
Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich
Barry Haddow
Alexandra Birch
257
2,723
0
20 Nov 2015
Neural Machine Translation of Rare Words with Subword Units
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
228
7,757
0
31 Aug 2015
Neural Machine Translation by Jointly Learning to Align and Translate
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
578
27,327
0
01 Sep 2014
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014
1