Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages

11 December 2022

Sumanth Doddapaneni

Rahul Aralikatte

Shreyansh Goyal

Mitesh M. Khapra

Anoop Kunchukuttan

Papers citing "Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages"

Title | Authors | Tags | Counts | Date
CMLFormer: A Dual Decoder Transformer with Switching Point Learning for Code-Mixed Language Modeling | Aditeya Baral, Allen George Ajith, Roshan Nayak, Mrityunjay Abhijeet Bhanja | - | 7 0 0 | 19 May 2025
Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus | Raviraj Joshi, Kanishk Singla, Anusha Kamath, Raunak Kalani, Rakesh Paul, Utkarsh Vaidya, Sanjay Singh Chauhan, Niranjan Wartikar, Eileen Long | SyDa, CLL | 35 2 0 | 18 Oct 2024
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages | A. Singh, Rudra Murthy, Vishwajeet Kumar, Jaydeep Sen, Ashish Mittal, Ganesh Ramakrishnan | - | 42 6 0 | 18 Jul 2024
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models | Nandini Mundra, Aditya Nanda Kishore, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Mitesh Khapra | - | 30 3 0 | 08 Jul 2024
Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs | Tamzeed Mahfuz, Satak Kumar Dey, Ruwad Naswan, Hasnaen Adil, Khondker Salman Sayeed, Haz Sameen Shahgir | - | 36 0 0 | 29 Jun 2024
Decoding the Diversity: A Review of the Indic AI Research Landscape | Sankalp KJ, Vinija Jain, S. Bhaduri, Tamoghna Roy, Aman Chadha | - | 55 5 0 | 13 Jun 2024
Direct Punjabi to English speech translation using discrete units | Prabhjot Kaur, L. A. M. Bush, Weisong Shi | - | 31 0 0 | 25 Feb 2024
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, ..., Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram | ELM | 42 39 0 | 13 Nov 2023
A Comprehensive Analysis of Adapter Efficiency | Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra | - | 23 10 0 | 12 May 2023
AfroMT: Pretraining Strategies and Reproducible Benchmarks for Translation of 8 African Languages | Machel Reid, Junjie Hu, Graham Neubig, Y. Matsuo | - | 77 31 0 | 10 Sep 2021
MLQA: Evaluating Cross-lingual Extractive Question Answering | Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk | ELM | 246 493 0 | 16 Oct 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 297 6,984 0 | 20 Apr 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat | 716 6,746 0 | 26 Sep 2016