ResearchTrend.AI
Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus

18 October 2024
Raviraj Joshi
Kanishk Singla
Anusha Kamath
Raunak Kalani
Rakesh Paul
Utkarsh Vaidya
Sanjay Singh Chauhan
Niranjan Wartikar
Eileen Long
    SyDa
    CLL

Papers citing "Adapting Multilingual LLMs to Low-Resource Languages using Continued Pre-training and Synthetic Corpus"

23 / 23 papers shown
Title
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
104
0
0
13 Apr 2025
Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi
Monojit Choudhury
Shivam Chauhan
Rocktim Jyoti Das
Dhruv Sahnan
Xudong Han
...
Rituraj Joshi
Gurpreet Gosal
Avraham Sheinin
Natalia Vassilieva
Preslav Nakov
84
1
0
08 Apr 2025
L3Cube-IndicQuest: A Benchmark Question Answering Dataset for Evaluating Knowledge of LLMs in Indic Context
Pritika Rohera
Chaitrali Ginimav
Akanksha Salunke
Gayatri Sawant
Raviraj Joshi
68
3
0
13 Sep 2024
RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining
Anh-Dung Vo
Minseong Jung
Wonbeen Lee
Daewoo Choi
31
5
0
21 Aug 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM
MoE
OSLM
115
873
0
31 Jul 2024
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages
A. Singh
Rudra Murthy
Vishwajeet Kumar
Jaydeep Sen
Ashish Mittal
Ganesh Ramakrishnan
170
6
0
18 Jul 2024
Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters
Daniil Gurgurov
Mareike Hartmann
Simon Ostermann
66
8
0
01 Jul 2024
LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language
Cagri Toraman
VLM
96
5
0
13 May 2024
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen
Zhilin Wang
Olivier Delalleau
Jiaqi Zeng
Yi Dong
...
Sahil Jain
Ali Taghibakhshi
Markel Sanz Ausin
Ashwath Aithal
Oleksii Kuchaiev
103
15
0
02 May 2024
AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
Shaona Ghosh
Prasoon Varshney
Erick Galinkin
Christopher Parisien
ELM
73
48
0
09 Apr 2024
Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning
Nick Mecklenburg
Yiyou Lin
Xiaoxiao Li
Daniel Holstein
Leonardo Nunes
...
Ranveer Chandra
Vijay Aski
Pavan Kumar Reddy Yannam
Tolga Aktas
Todd Hendry
52
28
0
30 Mar 2024
LLMs Are Few-Shot In-Context Low-Resource Language Learners
Samuel Cahyawijaya
Holy Lovenia
Pascale Fung
87
48
0
25 Mar 2024
Tamil-Llama: A New Tamil Language Model Based on Llama 2
Abhinand Balachandran
50
32
0
10 Nov 2023
FinGPT: Large Generative Models for a Small Language
Risto Luukkonen
Ville Komulainen
Jouni Luoma
Anni Eskelinen
Jenna Kanerva
...
Mikko Merioksa
Jyrki Heinonen
Aija Vahtola
Samuel Antao
S. Pyysalo
LM&MA
44
46
0
03 Nov 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
351
4,312
0
09 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
385
3,981
0
29 May 2023
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
Jay Gala
Pranjal A. Chitale
AK Raghavan
Varun Gumma
Sumanth Doddapaneni
...
Vivek Raghavan
Pratyush Kumar
Mitesh M. Khapra
Raj Dabre
Anoop Kunchukuttan
57
137
0
25 May 2023
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca
Yiming Cui
Ziqing Yang
Xin Yao
ALM
57
314
0
17 Apr 2023
Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
Sumanth Doddapaneni
Rahul Aralikatte
Gowtham Ramesh
Shreyansh Goyal
Mitesh M. Khapra
Anoop Kunchukuttan
Pratyush Kumar
ELM
76
85
0
11 Dec 2022
L3Cube-MahaNLP: Marathi Natural Language Processing Datasets, Models, and Library
Raviraj Joshi
71
27
0
29 May 2022
IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages
Aman Kumar
Himani Shrotriya
P. Sahu
Raj Dabre
Ratish Puduppully
Anoop Kunchukuttan
Amogh Mishra
Mitesh M. Khapra
Pratyush Kumar
77
41
0
10 Mar 2022
MuRIL: Multilingual Representations for Indian Languages
Simran Khanuja
Diksha Bansal
Sarvesh Mehtani
Savya Khosla
Atreyee Dey
...
Shachi Dave
Shruti Gupta
Subhash Chandra Bose Gali
Vishnu Subramanian
Partha P. Talukdar
80
289
0
19 Mar 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
329
1,904
0
17 Sep 2019