IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

2 November 2020
Fajri Koto
Afshin Rahimi
Jey Han Lau
Timothy Baldwin

Papers citing "IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP"

49 / 49 papers shown
The State of Large Language Models for African Languages: Progress and Challenges
Kedir Yassin Hussen
W. Sewunetie
Abinew Ali Ayele
Sukairaj Hafiz Imam
Shamsuddeen Hassan Muhammad
Seid Muhie Yimam
38
0
0
02 Jun 2025
Llama-3-Nanda-10B-Chat: An Open Generative Large Language Model for Hindi
Monojit Choudhury
Shivam Chauhan
Rocktim Jyoti Das
Dhruv Sahnan
Xudong Han
...
Rituraj Joshi
Gurpreet Gosal
Avraham Sheinin
Natalia Vassilieva
Preslav Nakov
101
1
0
08 Apr 2025
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages
Jia Guo
Longxu Dou
Guangtao Zeng
Stanley Kok
Wei Lu
Qian Liu
ELM LRM
129
2
0
02 Dec 2024
Crowdsourcing Lexical Diversity
H. Khalilia
Jahna Otterbacher
Gábor Bella
Rusma Noortyani
Shandy Darma
Fausto Giunchiglia
66
2
0
30 Oct 2024
Responsible Multilingual Large Language Models: A Survey of Development, Applications, and Societal Impact
Junhua Liu
Bin Fu
LRM
44
1
0
23 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM CoGe ReLM VLM LRM
83
0
0
17 Oct 2024
Generating bilingual example sentences with large language models as lexicography assistants
Raphael Merx
Ekaterina Vylomova
Kemal Kurniawan
59
3
0
04 Oct 2024
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia
Fajri Koto
ELM
161
3
0
13 Sep 2024
Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
Arief Purnama Muharram
Ayu Purwarianti
79
2
0
22 Aug 2024
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Rifki Afina Putri
Emmanuel Dave
...
Bryan Wilie
Genta Indra Winata
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
139
18
0
09 Apr 2024
MasonTigers at SemEval-2024 Task 1: An Ensemble Approach for Semantic Textual Relatedness
Dhiman Goswami
Sadiya Sayara Chowdhury Puspo
Md. Nishat Raihan
Al Nahian Bin Emran
Amrita Ganguly
Marcos Zampieri
108
2
0
22 Mar 2024
Investigating Text Shortening Strategy in BERT: Truncation vs Summarization
Mirza Alim Mutasodirin
Radityo Eko Prasojo
50
8
0
19 Mar 2024
Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service
Mirza Alim Mutasodirin
Radityo Eko Prasojo
Achmad F. Abka
Hanif Rasyidi
VLM
58
0
0
19 Mar 2024
Few-Shot Cross-Lingual Transfer for Prompting Large Language Models in Low-Resource Languages
Christopher Toukmaji
LRM
81
1
0
09 Mar 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
58
1
0
04 Mar 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Ahmet Üstün
Viraat Aryabumi
Zheng-Xin Yong
Wei-Yin Ko
Daniel D'souza
...
Shayne Longpre
Niklas Muennighoff
Marzieh Fadaee
Julia Kreutzer
Sara Hooker
ALM ELM SyDa LRM
98
230
0
12 Feb 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
95
7
0
03 Feb 2024
NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian
Peng Liu
Lemei Zhang
Terje Nissen Farup
Even W. Lauvrak
Jon Espen Ingvaldsen
Simen Eide
J. Gulla
Zhirong Yang
ELM
97
6
0
03 Dec 2023
IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages
Muhammad Farid Adilazuarda
Samuel Cahyawijaya
Genta Indra Winata
Pascale Fung
Ayu Purwarianti
102
12
0
21 Nov 2023
Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models
Joan Nwatu
Oana Ignat
Rada Mihalcea
80
11
0
09 Nov 2023
COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances
Haryo Akbarianto Wibowo
Erland Hilman Fuadi
Made Nindyatama Nityasya
Radityo Eko Prasojo
Alham Fikri Aji
LRM
120
24
0
02 Nov 2023
QASiNa: Religious Domain Question Answering using Sirah Nabawiyah
Muhammad Razif Rizqullah
Ayu Purwarianti
Alham Fikri Aji
83
15
0
12 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Fajri Koto
Nurul Aisyah
Haonan Li
Timothy Baldwin
AI4Ed LRM ELM
104
46
0
07 Oct 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Dea Adhista
Emmanuel Dave
...
Genta Indra Winata
David Moeljadi
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
85
9
0
19 Sep 2023
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian
Willy Fitra Hendria
58
3
0
20 Jun 2023
bgGLUE: A Bulgarian General Language Understanding Evaluation Benchmark
Momchil Hardalov
Pepa Atanasova
Todor Mihaylov
G. Angelova
K. Simov
P. Osenova
Ves Stoyanov
Ivan Koychev
Preslav Nakov
Dragomir R. Radev
ELM FedML
79
4
0
04 Jun 2023
Bactrian-X: Multilingual Replicable Instruction-Following Models with Low-Rank Adaptation
Haonan Li
Fajri Koto
Minghao Wu
Alham Fikri Aji
Timothy Baldwin
ALM
72
76
0
24 May 2023
InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning
Samuel Cahyawijaya
Holy Lovenia
Tiezheng Yu
Willy Chung
Pascale Fung
ALM
91
15
0
23 May 2023
A Comprehensive Analysis of Adapter Efficiency
Nandini Mundra
Sumanth Doddapaneni
Raj Dabre
Anoop Kunchukuttan
Ratish Puduppully
Mitesh M. Khapra
60
11
0
12 May 2023
GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark
Dongyang Li
Ruixue Ding
Qiang-Wei Zhang
Zheng Li
Boli Chen
...
Yao Xu
Xin Li
Ning Guo
Fei Huang
Xiaofeng He
ELM VLM
65
6
0
11 May 2023
ScandEval: A Benchmark for Scandinavian Natural Language Processing
Dan Saattrup Nielsen
ELM
80
14
0
03 Apr 2023
Sejarah dan Perkembangan Teknik Natural Language Processing (NLP) Bahasa Indonesia: Tinjauan tentang sejarah, perkembangan teknologi, dan aplikasi NLP dalam bahasa Indonesia [History and Development of Natural Language Processing (NLP) Techniques for Indonesian: A Review of the History, Technological Developments, and Applications of NLP in Indonesian]
Mukhlis Amien
23
5
0
28 Mar 2023
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
127
50
0
19 Dec 2022
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
Lukasz Augustyniak
Kamil Tagowski
Albert Sawczyn
Denis Janiak
Roman Bartusiak
...
Arkadiusz Janz
Piotr Szymański
M. Morzy
Tomasz Kajdanowicz
Maciej Piasecki
62
12
0
23 Nov 2022
GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost
Qingcheng Zeng
Lucas Garay
Peilin Zhou
Dading Chong
Yining Hua
Jiageng Wu
Yi-Cheng Pan
Han Zhou
Rob Voigt
Jie Yang
VLM
56
28
0
13 Nov 2022
Rethinking Annotation: Can Language Learners Contribute?
Haneul Yoo
Rifki Afina Putri
Changyoon Lee
Youngin Lee
So-Yeon Ahn
Dongyeop Kang
Alice Oh
92
0
0
13 Oct 2022
Location-based Twitter Filtering for the Creation of Low-Resource Language Datasets in Indonesian Local Languages
Mukhlis Amien
Chong Feng
Heyan Huang
52
3
0
15 Jun 2022
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Genta Indra Winata
Alham Fikri Aji
Samuel Cahyawijaya
Rahmad Mahendra
Fajri Koto
...
Pascale Fung
Timothy Baldwin
Jey Han Lau
Rico Sennrich
Sebastian Ruder
107
88
0
31 May 2022
MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification
Yu Lu Liu
Rachel Bawden
Thomas Scialom
Benoît Sagot
Jackie C.K. Cheung
71
4
0
24 May 2022
One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
Alham Fikri Aji
Genta Indra Winata
Fajri Koto
Samuel Cahyawijaya
Ade Romadhony
...
David Moeljadi
Radityo Eko Prasojo
Timothy Baldwin
Jey Han Lau
Sebastian Ruder
107
106
0
24 Mar 2022
CINO: A Chinese Minority Pre-trained Language Model
Ziqing Yang
Zihang Xu
Yiming Cui
Baoxin Wang
Min Lin
Dayong Wu
Zhigang Chen
86
25
0
28 Feb 2022
Which Student is Best? A Comprehensive Knowledge Distillation Exam for Task-Specific BERT Models
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Rendi Chevi
Radityo Eko Prasojo
Alham Fikri Aji
78
6
0
03 Jan 2022
IndoNLI: A Natural Language Inference Dataset for Indonesian
Rahmad Mahendra
Alham Fikri Aji
Samuel Louvan
Fahrurrozi Rahman
Clara Vania
70
32
0
27 Oct 2021
IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization
Fajri Koto
Jey Han Lau
Timothy Baldwin
VLM
121
85
0
10 Sep 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLM LM&MA
109
270
0
12 Aug 2021
Evaluating the Efficacy of Summarization Evaluation across Languages
Fajri Koto
Jey Han Lau
Timothy Baldwin
112
19
0
02 Jun 2021
Discourse Probing of Pretrained Language Models
Fajri Koto
Jey Han Lau
Tim Baldwin
78
53
0
13 Apr 2021
Liputan6: A Large-scale Indonesian Dataset for Text Summarization
Fajri Koto
Jey Han Lau
Timothy Baldwin
86
46
0
02 Nov 2020
PhoBERT: Pre-trained language models for Vietnamese
Dat Quoc Nguyen
A. Nguyen
273
357
0
02 Mar 2020