CMMLU: Measuring massive multitask language understanding in Chinese

15 June 2023
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM, ELM
arXiv: 2306.09212

Papers citing "CMMLU: Measuring massive multitask language understanding in Chinese"

16 / 66 papers shown
Title
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
149
1,942
0
08 Sep 2021
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM, AIMat, ReCod, ALM
216
2,004
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
236
5,665
0
07 Jul 2021
Understanding by Understanding Not: Modeling Negation in Language Models
Arian Hosseini
Siva Reddy
Dzmitry Bahdanau
R. Devon Hjelm
Alessandro Sordoni
Aaron Courville
89
90
0
07 May 2021
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du
Yujie Qian
Xiao Liu
Ming Ding
J. Qiu
Zhilin Yang
Jie Tang
BDL, AI4CE
142
1,554
0
18 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM, FaML
183
2,405
0
05 Mar 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
184
4,572
0
07 Sep 2020
CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu
Hai Hu
Xuanwei Zhang
Lu Li
Chenjie Cao
...
Cong Yue
Xinrui Zhang
Zhen-Yi Yang
Kyle Richardson
Zhenzhong Lan
ELM
90
386
0
13 Apr 2020
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly
Nora Kassner
Hinrich Schütze
68
324
0
08 Nov 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
182
2,532
0
19 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
274
2,323
0
02 May 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
144
1,752
0
02 Nov 2018
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM, ELM
292
2,853
0
11 Jun 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,200
0
20 Apr 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM, RALM, LRM
172
2,670
0
14 Mar 2018
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomáš Kočiský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
353
3,553
0
10 Jun 2015