Measuring Massive Multitask Language Understanding

7 September 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM

Papers citing "Measuring Massive Multitask Language Understanding"

50 / 3,408 papers shown
Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
Kaiyan Zhang
Jianyu Wang
Ning Ding
Biqing Qi
Ermo Hua
Xingtai Lv
Bowen Zhou
112
9
0
18 Jun 2024
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
Somnath Banerjee
Soham Tripathy
Sayan Layek
Shanu Kumar
Animesh Mukherjee
Rima Hazra
95
7
0
18 Jun 2024
Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models
Lulu Zhao
Weihao Zeng
Xiaofeng Shi
Hua Zhou
Donglin Hao
Yonghua Lin
LM&MA
94
4
0
18 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM, LRM
140
43
0
18 Jun 2024
UBench: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions
Xunzhi Wang
Zhuowei Zhang
Qiongyu Li
Gaonan Chen
Mengting Hu
Zhixin Han
Bitong Luo
Zhiyu li
Hang Gao
Mengting Hu
ELM
109
3
0
18 Jun 2024
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
Jack Gallifant
Shan Chen
Pedro Moreira
Nikolaj Munch
Mingye Gao
Jackson Pond
Leo Anthony Celi
Hugo J. W. L. Aerts
Thomas Hartvigsen
Danielle S. Bitterman
115
13
0
17 Jun 2024
InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States
Mohammad Beigi
Ying Shen
Runing Yang
Zihao Lin
Qifan Wang
Ankith Mohan
Jianfeng He
Ming Jin
Chang-Tien Lu
Lifu Huang
HILM
83
10
0
17 Jun 2024
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
Nikhil Khandekar
Qiao Jin
Guangzhi Xiong
Soren Dunn
Serina S Applebaum
...
Amisha D. Dave
Andrew Taylor
Aidong Zhang
Qingyu Chen
Zhiyong Lu
LM&MA, ELM
129
14
0
17 Jun 2024
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang
Leonid Karlinsky
Hongyin Luo
Zhen Wang
Jacob A. Hansen
James Glass
David D. Cox
Yikang Shen
Rogerio Feris
Alan Ritter
MoMe, MoE
93
11
0
17 Jun 2024
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Bingqi Ma
Zhuofan Zong
Guanglu Song
Hongsheng Li
Yu Liu
88
23
0
17 Jun 2024
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations
Rima Hazra
Sayan Layek
Somnath Banerjee
Soujanya Poria
KELM, LLMSV
79
13
0
17 Jun 2024
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
S. Kadhe
Farhan Ahmed
Dennis Wei
Nathalie Baracaldo
Inkit Padhi
MoMe, MU
90
8
0
17 Jun 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li
Wei-Lin Chiang
Evan Frick
Lisa Dunlap
Tianhao Wu
Banghua Zhu
Joseph E. Gonzalez
Ion Stoica
ALM
124
182
0
17 Jun 2024
Refusal in Language Models Is Mediated by a Single Direction
Andy Arditi
Oscar Obeso
Aaquib Syed
Daniel Paleka
Nina Panickssery
Wes Gurnee
Neel Nanda
171
218
0
17 Jun 2024
Nemotron-4 340B Technical Report
Nvidia
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
128
69
0
17 Jun 2024
Meta Reasoning for Large Language Models
Peizhong Gao
Ao Xie
Shaoguang Mao
Wenshan Wu
Yan Xia
Haipeng Mi
Furu Wei
ReLM, LLMAG, LRM
103
10
0
17 Jun 2024
Tokenization Falling Short: The Curse of Tokenization
Yekun Chai
Yewei Fang
Qiwei Peng
Xuhong Li
74
0
0
17 Jun 2024
The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
Kyle Moore
Jesse Roberts
Thao Pham
Oseremhen Ewaleifoh
Doug Fisher
94
2
0
17 Jun 2024
Input Conditioned Graph Generation for Language Agents
Lukas Vierling
Jie Fu
Kai Chen
LLMAG
72
2
0
17 Jun 2024
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
DeepSeek-AI
Qihao Zhu
Daya Guo
Zhihong Shao
Dejian Yang
...
Jiashi Li
Chenggang Zhao
Chong Ruan
Fuli Luo
Wenfeng Liang
MoE, LRM, ELM, VLM
103
209
0
17 Jun 2024
Cultural Value Differences of LLMs: Prompt, Language, and Model Size
Qishuai Zhong
Yike Yun
Aixin Sun
79
2
0
17 Jun 2024
HARE: HumAn pRiors, a key to small language model Efficiency
Lingyun Zhang
Bin jin
Gaojian Ge
Lunhui Liu
Xuewen Shen
Mingyong Wu
Houqian Zhang
Yongneng Jiang
Shiqi Chen
Shi Pu
ALM
70
0
0
17 Jun 2024
CodeGemma: Open Code Models Based on Gemma
CodeGemma Team
Heri Zhao
Jeffrey Hui
Joshua Howland
Nam Nguyen
...
Ale Jakse Hartman
Bin Ni
Kathy Korevec
Kelly Schaefer
Scott Huffman
VLM
120
129
0
17 Jun 2024
A Complete Survey on LLM-based AI Chatbots
Sumit Kumar Dam
Choong Seon Hong
Yu Qiao
Chaoning Zhang
104
62
0
17 Jun 2024
Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
Zilun Zhang
Yutao Sun
Tiancheng Zhao
Leigang Sha
Ruochen Xu
Kyusong Lee
Jianwei Yin
CLL, KELM
112
0
0
17 Jun 2024
Program Synthesis Benchmark for Visual Programming in XLogoOnline Environment
Chao Wen
Jacqueline Staub
Adish Singla
ELM
106
3
0
17 Jun 2024
Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams
Zheheng Luo
Chenhan Yuan
Qianqian Xie
Sophia Ananiadou
ELM, AI4MH, LM&MA
84
0
0
17 Jun 2024
Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Shiliang Zhang
Chong Deng
Hai Yu
Jiaqing Liu
Yukun Ma
Chong Zhang
69
1
0
17 Jun 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu
Daize Dong
Xiaoye Qu
Jiacheng Ruan
Wenliang Chen
Yu Cheng
MoE
107
9
0
17 Jun 2024
FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation
Bangzheng Li
Ben Zhou
Xingyu Fu
Fei Wang
Dan Roth
Muhao Chen
86
6
0
17 Jun 2024
WeatherQA: Can Multimodal Language Models Reason about Severe Weather?
Chengqian Ma
Zhanxiang Hua
Alexandra Anderson-Frey
Vikram Iyer
Xin Liu
Lianhui Qin
106
6
0
17 Jun 2024
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
Chenghao Fan
Zhenyi Lu
Wei Wei
Jie Tian
Xiaoye Qu
Dangyang Chen
Yu Cheng
MoMe
112
6
0
17 Jun 2024
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
Zhenyi Lu
Chenghao Fan
Wei Wei
Xiaoye Qu
Dangyang Chen
Yu Cheng
MoMe
126
63
0
17 Jun 2024
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini
Somnath Basu Roy Chowdhury
Snigdha Chaturvedi
187
9
0
17 Jun 2024
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models
Bolei Ma
Xinpeng Wang
Tiancheng Hu
Anna Haensch
Michael A. Hedderich
Barbara Plank
Frauke Kreuter
ALM
103
6
0
16 Jun 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM, AAML, ELM
95
2
0
16 Jun 2024
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu
Jiazheng Li
Siyu An
Meng Zhao
Yulan He
Di Yin
Xing Sun
94
20
0
16 Jun 2024
Understanding Understanding: A Pragmatic Framework Motivated by Large Language Models
Kevin Leyton-Brown
Y. Shoham
ELM
53
0
0
16 Jun 2024
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Zhuoran Jin
Pengfei Cao
Chenhao Wang
Zhitao He
Hongbang Yuan
Jiachun Li
Yubo Chen
Kang Liu
Jun Zhao
KELM, MU
137
26
0
16 Jun 2024
On the Role of Entity and Event Level Conceptualization in Generalizable Reasoning: A Survey of Tasks, Methods, Applications, and Future Directions
Weiqi Wang
Tianqing Fang
Haochen Shi
Baixuan Xu
Wenxuan Ding
...
Wei Fan
Jiaxin Bai
Haoran Li
Xin Liu
Yangqiu Song
LRM
116
3
0
16 Jun 2024
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Yu Zhang
Xiusi Chen
Bowen Jin
Sheng Wang
Shuiwang Ji
Wei Wang
Jiawei Han
142
43
0
16 Jun 2024
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation
Yurun Song
Junchen Zhao
Ian G. Harris
Sangeetha Abdu Jyothi
98
5
0
16 Jun 2024
Applications of Generative AI in Healthcare: algorithmic, ethical, legal and societal considerations
Onyekachukwu R. Okonji
Kamol Yunusov
Bonnie Gordon
MedIm
83
4
0
15 Jun 2024
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results
TJ Dunham
Henry Syahputra
67
1
0
15 Jun 2024
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
Tu Anh Dinh
Carlos Mullov
Leonard Barmann
Zhaolin Li
Danni Liu
...
Michael Beigl
Rainer Stiefelhagen
Carsten Dachsbacher
Klemens Bohm
Jan Niehues
ELM
91
12
0
14 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
106
15
0
14 Jun 2024
GenQA: Generating Millions of Instructions from a Handful of Prompts
Jiuhai Chen
Rifaa Qadri
Yuxin Wen
Neel Jain
John Kirchenbauer
Dinesh Manocha
Tom Goldstein
ALM
156
24
0
14 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Andrey Kravchenko
RALM, ALM, LRM, ReLM, ELM
104
82
0
14 Jun 2024
Knowledge Editing in Language Models via Adapted Direct Preference Optimization
Amit Rozner
Barak Battash
Lior Wolf
Ofir Lindenbaum
KELM
112
14
0
14 Jun 2024
GEB-1.3B: Open Lightweight Large Language Model
Jie Wu
Yufeng Zhu
Lei Shen
Xuqing Lu
ALM
46
0
0
14 Jun 2024