Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.18798
Cited By
ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
26 February 2025
Gyeongje Cho
Yeonkyoung So
Jaejin Lee
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions"
21 / 21 papers shown
Title
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
Franccois Yvon
Andy Zou
ELM
ALM
148
63
3
23 May 2024
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question?
Nishant Balepur
Abhilasha Ravichander
Rachel Rudinger
ELM
81
26
0
19 Feb 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
123
387
0
04 Jan 2024
The language of prompting: What linguistic properties make a prompt successful?
Alina Leidinger
R. Rooij
Ekaterina Shutova
51
45
0
03 Nov 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
313
3,895
0
29 May 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.0K
14,179
0
15 Mar 2023
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
292
3,634
0
02 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
754
12,835
0
04 Mar 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
114
773
0
01 Dec 2021
Do Prompt-Based Models Really Understand the Meaning of their Prompts?
Albert Webson
Ellie Pavlick
LRM
101
368
0
02 Sep 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
201
5,454
0
07 Jul 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
157
4,377
0
07 Sep 2020
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning
Jian Liu
Leyang Cui
Hanmeng Liu
Dandan Huang
Yile Wang
Yue Zhang
RALM
76
365
0
16 Jul 2020
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD
LRM
116
1,776
0
26 Nov 2019
Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
Yonatan Belinkov
Adam Poliak
Stuart M. Shieber
Benjamin Van Durme
Alexander M. Rush
51
95
0
09 Jul 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
145
2,446
0
19 May 2019
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
227
579
0
02 May 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
134
2,567
0
14 Mar 2018
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
134
1,175
0
06 Mar 2018
Crowdsourcing Multiple Choice Science Questions
Johannes Welbl
Nelson F. Liu
Matt Gardner
AI4Ed
60
497
0
19 Jul 2017
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Guokun Lai
Qizhe Xie
Hanxiao Liu
Yiming Yang
Eduard H. Hovy
ELM
162
1,343
0
15 Apr 2017
1