ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.03882
  4. Cited By
Large Language Models Are Not Robust Multiple Choice Selectors

Large Language Models Are Not Robust Multiple Choice Selectors

7 September 2023
Chujie Zheng
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
ArXivPDFHTML

Papers citing "Large Language Models Are Not Robust Multiple Choice Selectors"

50 / 54 papers shown
Title
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Eitan Wagner
Omri Abend
46
0
0
04 May 2025
Systematic Bias in Large Language Models: Discrepant Response Patterns in Binary vs. Continuous Judgment Tasks
Systematic Bias in Large Language Models: Discrepant Response Patterns in Binary vs. Continuous Judgment Tasks
Yi-Long Lu
C. Zhang
Wei Wang
39
0
0
28 Apr 2025
When2Call: When (not) to Call Tools
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
101
0
0
26 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
68
0
0
26 Apr 2025
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
I-Sheng Fang
Jun-Cheng Chen
LRM
VLM
32
0
0
14 Apr 2025
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
38
0
0
13 Apr 2025
Large Language Models Could Be Rote Learners
Large Language Models Could Be Rote Learners
Yuyang Xu
Renjun Hu
Haochao Ying
Jian Wu
Xing Shi
Wei Lin
ELM
214
0
0
11 Apr 2025
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
Jie Ma
Zhitao Gao
Qi Chai
Jing Liu
Peijie Wang
Jing Tao
Zhou Su
73
1
0
01 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Ziyi Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
95
4
0
01 Apr 2025
Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models
Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models
Mats Faulborn
Indira Sen
Max Pellert
Andreas Spitz
David Garcia
ELM
50
0
0
20 Mar 2025
Teaching LLMs How to Learn with Contextual Fine-Tuning
Younwoo Choi
Muhammad Adil Asif
Ziwen Han
John Willes
Rahul G. Krishnan
LRM
41
0
0
12 Mar 2025
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval
Yu Zhang
Shutong Qiao
Jiaqi Zhang
Tzu-Heng Lin
Chen Gao
Yongqian Li
LM&Ro
LM&MA
98
1
0
07 Mar 2025
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions
Yizhe Zhang
Richard He Bai
Zijin Gu
Ruixiang Zhang
Jiatao Gu
Emmanuel Abbe
Samy Bengio
Navdeep Jaitly
LRM
BDL
72
1
0
25 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
241
1
0
22 Feb 2025
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance
Guangxiang Zhao
Saier Hu
Xiaoqi Jian
Jinzhu Wu
Yuhan Wu
Change Jia
Lin Sun
Xiangzheng Zhang
99
0
0
18 Feb 2025
Savaal: Scalable Concept-Driven Question Generation to Enhance Human Learning
Savaal: Scalable Concept-Driven Question Generation to Enhance Human Learning
Kimia Noorbakhsh
Joseph Chandler
Pantea Karimi
M. Alizadeh
H. Balakrishnan
LRM
51
1
0
18 Feb 2025
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models
Prateek Chhikara
54
1
0
16 Feb 2025
Aligning Black-box Language Models with Human Judgments
Aligning Black-box Language Models with Human Judgments
Gerrit J. J. van den Burg
Gen Suzuki
Wei Liu
Murat Sensoy
ALM
84
0
0
07 Feb 2025
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs
Bryan Guan
Tanya Roosta
Peyman Passban
Mehdi Rezagholizadeh
102
0
0
06 Feb 2025
Option-ID Based Elimination For Multiple Choice Questions
Option-ID Based Elimination For Multiple Choice Questions
Zhenhao Zhu
Bulou Liu
Qingyao Ai
Yong-Jin Liu
54
0
0
25 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
141
72
0
20 Jan 2025
M$^3$oralBench: A MultiModal Moral Benchmark for LVLMs
M3^33oralBench: A MultiModal Moral Benchmark for LVLMs
Bei Yan
Jie M. Zhang
Zhiyuan Chen
Shiguang Shan
Xilin Chen
ELM
54
1
0
31 Dec 2024
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
Leonidas Zotos
H. Rijn
Malvina Nissim
85
0
0
16 Dec 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
121
1
0
26 Oct 2024
Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context
Unleashing Multi-Hop Reasoning Potential in Large Language Models through Repetition of Misordered Context
Sangwon Yu
Ik-hwan Kim
Jongyoon Song
Saehyung Lee
Junsung Park
Sungroh Yoon
LRM
72
0
0
09 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
53
12
0
03 Oct 2024
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations
Yuchen Fan
Xin Zhong
Heng Zhou
Yuchen Zhang
Mingyu Liang
Chengxing Xie
Ermo Hua
Ning Ding
Bowen Zhou
ALM
ELM
31
0
0
02 Oct 2024
Mitigating Selection Bias with Node Pruning and Auxiliary Options
Mitigating Selection Bias with Node Pruning and Auxiliary Options
Hyeong Kyu Choi
Weijie Xu
Chi Xue
Stephanie Eckman
Chandan K. Reddy
42
1
0
27 Sep 2024
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling
Blessed Guda
Gabrial Zencha A.
Lawrence Francis
Carlee Joe-Wong
33
1
0
21 Sep 2024
How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments
How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments
Yusuke Ide
Yuto Nishida
Miyu Oba
Miyu Oba
Justin Vasselli
Hidetaka Kamigaito
Taro Watanabe
46
2
0
19 Aug 2024
Forecasting Live Chat Intent from Browsing History
Forecasting Live Chat Intent from Browsing History
Se-eun Yoon
Ahmad Bin Rabiah
Zaid Alibadi
Surya Kallumadi
Julian McAuley
AI4TS
39
0
0
07 Aug 2024
Eliminating Position Bias of Language Models: A Mechanistic Approach
Eliminating Position Bias of Language Models: A Mechanistic Approach
Ziqi Wang
Hanlin Zhang
Xiner Li
Kuan-Hao Huang
Chi Han
Shuiwang Ji
Sham Kakade
Hao Peng
Heng Ji
72
12
0
01 Jul 2024
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua
Badih Ghazi
Yangsibo Huang
Pritish Kamath
Ravi Kumar
Pasin Manurangsi
Amer Sinha
Chulin Xie
Chiyuan Zhang
69
1
0
23 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
58
13
0
13 Jun 2024
OLMES: A Standard for Language Model Evaluations
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
45
14
0
12 Jun 2024
CVQA: Culturally-diverse Multilingual Visual Question Answering
  Benchmark
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
...
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
48
26
0
10 Jun 2024
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach
Dyah Adila
Shuai Zhang
Boran Han
Yuyang Wang
AAML
LLMSV
36
6
0
05 Jun 2024
Unveiling Selection Biases: Exploring Order and Token Sensitivity in
  Large Language Models
Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
Sheng-Lun Wei
Cheng-Kuang Wu
Hen-Hsen Huang
Hsin-Hsi Chen
42
11
0
05 Jun 2024
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization
Yuchi Liu
Jaskirat Singh
Gaowen Liu
Ali Payani
Liang Zheng
LLMAG
82
4
0
30 May 2024
On Fairness of Low-Rank Adaptation of Large Models
On Fairness of Low-Rank Adaptation of Large Models
Zhoujie Ding
Ken Ziyu Liu
Pura Peetathawatchai
Berivan Isik
Sanmi Koyejo
50
4
0
27 May 2024
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single
  Process
Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
Ermo Hua
Biqing Qi
Kaiyan Zhang
Yue Yu
Ning Ding
Xingtai Lv
Kai Tian
Bowen Zhou
43
3
0
20 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic
  Speech Recognition with Large Language Models
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
Eng Siong Chng
Ruizhe Li
AuLLM
KELM
56
5
0
16 May 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
50
4
0
18 Apr 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation
  of Large Language Models
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
46
1
0
09 Apr 2024
Enhancing Event Causality Identification with Rationale and
  Structure-Aware Causal Question Answering
Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering
Baiyan Zhang
Qin Chen
Jie Zhou
Jian Jin
Liang He
21
3
0
17 Mar 2024
Defending Jailbreak Prompts via In-Context Adversarial Game
Defending Jailbreak Prompts via In-Context Adversarial Game
Yujun Zhou
Yufei Han
Haomin Zhuang
Kehan Guo
Zhenwen Liang
Hongyan Bao
Xiangliang Zhang
LLMAG
AAML
42
12
0
20 Feb 2024
EmoBench: Evaluating the Emotional Intelligence of Large Language Models
EmoBench: Evaluating the Emotional Intelligence of Large Language Models
Sahand Sabour
Siyang Liu
Zheyuan Zhang
June M. Liu
Jinfeng Zhou
Alvionna S. Sunaryo
Juanzi Li
Tatia M.C. Lee
Rada Mihalcea
Minlie Huang
37
12
0
19 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in
  Closed-Source LLMs
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
35
159
0
06 Feb 2024
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
Chang Gao
Wenxuan Zhang
Guizhen Chen
Wai Lam
58
5
0
04 Oct 2023
Leveraging Large Language Models for Multiple Choice Question Answering
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
148
187
0
22 Oct 2022
12
Next