ResearchTrend.AI
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

16 August 2024
Do Xuan Long
Hai Nguyen Ngoc
Tiviatis Sim
Hieu Dao
Shafiq Joty
Kenji Kawaguchi
Nancy F. Chen
Min-Yen Kan

Papers citing "LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs"

39 / 39 papers shown
AMPO: Active Multi-Preference Optimization for Self-play Preference Selection
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
90
0
0
25 Feb 2025
Software Performance Engineering for Foundation Model-Powered Software (FMware)
Haoxiang Zhang
Shi Chang
Arthur Leung
Kishanthan Thangarajah
Boyuan Chen
Hanan Lutfiyya
Ahmed E. Hassan
263
1
0
14 Nov 2024
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models
Do Xuan Long
Duong Ngoc Yen
Anh Tuan Luu
Kenji Kawaguchi
Min-Yen Kan
Nancy F. Chen
KELM, ELM, LRM
83
7
0
01 Nov 2024
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Zhi Rui Tam
Cheng-Kuang Wu
Yi-Lin Tsai
Chieh-Yen Lin
Hung-yi Lee
Yun-Nung Chen
49
31
0
05 Aug 2024
Exploring the Impact of the Output Format on the Evaluation of Large Language Models for Code Translation
Marcos Macedo
Yuan Tian
F. Côgo
Bram Adams
64
17
0
25 Mar 2024
Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction
Ziyang Xu
Keqin Peng
Liang Ding
Dacheng Tao
Xiliang Lu
61
10
0
15 Mar 2024
Using LLMs for the Extraction and Normalization of Product Attribute Values
Alexander Brinkmann
Nick Baumann
Christian Bizer
89
11
0
04 Mar 2024
FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability
Congying Xia
Chen Xing
Jiangshu Du
Xinyi Yang
Yihao Feng
Ran Xu
Wenpeng Yin
Caiming Xiong
ALM
82
54
0
28 Feb 2024
Mitigating Biases of Large Language Models in Stance Detection with Counterfactual Augmented Calibration
Ang Li
Jingqian Zhao
Bin Liang
Lin Gui
Hui Wang
Xi Zeng
Xingwei Liang
Kam-Fai Wong
Ruifeng Xu
49
0
0
22 Feb 2024
Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
Yihan Chen
Benfeng Xu
Quan Wang
Yi Liu
Zhendong Mao
ALM, ELM
81
29
0
01 Jan 2024
LLMs Accelerate Annotation for Medical Information Extraction
Akshay Goel
Almog Gueta
Omry Gilon
Chang Liu
Sofia Erell
...
Shashir Reddy
Rupesh Kartha
Jean Steiner
Itay Laish
Amir Feder
89
113
0
04 Dec 2023
JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization
Shang-Ching Liu
ShengKun Wang
Wenqi Lin
Chung-Wei Hsiung
Yi-Chen Hsieh
Yu-Ping Cheng
Sian-Hong Luo
Tsungyao Chang
Jianwei Zhang
58
18
0
03 Dec 2023
Universal Self-Consistency for Large Language Model Generation
Xinyun Chen
Renat Aksitov
Uri Alon
Jie Jessie Ren
Kefan Xiao
Pengcheng Yin
Sushant Prakash
Charles Sutton
Xuezhi Wang
Denny Zhou
LRM
78
75
0
29 Nov 2023
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta
Vaishnavi Shrivastava
Ameet Deshpande
Ashwin Kalyan
Peter Clark
Ashish Sabharwal
Tushar Khot
181
122
0
08 Nov 2023
Evaluating Large Language Models: A Comprehensive Survey
Zishan Guo
Renren Jin
Chuang Liu
Yufei Huang
Dan Shi
...
Linhao Yu
Yan Liu
Jiaxuan Li
Bojian Xiong
Deyi Xiong
ELM, LM&MA
71
196
0
30 Oct 2023
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang
Houxing Ren
Aojun Zhou
Zimu Lu
Sichun Luo
Weikang Shi
Renrui Zhang
Linqi Song
Mingjie Zhan
Hongsheng Li
ReLM, LRM, SyDa
99
105
0
05 Oct 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG, RALM
92
600
0
28 Aug 2023
Is GPT-4 a Good Data Analyst?
Liying Cheng
Xingxuan Li
Lidong Bing
LM&MA, ELM
83
100
0
24 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM, LM&MA
280
629
0
03 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG, MLLM
1.5K
14,699
0
15 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,420
0
27 Feb 2023
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
202
199
0
22 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM, ELM, LRM, ReLM
266
1,131
0
17 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM, LRM
526
4,498
0
24 May 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM, LRM
515
6,293
0
05 Apr 2022
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension
Ying Xu
Dakuo Wang
Mo Yu
Daniel E. Ritchie
Bingsheng Yao
...
Xiaojuan Ma
Diyi Yang
Nanyun Peng
Zhou Yu
M. Warschauer
AI4Ed
68
105
0
26 Mar 2022
An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie
Aditi Raghunathan
Percy Liang
Tengyu Ma
ReLM, BDL, VPVLM, LRM
208
763
0
03 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
326
4,569
0
27 Oct 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
149
1,938
0
08 Sep 2021
End-to-End Self-Debiasing Framework for Robust NLU Training
Abbas Ghaddar
Philippe Langlais
Mehdi Rezagholizadeh
Ahmad Rashid
UQCV
52
38
0
05 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
311
386
0
28 Feb 2021
Mitigating Bias in Calibration Error Estimation
Rebecca Roelofs
Nicholas Cain
Jonathon Shlens
Michael C. Mozer
75
95
0
15 Dec 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM, RALM
184
4,553
0
07 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
853
42,332
0
28 May 2020
SciREX: A Challenge Dataset for Document-Level Information Extraction
Sarthak Jain
Madeleine van Zuylen
Hannaneh Hajishirzi
Iz Beltagy
72
163
0
01 May 2020
SPECTER: Document-level Representation Learning using Citation-informed Transformers
Arman Cohan
Sergey Feldman
Iz Beltagy
Doug Downey
Daniel S. Weld
AI4TS
84
556
0
15 Apr 2020
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
199
3,210
0
22 Apr 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
188
2,689
0
25 Sep 2018
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
Isabelle Augenstein
Mrinal Das
Sebastian Riedel
Lakshmi Vikraman
Andrew McCallum
71
339
0
10 Apr 2017