Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,990 papers shown
Title
Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources
Joachim De Baer
A. Seza Doğruöz
T. Demeester
Chris Develder
66
0
0
25 Feb 2025
Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring
Xuansheng Wu
Padmaja Pravin Saraf
Gyeong-Geon Lee
Ehsan Latif
Ninghao Liu
Xiaoming Zhai
65
7
0
24 Feb 2025
Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders
Xuansheng Wu
Jiayi Yuan
Wenlin Yao
Xiaoming Zhai
Ninghao Liu
LLMSV
85
7
0
24 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
118
2
0
24 Feb 2025
Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs
Gengyuan Zhang
Mingcong Ding
Tong Liu
Yao Zhang
Volker Tresp
115
1
0
24 Feb 2025
TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
Zhaoxuan Wu
Zijian Zhou
Arun Verma
Alok Prakash
Daniela Rus
Bryan Kian Hsiang Low
67
0
0
24 Feb 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan
Zhenyi Lu
Sichen Liu
Xiaoye Qu
Xiaoye Qu
Wei Wei
Yu Cheng
MoE
281
0
0
24 Feb 2025
Grounded Persuasive Language Generation for Automated Marketing
Jibang Wu
Chenghao Yang
Simon Mahns
Chaoqi Wang
Hao Zhu
Fei Fang
Haifeng Xu
51
1
0
24 Feb 2025
REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective
Simon Geisler
Tom Wollschlager
M. H. I. Abdalla
Vincent Cohen-Addad
Johannes Gasteiger
Stephan Günnemann
AAML
88
2
0
24 Feb 2025
REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction
Omar Sharif
Joseph Gatto
Madhusudan Basak
S. Preum
73
0
0
24 Feb 2025
Mitigating Bias in RAG: Controlling the Embedder
Taeyoun Kim
Jacob Mitchell Springer
Aditi Raghunathan
Maarten Sap
63
1
0
24 Feb 2025
Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following
Jie Zeng
Qianyu He
Qingyu Ren
Jiaqing Liang
Yanghua Xiao
Weikang Zhou
Zeye Sun
Fei Yu
90
1
0
24 Feb 2025
From Documents to Dialogue: Building KG-RAG Enhanced AI Assistants
Manisha Mukherjee
Sungchul Kim
Xiang Chen
Dan Luo
Tong Yu
Tung Mai
RALM
52
1
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
75
0
0
24 Feb 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu
Tusher Chakraborty
Emre Kıcıman
Bibek Aryal
Eduardo Rodrigues
...
Rafael Padilha
Leonardo Nunes
Shobana Balakrishnan
Songwu Lu
Ranveer Chandra
126
1
0
24 Feb 2025
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models
Yupeng Chang
Yi-Ju Chang
Yuan Wu
AI4CE
ALM
112
0
0
24 Feb 2025
DReSD: Dense Retrieval for Speculative Decoding
Milan Gritta
Huiyin Xue
Gerasimos Lampouras
RALM
105
0
0
24 Feb 2025
NUTSHELL: A Dataset for Abstract Generation from Scientific Talks
Maike Züfle
Sara Papi
Beatrice Savoldi
Marco Gaido
L. Bentivogli
Jan Niehues
43
1
0
24 Feb 2025
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
Mengqiao Liu
Tevin Wang
Cassandra A. Cohen
Sarah Li
Chenyan Xiong
LRM
79
0
0
24 Feb 2025
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
Yuheng Zhang
Dian Yu
Tao Ge
Linfeng Song
Zhichen Zeng
Haitao Mi
Nan Jiang
Dong Yu
76
1
0
24 Feb 2025
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
Chenghua Huang
Lu Wang
Fangkai Yang
Pu Zhao
Zechao Li
Qingwei Lin
Dongmei Zhang
Saravan Rajmohan
Qi Zhang
OffRL
57
1
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
68
0
0
24 Feb 2025
Hallucination Detection in LLMs Using Spectral Features of Attention Maps
Jakub Binkowski
Denis Janiak
Albert Sawczyn
Bogdan Gabrys
Tomasz Kajdanowicz
84
0
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
79
8
0
24 Feb 2025
Streaming Looking Ahead with Token-level Self-reward
Han Zhang
Ruixin Hong
Dong Yu
49
1
0
24 Feb 2025
Uncovering the Hidden Threat of Text Watermarking from Users with Cross-Lingual Knowledge
Mansour Al Ghanim
Jiaqi Xue
Rochana Prih Hastuti
Mengxin Zheng
Yan Solihin
Qian Lou
WaLM
76
0
0
23 Feb 2025
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
Yin Wu
Quanyu Long
Jing Li
Jianfei Yu
Wenya Wang
VLM
60
2
0
23 Feb 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
69
0
0
23 Feb 2025
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light
Wei Cheng
Wu Yue
Masafumi Oyamada
Mengdi Wang
Santiago Paternain
Haifeng Chen
ReLM
LRM
71
2
0
23 Feb 2025
Speed and Conversational Large Language Models: Not All Is About Tokens per Second
Javier Conde
Miguel González
Pedro Reviriego
Zhen Gao
Shanshan Liu
Fabrizio Lombardi
40
3
0
23 Feb 2025
RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis
Jianwei Wang
Junyao Yang
Haoran Li
Huiping Zhuang
Cen Chen
Huiping Zhuang
SyDa
68
0
0
23 Feb 2025
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs
Jonathan Rystrøm
Hannah Rose Kirk
Scott A. Hale
53
4
0
23 Feb 2025
Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Rui Li
Peiyi Wang
Jingyuan Ma
Di Zhang
Lei Sha
Zhifang Sui
LLMAG
65
0
0
22 Feb 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
56
3
0
22 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
277
1
0
22 Feb 2025
C-3DPO: Constrained Controlled Classification for Direct Preference Optimization
Kavosh Asadi
Julien Han
Xingzi Xu
Dominique Perrault-Joncas
Shoham Sabach
Karim Bouyarmane
Mohammad Ghavamzadeh
38
0
0
22 Feb 2025
LegalBench.PT: A Benchmark for Portuguese Law
Beatriz Canaverde
Telmo Pessoa Pires
Leonor Melo Ribeiro
Andre F. T. Martins
AILaw
ELM
68
0
0
22 Feb 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Ziyang Chen
Mingxiao Li
Shangsong Liang
Zhaochun Ren
V. Honavar
125
6
0
21 Feb 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLM
ALM
65
0
0
21 Feb 2025
Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization
Fernando Spadea
Oshani Seneviratne
58
0
0
21 Feb 2025
SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention
Jiaqi Wu
Chen Chen
Chunyan Hou
Xiaojie Yuan
AAML
64
0
0
21 Feb 2025
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
Sizhe Wang
Yongqi Tong
Hengyuan Zhang
Dawei Li
Xin Zhang
Tianlong Chen
118
5
0
21 Feb 2025
Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning
Xuyang Wu
Jinming Nian
Zhiqiang Tao
Zhiqiang Tao
Hsin-Tai Wu
Yi Fang
LRM
74
0
0
21 Feb 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Guangyi Liu
Xiaowen Dong
Yanjie Wang
Yanfeng Wang
Tian Jin
SyDa
LLMAG
135
5
0
21 Feb 2025
Pub-Guard-LLM: Detecting Fraudulent Biomedical Articles with Reliable Explanations
Lihu Chen
Shuojie Fu
Gabriel Freedman
Cemre Zor
Guy Martin
James Kinross
Uddhav Vaghela
Ovidiu Serban
Francesca Toni
DeLMO
84
0
0
21 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
276
2
0
21 Feb 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
92
0
0
20 Feb 2025
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Michael Luo
Xiaoxiang Shi
Colin Cai
Tianjun Zhang
Justin Wong
...
Chi Wang
Yanping Huang
Zhifeng Chen
Joseph E. Gonzalez
Ion Stoica
68
3
0
20 Feb 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
78
24
0
20 Feb 2025
Previous
1
2
3
...
11
12
13
...
58
59
60
Next