arXiv: 2306.05685
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,926 papers shown
Cultural Learning-Based Culture Adaptation of Language Models
Chen Cecilia Liu
Anna Korhonen
Iryna Gurevych
48
0
0
03 Apr 2025
Representation Bending for Large Language Model Safety
Ashkan Yousefpour
Taeheon Kim
Ryan S. Kwon
Seungbeen Lee
Wonje Jeung
Seungju Han
Alvin Wan
Harrison Ngan
Youngjae Yu
Jonghyun Choi
AAML
ALM
KELM
59
2
0
02 Apr 2025
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
Junwen Pan
Rui Zhang
Xin Wan
Yuan Zhang
Ming Lu
Qi She
VLM
46
1
0
02 Apr 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM
ELM
815
7
0
02 Apr 2025
LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution
Zhuoran Yang
Jie Peng
Zhen Tan
Tianlong Chen
Yanyong Zhang
AAML
44
0
0
02 Apr 2025
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 21 May 2025
123
4
1
02 Apr 2025
Refining Interactions: Enhancing Anisotropy in Graph Neural Networks with Language Semantics
Zhaoxing Li
Xiaoming Zhang
Haifeng Zhang
Chengxiang Liu
44
0
0
02 Apr 2025
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
Jianhao Chen
Zishuo Xun
Bocheng Zhou
Han Qi
Qiaosheng Zhang
...
Wei Hu
Yuzhong Qu
W. Ouyang
Wanli Ouyang
Shuyue Hu
74
1
0
01 Apr 2025
DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism
Dengchun Li
Naizheng Wang
Zihao Zhang
Haoyang Yin
Lei Duan
Meng Xiao
Mingjie Tang
MoE
61
1
0
01 Apr 2025
Automated Factual Benchmarking for In-Car Conversational Systems using Large Language Models
Rafael Giebisch
Ken E. Friedl
Lev Sorokin
Andrea Stocco
HILM
60
0
0
01 Apr 2025
AI Judges in Design: Statistical Perspectives on Achieving Human Expert Equivalence With Vision-Language Models
Kristen M. Edwards
Farnaz Tehranchi
Scarlett R. Miller
Faez Ahmed
72
0
0
01 Apr 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi
Hritik Bansal
Arian Hosseini
Aditya Grover
Kai-Wei Chang
Marcus Rohrbach
Anna Rohrbach
OffRL
LRM
50
2
0
01 Apr 2025
Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications
Hongliu Cao
Ilias Driouich
Robin Singh
Eoin Thomas
ELM
45
0
0
01 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
81
0
0
01 Apr 2025
Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench
Ziyi Liu
Priyanka Dey
Zhenyu Zhao
Jen-tse Huang
Rahul Gupta
Yong-Jin Liu
Jieyu Zhao
41
0
0
01 Apr 2025
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Junhao Cheng
Yuying Ge
Yixiao Ge
Jing Liao
Ying Shan
VGen
AI4CE
68
0
0
01 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
66
1
0
01 Apr 2025
Efficient Construction of Model Family through Progressive Training Using Model Expansion
Kazuki Yano
Sho Takase
Sosuke Kobayashi
Shun Kiyono
Jun Suzuki
58
0
0
01 Apr 2025
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language
Yoonshik Kim
Jaeyoon Jung
44
0
0
31 Mar 2025
Learning a Canonical Basis of Human Preferences from Binary Ratings
Kailas Vodrahalli
Wei Wei
James Zou
52
0
0
31 Mar 2025
Enhancing Image Resolution of Solar Magnetograms: A Latent Diffusion Model Approach
Francesco P. Ramunno
Paolo Massa
Vitaliy Kinakh
Brandon Panos
A. Csillaghy
Slava Voloshynovskiy
DiffM
58
0
0
31 Mar 2025
JudgeLRM: Large Reasoning Models as a Judge
Nuo Chen
Zhiyuan Hu
Qingyun Zou
Jiaying Wu
Qian Wang
Bryan Hooi
Bingsheng He
ReLM
ELM
LRM
75
9
0
31 Mar 2025
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models
Youmi Ma
Sakae Mizuki
Kazuki Fujii
Taishi Nakamura
Masanari Ohi
...
Takumi Okamoto
Shigeki Ishida
Rio Yokota
Hiroya Takamura
Naoaki Okazaki
ALM
61
0
0
31 Mar 2025
Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations
Adrián Sánchez-Mompó
Ioannis Mavromatis
Peizheng Li
Konstantinos Katsaros
Aftab Khan
48
0
0
31 Mar 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Yuanda Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
57
0
0
31 Mar 2025
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
Siqi Fan
Xiusheng Huang
Yiqun Yao
Xuezhi Fang
Kang Liu
Peng Han
Shuo Shang
Aixin Sun
Yequan Wang
LLMAG
45
1
0
30 Mar 2025
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
Yihuai Hong
Dian Zhou
Meng Cao
Lei Yu
Zhijing Jin
LRM
56
0
0
29 Mar 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
88
0
0
29 Mar 2025
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
Tuo Liang
Zhe Hu
Jing Li
Hao Zhang
Yiren Lu
...
Yiran Qiao
Disheng Liu
Jeirui Peng
Jing Ma
Yu Yin
59
0
0
29 Mar 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
78
0
0
29 Mar 2025
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Yubo Li
Yidi Miao
Xueying Ding
Ramayya Krishnan
R. Padman
49
0
0
28 Mar 2025
Probabilistic Uncertain Reward Model
Wangtao Sun
Xiang Cheng
Xing Yu
Haotian Xu
Zhao Yang
Shizhu He
Jun Zhao
Kang Liu
60
0
0
28 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
34
0
0
28 Mar 2025
Process Reward Modeling with Entropy-Driven Uncertainty
Lang Cao
Renhong Chen
Yingtian Zou
Chao Peng
Wu Ning
...
Yansen Wang
Peishuo Su
Mofan Peng
Zijie Chen
Yitong Li
44
0
0
28 Mar 2025
Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing
Johan Wahréus
Ahmed Mohamed Hussain
P. Papadimitratos
63
0
0
27 Mar 2025
Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach
Javier Coronado-Blázquez
HILM
ELM
74
0
0
27 Mar 2025
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti
Massimiliano Mancini
Enrico Fini
Yiming Wang
Paolo Rota
Elisa Ricci
VLM
Presented at ResearchTrend Connect | VLM on 07 May 2025
101
0
0
27 Mar 2025
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?
Ashish Sardana
HILM
VLM
78
0
0
27 Mar 2025
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Zhicheng Lee
S. Cao
Jinxin Liu
Jing Zhang
Weichuan Liu
Xiaoyin Che
Lei Hou
Juanzi Li
ReLM
LRM
97
2
0
27 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
80
0
0
26 Mar 2025
Vision as LoRA
Han Wang
Yongjie Ye
Bingru Li
Yuxiang Nie
Jinghui Lu
Jingqun Tang
Yanjie Wang
Can Huang
90
2
0
26 Mar 2025
StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs
Zhicheng Guo
Sijie Cheng
Yuchen Niu
Hao Wang
Sicheng Zhou
Wenbing Huang
Yang Liu
CLL
OffRL
95
0
0
26 Mar 2025
TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes
Raj Sanjay Shah
Lei Xu
Qianchu Liu
Jon Burnsky
Drew Bertagnolli
Chaitanya P. Shivade
LM&MA
101
0
0
26 Mar 2025
Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang
Bowen Cao
Anran Zhang
Weibo Gu
Winston Hu
Guang Chen
VLM
84
0
0
26 Mar 2025
Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
Yunkai Liang
Zhangyu Chen
Pengfei Zuo
Zhi Zhou
Xu Chen
Zhou Yu
94
4
0
26 Mar 2025
Linguistic Blind Spots of Large Language Models
Jiali Cheng
Hadi Amiri
65
1
0
25 Mar 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu
Diankun Zhang
Zongchuang Zhao
Jianfeng Cui
Dingkang Liang
Chong Zhang
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
53
2
0
25 Mar 2025
Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
Alexander Gambashidze
Konstantin Sobolev
Andrey Kuznetsov
Ivan Oseledets
VLM
LRM
56
0
0
25 Mar 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM
CoGe
79
0
0
25 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
60
2
0
25 Mar 2025