Flex-Judge: Think Once, Judge Anywhere

24 May 2025
Jongwoo Ko
S. Kim
Sungwoo Cho
Se-Young Yun
    ELM
    LRM

Papers citing "Flex-Judge: Think Once, Judge Anywhere"

50 / 69 papers shown
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Chenxi Whitehouse
Tianlu Wang
Ping Yu
Xian Li
Jason Weston
Ilia Kulikov
Swarnadeep Saha
ALM
ELM
LRM
51
2
0
15 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
270
9
0
05 May 2025
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs
Will Cai
Tianneng Shi
Xuandong Zhao
Dawn Song
45
4
0
07 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
81
30
0
03 Apr 2025
JudgeLRM: Large Reasoning Models as a Judge
Nuo Chen
Zhiyuan Hu
Qingyun Zou
Jiaying Wu
Qian Wang
Bryan Hooi
Bingsheng He
ReLM
ELM
LRM
96
10
0
31 Mar 2025
Qwen2.5-Omni Technical Report
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
...
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
114
31
0
26 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
140
430
0
20 Feb 2025
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
Yuhui Zhang
Tao Yu
Haochen Tian
Chaoyou Fu
Peiyan Li
...
Yan Li
Di Zhang
Liang Wang
Rong Jin
Tieniu Tan
63
17
0
17 Feb 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
129
8
0
28 Jan 2025
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLM
CoGe
142
28
0
26 Nov 2024
LLaMo: Large Language Model-based Molecular Graph Assistant
Jinyoung Park
Minseong Bae
Dohwan Ko
Hyunwoo J. Kim
60
2
0
31 Oct 2024
CodeJudge: Evaluating Code Generation with Large Language Models
Weixi Tong
Tianyi Zhang
ELM
ALM
39
11
0
03 Oct 2024
LLaVA-Critic: Learning to Evaluate Multimodal Models
Tianyi Xiong
Xinze Wang
Dong Guo
Qinghao Ye
Haoqi Fan
Quanquan Gu
Heng Huang
Chunyuan Li
MLLM
VLM
LRM
74
43
0
03 Oct 2024
Generative Reward Models
Dakota Mahan
Duy Phung
Rafael Rafailov
Chase Blagden
Nathan Lile
Louis Castricato
Jan-Philipp Fränken
Chelsea Finn
Alon Albalak
VLM
SyDa
OffRL
37
36
0
02 Oct 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
81
6
0
25 Sep 2024
A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models
Ryandhimas E. Zezario
Sabato Marco Siniscalchi
Hsin-Min Wang
Yu Tsao
63
3
0
16 Sep 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
48
129
0
15 Jul 2024
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
Zhaorun Chen
Yichao Du
Zichen Wen
Yiyang Zhou
Chenhang Cui
...
Jiawei Zhou
Zhuokai Zhao
Rafael Rafailov
Chelsea Finn
Huaxiu Yao
EGVM
MLLM
81
32
0
05 Jul 2024
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM
ELM
68
69
0
26 Jun 2024
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
Baiqi Li
Zhiqiu Lin
Deepak Pathak
Jiayao Li
Yixin Fei
...
Tiffany Ling
Xide Xia
Pengchuan Zhang
Graham Neubig
Deva Ramanan
EGVM
61
31
0
19 Jun 2024
MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction
Yuyan Liu
Sirui Ding
Sheng Zhou
Wenqi Fan
Qiaoyu Tan
57
9
0
18 Jun 2024
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
Lin Gui
Cristina Garbacea
Victor Veitch
BDL
LM&MA
51
43
0
02 Jun 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
73
41
0
26 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
86
409
0
23 May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe
ALM
ELM
83
182
0
02 May 2024
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
Zhaorui Yang
Tianyu Pang
Hao Feng
Han Wang
Wei Chen
Minfeng Zhu
Qian Liu
ALM
49
44
0
21 Feb 2024
LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset
Botao Yu
Frazier N. Baker
Ziqi Chen
Xia Ning
Huan Sun
LM&MA
72
48
0
14 Feb 2024
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu
Jaehong Yoon
Mohit Bansal
100
5
0
08 Feb 2024
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen
Ruoxi Chen
Shilin Zhang
Yinuo Liu
Yaochen Wang
Huichi Zhou
Qihui Zhang
Yao Wan
Pan Zhou
Lichao Sun
ELM
29
110
0
07 Feb 2024
Towards 3D Molecule-Text Interpretation in Language Models
Changhao Nai
Zhiyuan Liu
Yancheng Luo
Xiang Wang
Xiangnan He
Kenji Kawaguchi
Tat-Seng Chua
Qi Tian
AI4CE
52
46
0
25 Jan 2024
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
Seongyun Lee
Seungone Kim
Sue Hyun Park
Geewook Kim
Minjoon Seo
MLLM
39
31
0
12 Jan 2024
Can Large Language Models Understand Molecules?
Seyedeh Shaghayegh Sadeghi
Alan Bui
Ali Forooghi
Jianguo Lu
A. Ngom
AI4CE
30
11
0
05 Jan 2024
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding
Senqiao Yang
Jiaming Liu
Ray Zhang
Mingjie Pan
Zoey Guo
Xiaoqi Li
Zehui Chen
Peng Gao
Yandong Guo
Shanghang Zhang
3DV
60
63
0
21 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
157
198
0
01 Dec 2023
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
77
251
0
21 Nov 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
62
315
0
14 Nov 2023
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
Microsoft Research AI4Science
Microsoft Quantum
LM&MA
ELM
26
109
0
13 Nov 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
82
125
0
26 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
52
222
0
12 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
79
2,593
0
05 Oct 2023
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
153
1,709
0
28 Sep 2023
Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
Zhen Qin
R. Jagerman
Kai Hui
Honglei Zhuang
Junru Wu
...
Tianqi Liu
Jialu Liu
Donald Metzler
Xuanhui Wang
Michael Bendersky
ALM
RALM
65
235
0
30 Jun 2023
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu
Yiming Hao
Keqiang Sun
Yixiong Chen
Feng Zhu
Rui Zhao
Hongsheng Li
69
274
0
15 Jun 2023
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Yin Fang
Xiaozhuan Liang
Ningyu Zhang
Kangwei Liu
Rui Huang
Zhuo Chen
Xiaohui Fan
Huajun Chen
64
82
0
13 Jun 2023
Mind2Web: Towards a Generalist Agent for the Web
Xiang Deng
Yu Gu
Boyuan Zheng
Shijie Chen
Samuel Stevens
Boshi Wang
Huan Sun
Yu-Chuan Su
LLMAG
50
431
0
09 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
220
4,085
0
09 Jun 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
...
Jindong Wang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALM
ELM
85
237
0
08 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
268
3,712
0
29 May 2023
Large Language Models are not Fair Evaluators
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
87
542
0
29 May 2023
Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration
Daniel Deutsch
George F. Foster
Markus Freitag
65
44
0
23 May 2023