Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans

16 May 2025
Yansheng Qiu
Li Xiao
Zhaopan Xu
Pengfei Zhou
Zheng Wang
Kai Zhang
ELM, LRM

Papers citing "Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans"

50 / 51 papers shown
Diffusion with a Linguistic Compass: Steering the Generation of Clinically Plausible Future sMRI Representations for Early MCI Conversion Prediction
Zhihao Tang
Chaozhuo Li
Litian Zhang
Xi Zhang
DiffM, MedIm
29
9
0
05 Jun 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Weiye Xu
Jun Wang
Weiyun Wang
Zhe Chen
Wengang Zhou
...
Xiaohua Wang
Xizhou Zhu
Wenhai Wang
Jifeng Dai
Jinguo Zhu
VLM, LRM
170
7
0
21 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
205
132
1
14 Apr 2025
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Wasi Uddin Ahmad
Mehrzad Samadi
Somshubra Majumdar
Aleksander Ficek
Siddhartha Jain
Jocelyn Huang
Vahid Noroozi
Boris Ginsburg
LRM
139
13
0
02 Apr 2025
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu
Yuting Zhang
Feng Liu
Changwang Zhang
Ying Sun
Jun Wang
LRM
131
12
0
20 Mar 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang
Xiaoxuan He
Hongkun Pan
Xiyan Jiang
Yan Deng
...
Dacheng Yin
Fengyun Rao
Minfeng Zhu
Bo Zhang
Wei Chen
VLM, LRM
130
100
1
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
Lawrence Yunliang Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
130
39
0
13 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM, ReLM
207
89
0
10 Mar 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Yuhang Cao
Haodong Duan
Dahua Lin
Jiaqi Wang
ObjD, VLM, LRM
137
129
0
03 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
354
699
0
20 Feb 2025
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
Huanqia Cai
Yijun Yang
Winston Hu
LRM
57
8
0
02 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
380
2,000
0
22 Jan 2025
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
129
92
1
15 Nov 2024
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu
Tianyue Ou
Yifan Song
Yuxiao Qu
Wai Lam
Chenyan Xiong
Wenhu Chen
Graham Neubig
Xiang Yue
121
10
0
17 Oct 2024
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
Shuai Peng
Di Fu
Liangcai Gao
Xiuqin Zhong
Hongguang Fu
Zhi Tang
MLLM, VLM, LRM
49
21
0
30 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM, SyDa, VLM
126
867
0
06 Aug 2024
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Fanqing Meng
Jun Wang
Chuanhao Li
Quanfeng Lu
Hao Tian
...
Jifeng Dai
Ping Luo
Ping Luo
Kaipeng Zhang
Wenqi Shao
VLM
91
26
0
05 Aug 2024
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
Wenhao Shi
Zhiqiang Hu
Yi Bin
Junhua Liu
Yang Yang
See-Kiong Ng
Lidong Bing
Roy Ka-Wei Lee
SyDa, MLLM, LRM
95
62
0
25 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV, MLLM
148
378
0
24 Jun 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
129
72
0
29 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM, VLM
133
642
0
25 Apr 2024
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
Songtao Jiang
Tuo Zheng
Yan Zhang
Yeying Jin
Li Yuan
Zuozhu Liu
MoE
118
22
0
16 Apr 2024
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Yifan Xu
Xiao Liu
Xinghan Liu
Zhenyu Hou
Yueyan Li
...
Aohan Zeng
Zhengxiao Du
Wenyi Zhao
Jie Tang
Yuxiao Dong
LRM
88
42
0
03 Apr 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
80
253
0
21 Mar 2024
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
Rocktim Jyoti Das
Simeon Emilov Hristov
Haonan Li
Dimitar Iliyanov Dimitrov
Ivan Koychev
Preslav Nakov
CoGe, ELM
114
17
0
15 Mar 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAG, ReLM, LRM
129
151
0
14 Mar 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
89
87
0
10 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM, MLLM
262
1,216
0
21 Dec 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM, ELM, VLM
270
960
0
27 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV, LRM
97
20
0
27 Nov 2023
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models
Fangzhi Xu
Zhiyong Wu
Qiushi Sun
Siyu Ren
Fei Yuan
Shuai Yuan
Qika Lin
Yu Qiao
Jun Liu
LLMAG
86
37
0
15 Nov 2023
Language Models can be Logical Solvers
Jiazhan Feng
Ruochen Xu
Junheng Hao
Hiteshi Sharma
Yelong Shen
Dongyan Zhao
Weizhu Chen
ReLM, LRM, ELM
95
27
0
10 Nov 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRM, MLLM
128
665
0
03 Oct 2023
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
K. Mangalam
Raiymbek Akshulakov
Jitendra Malik
119
310
0
17 Aug 2023
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
Dong Huang
Qi Bu
Yuhao Qing
Heming Cui
LRM
77
19
0
17 Aug 2023
AgentBench: Evaluating LLMs as Agents
Xiao Liu
Hao Yu
Hanchen Zhang
Yifan Xu
Xuanyu Lei
...
Yu-Chuan Su
Huan Sun
Minlie Huang
Yuxiao Dong
Jie Tang
ELM, LLMAG
142
315
0
07 Aug 2023
Med-Flamingo: a Multimodal Medical Few-shot Learner
Michael Moor
Qian Huang
Shirley Wu
Michihiro Yasunaga
C. Zakka
Yashodhara Dalmia
E. Reis
Pranav Rajpurkar
J. Leskovec
LM&MA, MedIm
85
272
0
27 Jul 2023
MMBench: Is Your Multi-modal Model an All-around Player?
Yuanzhan Liu
Haodong Duan
Yuanhan Zhang
Yue Liu
Songyang Zhang
...
Jiaqi Wang
Conghui He
Ziwei Liu
Kai-xiang Chen
Dahua Lin
133
1,059
0
12 Jul 2023
LogiCoT: Logical Chain-of-Thought Instruction-Tuning
Hanmeng Liu
Zhiyang Teng
Leyang Cui
Chaoli Zhang
Qiji Zhou
Yue Zhang
LRM
62
26
0
20 May 2023
Structured Chain-of-Thought Prompting for Code Generation
Jia Li
Ge Li
Yongming Li
Zhi Jin
LRM
94
135
0
11 May 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM, MLLM
165
2,069
0
20 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa, VLM, MLLM
571
4,925
0
17 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM, MLLM
432
4,656
0
30 Jan 2023
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
418
3,610
0
29 Apr 2022
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Ahmed Masry
Do Xuan Long
J. Tan
Shafiq Joty
Enamul Hoque
AIMat
134
685
0
19 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM, BDL, VLM, CLIP
557
4,421
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
855
9,714
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
362
4,598
0
27 Oct 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
684
41,563
0
22 Oct 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
151
747
0
01 Jul 2020