arXiv:2408.08862
Visual Agents as Fast and Slow Thinkers

16 August 2024
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Dongfang Liu
LLMAG, LRM
ArXiv (abs) | PDF | HTML | Github (24★)

Papers citing "Visual Agents as Fast and Slow Thinkers"

50 / 120 papers shown
Title
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang
Suyuchen Wang
Yun Zhu
Bang Liu
ReLM, LRM
91
0
0
25 May 2025
DSADF: Thinking Fast and Slow for Decision Making
Alex Zhihao Dou
Dongfei Cui
Jun Yan
Wei Wang
Benteng Chen
Haoming Wang
Zeke Xie
Shufei Zhang
OffRL
127
1
0
13 May 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM, OffRL, LRM
177
47
0
27 Mar 2025
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang
Runnan Chen
Ziwen Li
Zhengqing Gao
Xiao He
Yandong Guo
Mingming Gong
Tongliang Liu
LRM
95
1
0
23 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
134
5
0
23 Mar 2025
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li
Bing Hu
Rui Shao
Leyang Shen
Liqiang Nie
94
4
0
05 Mar 2025
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang
Shuming Hu
Shiyu Zhao
Xiaowen Lin
F. Xu
...
Nan Jiang
Lingjuan Lyu
Shiqing Ma
Dimitris N. Metaxas
Ankit Jain
349
5
0
31 Dec 2024
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM, VLM, LRM
78
17
0
24 Sep 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
Jiashuo Sun
Jihai Zhang
Yucheng Zhou
Zhaochen Su
Xiaoye Qu
Yu Cheng
70
13
0
21 Sep 2024
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
Yuyan Chen
Qiang Fu
Yichen Yuan
Zhihao Wen
Ge Fan
Dayiheng Liu
Dongmei Zhang
Zhixu Li
Yanghua Xiao
HILM
64
76
0
04 Jul 2024
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng
Tianhao Hu
Han Xu
Zhisong Zhang
Yong Dai
Lei Han
Nan Du
Xiaolong Li
SyDa, LRM, ReLM
151
38
0
16 Apr 2024
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning
Yongqi Tong
Dawei Li
Sizhe Wang
Yujia Wang
Fei Teng
Jingbo Shang
LRM
94
58
0
29 Mar 2024
Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models
Jiaxing Chen
Yuxuan Liu
Dehu Li
Xiang An
Weimo Deng
Ziyong Feng
Yongle Zhao
Yin Xie
LRM
60
14
0
28 Mar 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Hongsheng Li
VGen, LRM, MLLM
108
75
0
25 Mar 2024
Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models
Zuyan Liu
Yuhao Dong
Yongming Rao
Jie Zhou
Jiwen Lu
LRM
57
21
0
19 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM, LRM
101
48
0
19 Mar 2024
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
YiXuan Wu
Yizhou Wang
Shixiang Tang
Wenhao Wu
Tong He
Wanli Ouyang
Jian Wu
Philip Torr
ObjD, VLM
86
22
0
19 Mar 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Ruyi Xu
Yuan Yao
Zonghao Guo
Junbo Cui
Zanlin Ni
Chunjiang Ge
Tat-Seng Chua
Zhiyuan Liu
Maosong Sun
Gao Huang
VLM, MLLM
89
120
0
18 Mar 2024
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
93
45
0
05 Feb 2024
MouSi: Poly-Visual-Expert Vision-Language Models
Xiaoran Fan
Tao Ji
Changhao Jiang
Shuo Li
Senjie Jin
...
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Zuxuan Wu
Yunchun Jiang
VLM
41
17
0
30 Jan 2024
Hallucination is Inevitable: An Innate Limitation of Large Language Models
Ziwei Xu
Sanjay Jain
Mohan S. Kankanhalli
HILM, LRM
142
253
0
22 Jan 2024
Image Translation as Diffusion Visual Programmers
Cheng Han
James Liang
Qifan Wang
Majid Rabbani
S. Dianat
Raghuveer M. Rao
Ying Nian Wu
Dongfang Liu
51
8
0
18 Jan 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM, SyDa, ALM, LRM
365
337
0
18 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM, MLLM
94
347
0
11 Jan 2024
The Impact of Reasoning Step Length on Large Language Models
Mingyu Jin
Qinkai Yu
Dong Shu
Haiyan Zhao
Wenyue Hua
Yanda Meng
Yongfeng Zhang
Jundong Li
ReLM, LRM
117
109
0
10 Jan 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
95
158
0
21 Dec 2023
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-based Retrofitting
Xinyan Guan
Yanjiang Liu
Hongyu Lin
Yaojie Lu
Xianpei Han
Le Sun
HILM, KELM
71
76
0
22 Nov 2023
Meta Prompting for AI Systems
Yifan Zhang
Yang Yuan
Andrew Chi-Chih Yao
LLMAG, LRM
77
6
0
20 Nov 2023
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li
Biao Yang
Qiang Liu
Zhiyin Ma
Shuo Zhang
Jingxu Yang
Yabo Sun
Yuliang Liu
Xiang Bai
MLLM
94
275
0
11 Nov 2023
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Zhenfang Chen
Rui Sun
Wenjun Liu
Yining Hong
Chuang Gan
LRM
86
15
0
08 Nov 2023
Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges
Chenhang Cui
Yiyang Zhou
Xinyu Yang
Shirley Wu
Linjun Zhang
James Zou
Huaxiu Yao
MLLM
64
91
0
06 Nov 2023
CogVLM: Visual Expert for Pretrained Language Models
Weihan Wang
Qingsong Lv
Wenmeng Yu
Wenyi Hong
Ji Qi
...
Bin Xu
Juanzi Li
Yuxiao Dong
Ming Ding
Jie Tang
VLM, MLLM
97
508
0
06 Nov 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM, MLLM
100
194
0
23 Oct 2023
Improving Large Language Model Fine-tuning for Solving Math Problems
Yixin Liu
Avi Singh
C. D. Freeman
John D. Co-Reyes
Peter J. Liu
LRM, ReLM
80
49
0
16 Oct 2023
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
236
470
0
14 Oct 2023
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
Dongsheng Jiang
Yuchen Liu
Songlin Liu
Jiné Zhao
Hao Zhang
Zhen Gao
Xiaopeng Zhang
Jin Li
Hongkai Xiong
MLLM, VLM
63
39
0
13 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM, MLLM
123
2,807
0
05 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
130
184
0
01 Oct 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yuan Liu
Bing Qin
Ting Liu
LRM, AI4CE
78
172
0
27 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jundong Li
LRM
90
461
0
02 Sep 2023
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta
Nils Blach
Aleš Kubíček
Robert Gerstenberger
Michal Podstawski
...
Joanna Gajda
Tomasz Lehmann
H. Niewiadomski
Piotr Nyczyk
Torsten Hoefler
LRM, AI4CE, LM&Ro
148
701
0
18 Aug 2023
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
Yilin Wen
Zifeng Wang
Jimeng Sun
ReLM
70
74
0
17 Aug 2023
Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought
Bin Lei
Pei-Hung Lin
C. Liao
Caiwen Ding
ReLM, ELM, LRM, AI4CE
61
40
0
16 Aug 2023
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal
Jihan Yin
Erhan Bas
MLLM, VLM
65
174
0
11 Aug 2023
Thinking Like an Expert:Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals
Fanglong Yao
Changyuan Tian
Jintao Liu
Zequn Zhang
Qing Liu
Li Jin
Shuchao Li
Xiaoyu Li
Xian Sun
ReLM, LRM
64
17
0
11 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
102
716
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM, MLLM
83
88
0
03 Aug 2023
SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Bohao Li
Rui Wang
Guangzhi Wang
Yuying Ge
Yixiao Ge
Ying Shan
MLLM, ELM
117
567
0
30 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
329
12,044
0
18 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM, VLM
143
233
0
07 Jul 2023