ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.08862
  4. Cited By
Visual Agents as Fast and Slow Thinkers
v1v2v3v4 (latest)

Visual Agents as Fast and Slow Thinkers

16 August 2024
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
    LLMAGLRM
ArXiv (abs)PDFHTMLGithub (24★)

Papers citing "Visual Agents as Fast and Slow Thinkers"

50 / 120 papers shown
Title
Visual Instruction Tuning with Polite Flamingo
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
91
47
0
03 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image
  Understanding
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLMMLLM
79
237
0
29 Jun 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
Ke Chen
Zhao Zhang
Weili Zeng
Richong Zhang
Feng Zhu
Rui Zhao
ObjD
92
650
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLMObjDVLM
109
763
0
26 Jun 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language
  Models
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELMMLLM
107
856
0
23 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALMOSLMELM
397
4,422
0
09 Jun 2023
GRES: Generalized Referring Expression Segmentation
GRES: Generalized Referring Expression Segmentation
Chang Liu
Henghui Ding
Xudong Jiang
89
163
0
01 Jun 2023
Let's Verify Step by Step
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALMOffRLLRM
193
1,228
0
31 May 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large
  Language Models
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo
Yiyi Zhou
Tianhe Ren
Shen Chen
Xiaoshuai Sun
Rongrong Ji
VLMMLLM
73
96
0
24 May 2023
Reasoning with Language Model is Planning with World Model
Reasoning with Language Model is Planning with World Model
Shibo Hao
Yi Gu
Haodi Ma
Joshua Jiahua Hong
Zhen Wang
D. Wang
Zhiting Hu
ReLMLRMLLMAG
143
599
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLMMLLMLRM
120
44
0
24 May 2023
DetGPT: Detect What You Need via Reasoning
DetGPT: Detect What You Need via Reasoning
Renjie Pi
Jiahui Gao
Shizhe Diao
Boyao Wang
Hanze Dong
...
Lewei Yao
Jianhua Han
Hang Xu
Lingpeng Kong Tong Zhang
Tong Zhang
LRMLM&Ro
66
98
0
23 May 2023
How Language Model Hallucinations Can Snowball
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILMLRM
128
279
0
22 May 2023
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large
  Language Models
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Junyi Li
Xiaoxue Cheng
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
HILMVLM
75
249
0
19 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for
  Vision-Centric Tasks
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLMVLM
88
491
0
18 May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&RoLRMAI4CE
153
2,025
0
17 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLMLRM
296
807
0
17 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with
  Instruction Tuning
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLMVLM
132
2,095
0
11 May 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLMMLLM
286
955
0
27 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large
  Language Models
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLMMLLM
160
2,060
0
20 Apr 2023
Chameleon: Plug-and-Play Compositional Reasoning with Large Language
  Models
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Pan Lu
Baolin Peng
Hao Cheng
Michel Galley
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Jianfeng Gao
KELMMLLMLRM
103
324
0
19 Apr 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
560
4,910
0
17 Apr 2023
Segment Everything Everywhere All at Once
Segment Everything Everywhere All at Once
Xueyan Zou
Jianwei Yang
Hao Zhang
Feng Li
Linjie Li
Jianfeng Wang
Lijuan Wang
Jianfeng Gao
Yong Jae Lee
MLLMVLM
87
488
0
13 Apr 2023
OpenAGI: When LLM Meets Domain Experts
OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge
Wenyue Hua
Kai Mei
Jianchao Ji
Juntao Tan
Shuyuan Xu
Zelong Li
Yongfeng Zhang
VLMLRM
96
230
0
10 Apr 2023
Segment Anything
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLMVLM
348
7,365
0
05 Apr 2023
Self-Refine: Iterative Refinement with Self-Feedback
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLMLRMDiffM
156
1,670
0
30 Mar 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging
  Face
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
122
908
0
30 Mar 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAGMLLM
1.5K
14,699
0
15 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLMLRMReLM
101
466
0
14 Mar 2023
PaLM-E: An Embodied Multimodal Language Model
PaLM-E: An Embodied Multimodal Language Model
Danny Driess
F. Xia
Mehdi S. M. Sajjadi
Corey Lynch
Aakanksha Chowdhery
...
Marc Toussaint
Klaus Greff
Andy Zeng
Igor Mordatch
Peter R. Florence
LM&Ro
117
1,663
0
06 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware
  Attention
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIPVLM
68
46
0
06 Mar 2023
Language Is Not All You Need: Aligning Perception with Language Models
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLMLRMMLLM
106
565
0
27 Feb 2023
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALMPILM
1.5K
13,420
0
27 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
429
4,641
0
30 Jan 2023
PACO: Parts and Attributes of Common Objects
PACO: Parts and Attributes of Common Objects
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
87
104
0
04 Jan 2023
Generalized Decoding for Pixel, Image, and Language
Generalized Decoding for Pixel, Image, and Language
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLMMLLMObjD
101
259
0
21 Dec 2022
Towards Reasoning in Large Language Models: A Survey
Towards Reasoning in Large Language Models: A Survey
Jie Huang
Kevin Chen-Chuan Chang
LM&MAELMLRM
142
640
0
20 Dec 2022
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of
  What Matters
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Boshi Wang
Sewon Min
Xiang Deng
Jiaming Shen
You Wu
Luke Zettlemoyer
Huan Sun
LRMReLM
91
251
0
20 Dec 2022
Reasoning with Language Model Prompting: A Survey
Reasoning with Language Model Prompting: A Survey
Shuofei Qiao
Yixin Ou
Ningyu Zhang
Xiang Chen
Yunzhi Yao
Shumin Deng
Chuanqi Tan
Fei Huang
Huajun Chen
ReLMELMLRM
193
324
0
19 Dec 2022
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in
  Zero-Shot Reasoning
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLMLRM
135
200
0
15 Dec 2022
Solving Math Word Problems via Cooperative Reasoning induced Language
  Models
Solving Math Word Problems via Cooperative Reasoning induced Language Models
Xinyu Zhu
Junjie Wang
Lin Zhang
Yuxiang Zhang
Ruyi Gan
Jiaxing Zhang
Yujiu Yang
ReLMLRM
113
84
0
28 Oct 2022
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
Feng Liang
Bichen Wu
Xiaoliang Dai
Kunpeng Li
Yinan Zhao
Hang Zhang
Peizhao Zhang
Peter Vajda
Diana Marculescu
CLIPVLM
102
457
0
09 Oct 2022
Automatic Chain of Thought Prompting in Large Language Models
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLMLRM
150
632
0
07 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAGReLMLRM
431
2,946
0
06 Oct 2022
Dynamic Prompt Learning via Policy Gradient for Semi-structured
  Mathematical Reasoning
Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
Pan Lu
Liang Qiu
Kai-Wei Chang
Ying Nian Wu
Song-Chun Zhu
Tanmay Rajpurohit
Peter Clark
Ashwin Kalyan
ReLMLRM
164
296
0
29 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELMReLMLRM
283
1,296
0
20 Sep 2022
Language models show human-like content effects on reasoning tasks
Language models show human-like content effects on reasoning tasks
Ishita Dasgupta
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Hannah R. Sheahan
Antonia Creswell
D. Kumaran
James L. McClelland
Felix Hill
ReLMLRM
121
187
0
14 Jul 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLMLRM
526
4,077
0
24 May 2022
Least-to-Most Prompting Enables Complex Reasoning in Large Language
  Models
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
Denny Zhou
Nathanael Scharli
Le Hou
Jason W. Wei
Nathan Scales
...
Dale Schuurmans
Claire Cui
Olivier Bousquet
Quoc Le
Ed H. Chi
RALMLRMAI4CE
97
1,125
0
21 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,585
0
29 Apr 2022
Previous
123
Next