ResearchTrend.AI
Visual Programming: Compositional visual reasoning without training

18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
    ReLM, VLM, LRM

Papers citing "Visual Programming: Compositional visual reasoning without training"

37 / 87 papers shown
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
219
27
0
22 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
95
31
0
09 Apr 2024
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan
B. Vijaykumar
S. Schulter
Yun Fu
Manmohan Chandraker
LRM, ReLM
98
8
0
06 Apr 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
171
9
0
21 Mar 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Liu
ELM
138
31
0
12 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM, LRM
195
6
0
03 Mar 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen
Yao Mu
Qiaojun Yu
Tianming Wei
Silang Wu
...
Wenqi Shao
Yu Qiao
Huazhe Xu
Mingyu Ding
Ping Luo
LM&Ro
83
12
0
22 Feb 2024
Common Sense Reasoning for Deepfake Detection
Yue Zhang
Ben Colman
Xiao Guo
Ali Shahriyari
Gaurav Bharaj
143
35
0
31 Jan 2024
GraphiMind: LLM-centric Interface for Information Graphics Design
Qirui Huang
Min Lu
J. Lanir
Dani Lischinski
Daniel Cohen-Or
Hui Huang
MLLM
81
8
0
24 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
164
5
0
23 Jan 2024
Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski
Ming Jiang
Junjie Hu
CoGe, VLM, LRM
102
3
0
20 Jan 2024
LangProp: A code optimization framework using Large Language Models applied to driving
Shu Ishida
Gianluca Corrado
George Fedoseev
Hudson Yeo
Lloyd Russell
Jamie Shotton
João F. Henriques
Anthony Hu
113
11
0
18 Jan 2024
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen, DiffM
81
39
0
17 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro, LLMAG
181
41
0
16 Jan 2024
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub
Bohan Lyu
Xin Cong
Heyang Yu
Pan Yang
Yujia Qin
...
Zhong Zhang
Yukun Yan
Y. Lin
Zhiyuan Liu
Maosong Sun
LLMAG
82
5
0
28 Dec 2023
A Survey on Open-Set Image Recognition
Qiulei Dong
BDL, ObjD
92
6
0
25 Dec 2023
Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
Min Zhang
Jianfeng He
Shuo Lei
Murong Yue
Linhan Wang
Chang-Tien Lu
92
5
0
12 Dec 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan
Jinke Ren
Chun-Mei Feng
Hengshuang Zhao
Shuguang Cui
Zhen Li
125
30
0
26 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM, VLM
113
126
0
09 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
Nicholas Walker
Stefan Ultes
Pierre Lison
LM&Ro
159
1
0
03 Nov 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM, MLLM
108
133
0
24 Oct 2023
Hypothesis Search: Inductive Reasoning with Language Models
Ruocheng Wang
E. Zelikman
Gabriel Poesia
Yewen Pu
Nick Haber
Noah D. Goodman
ReLM, LRM
132
112
0
11 Sep 2023
Compositional Learning of Visually-Grounded Concepts Using Reinforcement
Zijun Lin
Haidi Azaman
M Ganesh Kumar
Cheston Tan
CoGe, OffRL
72
3
0
08 Sep 2023
Language Prompt for Autonomous Driving
Dongming Wu
Wencheng Han
Tiancai Wang
Yingfei Liu
Cheng-zhong Xu
Jianbing Shen
VLM
134
87
0
08 Sep 2023
Rational Decision-Making Agent with Internalized Utility Judgment
Yining Ye
Xin Cong
Shizuo Tian
Yujia Qin
Chong Liu
Y. Lin
Zhiyuan Liu
Maosong Sun
LLMAG
91
8
0
24 Aug 2023
NEOLAF, an LLM-powered neural-symbolic cognitive architecture
Richard Tong
Cassie Chen Cao
Timothy Xueqian Lee
Guodong Zhao
Ray Wan
...
Xiangen Hu
Robin Schmucker
Jinsheng Pan
Julian Quevedo
Yu Lu
41
1
0
08 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG, SyDa
148
44
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
146
127
0
25 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
132
519
0
12 Jul 2023
AmadeusGPT: a natural language interface for interactive animal behavioral analysis
Shaokai Ye
Jessy Lauer
Mu Zhou
Alexander Mathis
Mackenzie W. Mathis
MLLM, LLMAG
104
18
0
10 Jul 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
141
13
0
16 Jun 2023
Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation
Harel Biggie
Ajay Narasimha Mopidevi
Dusty Woods
Christoffer Heckman
LM&Ro
67
11
0
15 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
96
76
0
14 Jun 2023
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Weixi Feng
Wanrong Zhu
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
Xuehai He
Sugato Basu
Xinze Wang
William Yang Wang
MLLM
125
180
0
24 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Mohit Bansal
MLLM
119
51
0
24 May 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
Haoxuan You
Rui Sun
Zhecan Wang
Long Chen
Gengyu Wang
Hammad A. Ayyubi
Kai-Wei Chang
Shih-Fu Chang
VLM, MLLM, LRM
150
44
0
24 May 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
Yongliang Shen
Kaitao Song
Xu Tan
Dongsheng Li
Weiming Lu
Yueting Zhuang
MLLM
149
913
0
30 Mar 2023