2211.11559
Cited By
Visual Programming: Compositional visual reasoning without training
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
Papers citing
"Visual Programming: Compositional visual reasoning without training"
50 / 87 papers shown
Title
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang
Zhaolu Kang
Wangyuxuan Zhai
Xinyue Lou
Yunghwei Lai
...
Yawen Wang
Kaiyu Huang
Yile Wang
Peng Li
Yang Liu
19
0
0
20 Jun 2025
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
Sunil Kumar
Bowen Zhao
Leo Parker Dirac
Paulina Varshavskaya
LRM
15
0
0
10 Jun 2025
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains
Shijie Wang
Yilun Zhang
Zeyu Lai
Dexing Kong
24
0
0
09 Jun 2025
Language-Vision Planner and Executor for Text-to-Visual Reasoning
Yichang Xu
Gaowen Liu
Ramana Rao Kompella
Sihao Hu
Tiansheng Huang
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Ling Liu
LRM
VLM
23
0
0
09 Jun 2025
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai
Zengjie Hu
Fupeng Sun
Jiantao Qiu
Yizhen Jiang
Guangxin He
Bohan Zeng
Conghui He
Binhang Yuan
Wentao Zhang
OffRL
LRM
17
0
0
08 Jun 2025
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Chaoyang Wang
Zeyu Zhang
Haiyun Jiang
OffRL
LRM
25
0
0
07 Jun 2025
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Z. Babaiee
Peyman M. Kiasari
Daniela Rus
Radu Grosu
45
0
0
06 Jun 2025
Efficiently Enhancing General Agents With Hierarchical-categorical Memory
Changze Qiao
Mingming Lu
LLMAG
34
0
0
28 May 2025
Thinking with Generated Images
Ethan Chern
Zhulin Hu
Steffi Chern
Siqi Kou
Jiadi Su
Yan Ma
Zhijie Deng
Pengfei Liu
LRM
63
1
0
28 May 2025
RefAV: Towards Planning-Centric Scenario Mining
Cainan Davidson
Deva Ramanan
Neehar Peri
89
2
0
27 May 2025
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu
Jingcheng Yang
Jize Jiang
Meitang Li
Kaizhuo Yan
Hanchao Yu
Minjia Zhang
Chengxiang Zhai
Klara Nahrstedt
LRM
173
0
0
25 May 2025
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
95
0
0
24 May 2025
Neuro-Symbolic Query Compiler
Yuyao Zhang
Zhicheng Dou
Xiaoxi Li
Jiajie Jin
Yongkang Wu
Zhonghua Li
Qi Ye
Ji-Rong Wen
NAI
111
0
0
17 May 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng
A. Goel
Hakan Bilen
LRM
68
0
0
12 May 2025
Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang
Zonghao Ying
Tianyuan Zhang
Siyuan Liang
Shengshan Hu
Mingchuan Zhang
A. Liu
Xianglong Liu
AAML
177
4
0
19 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Roger Zimmermann
Lijun Sun
Jinhua Zhao
AI4CE
394
2
0
15 Apr 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
159
1
0
25 Mar 2025
ChatStitch: Visualizing Through Structures via Surround-View Unsupervised Deep Image Stitching with Collaborative LLM-Agents
Hao Liang
Zhipeng Dong
Kaixin Chen
M. Fu
Yufeng Yue
Yi Yang
Mengyin Fu
106
0
0
19 Mar 2025
PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play
Wei Fang
Yang Zhang
Kaizhi Qian
James R. Glass
Yada Zhu
LLMAG
93
0
0
18 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
88
0
0
12 Mar 2025
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qingming Huang
Runze Zhang
Kangjun Liu
Minglun Gong
Hao Zhang
Hui Huang
3DPC
AI4CE
106
1
0
04 Mar 2025
Program Synthesis Dialog Agents for Interactive Decision-Making
Matthew Toles
Nikhil Balwani
Rattandeep Singh
Valentina Giulia Sartori Rodriguez
Zhou Yu
135
0
0
26 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
110
16
0
24 Feb 2025
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
Boyu Mi
Hanqing Wang
Tai Wang
Yilun Chen
Jiangmiao Pang
137
0
0
21 Feb 2025
MoVer: Motion Verification for Motion Graphics Animations
Jiaju Ma
Maneesh Agrawala
VGen
121
0
0
19 Feb 2025
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Zeqing Wang
Wentao Wan
Qiqing Lao
Runmeng Chen
Minjie Lang
Keze Wang
Liang Lin
LRM
234
3
0
17 Feb 2025
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall
Cheng Perng Phoo
Mia Chiquier
Bharath Hariharan
Kavita Bala
Carl Vondrick
145
1
0
17 Feb 2025
VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chunbai Zhang
Chao Wang
Yang Zhou
Yan Peng
LRM
ReLM
150
0
0
02 Feb 2025
PuzzleGPT: Emulating Human Puzzle-Solving Ability for Time and Location Prediction
Hammad A. Ayyubi
Xuande Feng
Junzhang Liu
Xudong Lin
Zhecan Wang
Shih-Fu Chang
77
1
0
24 Jan 2025
Neuro-Symbolic AI in 2024: A Systematic Review
Brandon C. Colelough
William Regli
NAI
163
13
0
09 Jan 2025
AutoPresent: Designing Structured Visuals from Scratch
Jiaxin Ge
Zora Z. Wang
Xuhui Zhou
Yi-Hao Peng
Sanjay Subramanian
...
Maarten Sap
Alane Suhr
Daniel Fried
Graham Neubig
Trevor Darrell
99
8
0
01 Jan 2025
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis
Ahmet Serdar Karadeniz
Sebastian Cavada
Danila Rukhovich
Niki Maria Foteinopoulou
K. Cherenkova
Anis Kacem
Djamila Aouada
182
7
0
18 Dec 2024
Empowering LLMs to Understand and Generate Complex Vector Graphics
Ximing Xing
Juncheng Hu
Guotao Liang
Jing Zhang
Dong Xu
Qian Yu
192
12
0
15 Dec 2024
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin
Yunsheng Li
Dongdong Chen
Weijian Xu
Ronald Clark
Philip Torr
VLM
ObjD
548
1
0
12 Dec 2024
LossAgent: Towards Any Optimization Objectives for Image Processing with LLM Agents
Bingchen Li
Xin Li
Yiting Lu
Zhibo Chen
194
1
0
05 Dec 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
Jiangming Wang
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
532
0
0
25 Nov 2024
GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers
Éloi Zablocki
Valentin Gerard
Amaia Cardiel
Eric Gaussier
Matthieu Cord
Eduardo Valle
164
0
0
23 Nov 2024
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
Yongdong Luo
Xiawu Zheng
Xiao Yang
Guilin Li
Haojia Lin
Jinfa Huang
Jiayi Ji
Yong Li
Jiebo Luo
Rongrong Ji
VLM
199
28
0
20 Nov 2024
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Andrea Sanchietti
G. Farinella
Fabio Galasso
Antonino Furnari
LRM
EgoV
116
1
0
04 Nov 2024
VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning
Yichao Liang
Nishanth Kumar
Hao Tang
Adrian Weller
J. Tenenbaum
Tom Silver
Joao Henriques
Kevin Ellis
131
12
0
30 Oct 2024
GRS: Generating Robotic Simulation Tasks from Real-World Images
Alex Zook
Fan-Yun Sun
Josef Spjut
Valts Blukis
Stan Birchfield
Jonathan Tremblay
101
4
0
20 Oct 2024
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan
Elias Stengel-Eskin
Jaemin Cho
Joey Tianyi Zhou
VGen
159
3
0
08 Oct 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
87
0
0
23 Sep 2024
What Makes a Maze Look Like a Maze?
Joy Hsu
Jiayuan Mao
J. Tenenbaum
Noah D. Goodman
Jiajun Wu
OCL
130
6
0
12 Sep 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
71
1
0
30 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
132
40
0
08 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
135
7
0
08 Jul 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
130
28
0
18 Jun 2024
ParSEL: Parameterized Shape Editing with Language
Aditya Ganeshan
Ryan Y. Huang
Xianghao Xu
R. K. Jones
Daniel E. Ritchie
KELM
79
3
0
30 May 2024
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Shangzhan Zhang
Sida Peng
Tao Xu
Yuanbo Yang
Tianrun Chen
Nan Xue
Yujun Shen
Hujun Bao
Ruizhen Hu
Xiaowei Zhou
DiffM
100
11
0
26 Apr 2024