ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2211.11559
  4. Cited By
Visual Programming: Compositional visual reasoning without training

Visual Programming: Compositional visual reasoning without training

18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
    ReLM
    VLM
    LRM
ArXivPDFHTML

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 309 papers shown
Title
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal
  Learning with Missing Modalities and Data Scarcity
Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
Zhuo Zhi
Ziquan Liu
M. Elbadawi
Adam Daneshmend
Mine Orlu
Abdul Basit
Andreas Demosthenous
Miguel R. D. Rodrigues
36
2
0
14 Mar 2024
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models
Zhicheng Guo
Sijie Cheng
Hao Wang
Shihao Liang
Yujia Qin
Peng Li
Zhiyuan Liu
Maosong Sun
Yang Liu
ELM
52
23
0
12 Mar 2024
A Modular Approach for Multimodal Summarization of TV Shows
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
26
9
0
06 Mar 2024
What Is Missing in Multilingual Visual Reasoning and How to Fix It
What Is Missing in Multilingual Visual Reasoning and How to Fix It
Yueqi Song
Simran Khanuja
Graham Neubig
VLM
LRM
97
6
0
03 Mar 2024
Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning
Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning
Shuo Yang
Zirui Shang
Yongqi Wang
Derong Deng
Hongwei Chen
Qiyuan Cheng
Xinxiao Wu
VLM
38
6
0
02 Mar 2024
From Summary to Action: Enhancing Large Language Models for Complex
  Tasks with Open World APIs
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs
Yulong Liu
Yunlong Yuan
Chunwei Wang
Jianhua Han
Yongqiang Ma
Li Zhang
Nanning Zheng
Hang Xu
LLMAG
39
5
0
28 Feb 2024
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi
Runpei Dong
Shaochen Zhang
Haoran Geng
Chunrui Han
Zheng Ge
Li Yi
Kaisheng Ma
41
52
0
27 Feb 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist
  Autonomous Agents for Desktop and Web
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
51
44
0
27 Feb 2024
Selective "Selective Prediction": Reducing Unnecessary Abstention in
  Vision-Language Reasoning
Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning
Tejas Srinivasan
Jack Hessel
Tanmay Gupta
Bill Yuchen Lin
Yejin Choi
Jesse Thomason
Khyathi Raghavi Chandu
24
7
0
23 Feb 2024
AutoMMLab: Automatically Generating Deployable Models from Language
  Instructions for Computer Vision Tasks
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
Zekang Yang
Wang Zeng
Sheng Jin
Chao Qian
Ping Luo
Wentao Liu
MLLM
VLM
61
8
0
23 Feb 2024
Large Multimodal Agents: A Survey
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
39
0
23 Feb 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
  and Simulation
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen
Yao Mu
Qiaojun Yu
Tianming Wei
Silang Wu
...
Wenqi Shao
Yu Qiao
Huazhe Xu
Mingyu Ding
Ping Luo
LM&Ro
34
11
0
22 Feb 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for
  Multimodal Large Language Models
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Fuwen Luo
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
43
5
0
21 Feb 2024
Tell Me More! Towards Implicit User Intention Understanding of Language
  Model Driven Agents
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
Cheng Qian
Bingxiang He
Zhuang Zhong
Jia Deng
Yujia Qin
...
Zhong Zhang
Jie Zhou
Yankai Lin
Zhiyuan Liu
Maosong Sun
33
28
0
14 Feb 2024
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating
  Unconventional Objects
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Yutaro Yamada
Khyathi Raghavi Chandu
Yuchen Lin
Jack Hessel
Ilker Yildirim
Yejin Choi
AI4CE
23
12
0
14 Feb 2024
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous
  Driving and Zero-Shot Instruction Following
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
Brian Yang
Huangyuan Su
N. Gkanatsios
Tsung-Wei Ke
Ayush Jain
Jeff Schneider
Katerina Fragkiadaki
DiffM
37
20
0
09 Feb 2024
LLMs for Coding and Robotics Education
LLMs for Coding and Robotics Education
Peng Shu
Huaqin Zhao
Hanqi Jiang
Yiwei Li
Shaochen Xu
...
Zheng Liu
Guoyu Lu
Le Guan
Gong Chen
Xianqiao Wang Tianming Liu
42
5
0
09 Feb 2024
Editable Scene Simulation for Autonomous Driving via Collaborative
  LLM-Agents
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei
Zi Wang
Yifan Lu
Chenxin Xu
Chang-rui Liu
Hao Zhao
Siheng Chen
Yanfeng Wang
VGen
65
59
0
08 Feb 2024
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou
You Li
Fan Ma
Zongxin Yang
Yi Yang
DiffM
22
57
0
08 Feb 2024
Open-Universe Indoor Scene Generation using LLM Program Synthesis and
  Uncurated Object Databases
Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases
Rio Aguina-Kang
Maxim Gumin
Do Heon Han
Stewart Morris
Seung Jean Yoo
Aditya Ganeshan
R. K. Jones
Qiuhong Anna Wei
Kailiang Fu
Daniel E. Ritchie
3DV
45
24
0
05 Feb 2024
Solution-oriented Agent-based Models Generation with Verifier-assisted
  Iterative In-context Learning
Solution-oriented Agent-based Models Generation with Verifier-assisted Iterative In-context Learning
Tong Niu
Weihao Zhang
Rong Zhao
LLMAG
32
2
0
04 Feb 2024
Common Sense Reasoning for Deepfake Detection
Common Sense Reasoning for Deepfake Detection
Yue Zhang
Ben Colman
Xiao Guo
Ali Shahriyari
Gaurav Bharaj
32
30
0
31 Jan 2024
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Elias Stengel-Eskin
Archiki Prasad
Mohit Bansal
25
13
0
29 Jan 2024
GraphiMind: LLM-centric Interface for Information Graphics Design
GraphiMind: LLM-centric Interface for Information Graphics Design
Qiruin Huang
Min Lu
J. Lanir
Dani Lischinski
Daniel Cohen-Or
Hui Huang
MLLM
27
7
0
24 Jan 2024
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving
  Programmatic Tasks
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
Zhiruo Wang
Daniel Fried
Graham Neubig
25
19
0
23 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
33
5
0
23 Jan 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and
  Generating with Multimodal LLMs
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Bin Cui
CoGe
DiffM
48
115
0
22 Jan 2024
Prompting Large Vision-Language Models for Compositional Reasoning
Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski
Ming Jiang
Junjie Hu
CoGe
VLM
LRM
43
3
0
20 Jan 2024
PhotoScout: Synthesis-Powered Multi-Modal Image Search
PhotoScout: Synthesis-Powered Multi-Modal Image Search
Celeste Barnaby
Qiaochu Chen
Chenglong Wang
Işıl Dillig
42
2
0
19 Jan 2024
LangProp: A code optimization framework using Large Language Models
  applied to driving
LangProp: A code optimization framework using Large Language Models applied to driving
Shu Ishida
Gianluca Corrado
George Fedoseev
Hudson Yeo
Lloyd Russell
Jamie Shotton
João F. Henriques
Anthony Hu
59
11
0
18 Jan 2024
DiffusionGPT: LLM-Driven Text-to-Image Generation System
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Jie Qin
Jie Wu
Weifeng Chen
Yuxi Ren
Huixian Li
Hefeng Wu
Xuefeng Xiao
Rui Wang
S. Wen
DiffM
62
25
0
18 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and
  Visual Question Generation
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
28
2
0
18 Jan 2024
Image Translation as Diffusion Visual Programmers
Image Translation as Diffusion Visual Programmers
Cheng Han
James Liang
Qifan Wang
Majid Rabbani
S. Dianat
Raghuveer M. Rao
Ying Nian Wu
Dongfang Liu
29
8
0
18 Jan 2024
Vlogger: Make Your Dream A Vlog
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen
DiffM
38
35
0
17 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
69
35
0
16 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models
  (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
47
69
0
10 Jan 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results
  for Video Question Answering
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
Yueqian Wang
Yuxuan Wang
Kai Chen
Dongyan Zhao
33
2
0
08 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as
  Programmers
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
27
9
0
03 Jan 2024
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code
  Empowers Large Language Models to Serve as Intelligent Agents
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Ke Yang
Jiateng Liu
John Wu
Chaoqi Yang
Yi R. Fung
...
Xu Cao
Xingyao Wang
Yiquan Wang
Chenhui Xu
Chengxiang Zhai
LLMAG
ELM
26
75
0
01 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision,
  Language, Audio, and Action
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
40
144
0
28 Dec 2023
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
Bohan Lyu
Xin Cong
Heyang Yu
Pan Yang
Yujia Qin
...
Zhong Zhang
Yukun Yan
Yankai Lin
Zhiyuan Liu
Maosong Sun
LLMAG
35
5
0
28 Dec 2023
A Survey on Open-Set Image Recognition
A Survey on Open-Set Image Recognition
Jiaying Sun
Qiulei Dong
BDL
ObjD
34
3
0
25 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
60
122
0
21 Dec 2023
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Madeleine Grunde-McLaughlin
Michelle S. Lam
Ranjay Krishna
Daniel S. Weld
Jeffrey Heer
AI4CE
50
21
0
18 Dec 2023
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAG
VLM
31
21
0
18 Dec 2023
A Survey of Reasoning with Foundation Models
A Survey of Reasoning with Foundation Models
Jiankai Sun
Chuanyang Zheng
E. Xie
Zhengying Liu
Ruihang Chu
...
Xipeng Qiu
Yi-Chen Guo
Hui Xiong
Qun Liu
Zhenguo Li
ReLM
LRM
AI4CE
30
76
0
17 Dec 2023
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Zhongyi Zhou
Jing Jin
Vrushank Phadnis
Xiuxiu Yuan
Jun Jiang
...
A. Olwal
David Kim
Ram Iyengar
Na Li
Andrea Colaço
33
0
0
15 Dec 2023
Can LLM find the green circle? Investigation and Human-guided tool
  manipulation for compositional generalization
Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
Min Zhang
Jianfeng He
Shuo Lei
Murong Yue
Linhan Wang
Chang-Tien Lu
45
4
0
12 Dec 2023
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma
Xiaojie Jin
Heng Wang
Yuchen Xian
Jiashi Feng
Yi Yang
26
47
0
12 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning
  into Vision-Language Models
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM
LRM
MLLM
49
29
0
05 Dec 2023
Previous
1234567
Next