ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.01795
  4. Cited By
Multimodal Procedural Planning via Dual Text-Image Prompting

Multimodal Procedural Planning via Dual Text-Image Prompting

2 May 2023
Yujie Lu
Pan Lu
Zhiyu Zoey Chen
Wanrong Zhu
Xinze Wang
William Yang Wang
    LM&Ro
ArXiv (abs)PDFHTMLGithub (92★)

Papers citing "Multimodal Procedural Planning via Dual Text-Image Prompting"

31 / 31 papers shown
Title
Latent Beam Diffusion Models for Decoding Image Sequences
Latent Beam Diffusion Models for Decoding Image Sequences
Guilherme Fernandes
Vasco Ramos
Regev Cohen
Idan Szpektor
João Magalhães
164
1
0
26 Mar 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
483
0
0
01 Dec 2024
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large
  Language Models
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Chan Hee Song
Jiaman Wu
Clay Washington
Brian M Sadler
Wei-Lun Chao
Yu-Chuan Su
LLMAGLM&Ro
138
422
0
08 Dec 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
213
1,835
0
17 Nov 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
81
22
0
21 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text
  Generation
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
Xinze Wang
Miguel P. Eckstein
William Yang Wang
114
36
0
07 Oct 2022
Language Models with Image Descriptors are Strong Few-Shot
  Video-Language Learners
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLMVLM
216
142
0
22 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLMMLLM
97
97
0
05 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,610
0
29 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLMDiffM
413
6,916
0
13 Apr 2022
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Michael Ahn
Anthony Brohan
Noah Brown
Yevgen Chebotar
Omar Cortes
...
Ted Xiao
Peng Xu
Sichun Xu
Mengyuan Yan
Andy Zeng
LM&Ro
195
1,988
0
04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLMLRM
162
588
0
01 Apr 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
92
524
0
24 Mar 2022
Things not Written in Text: Exploring Spatial Commonsense from Visual
  Signals
Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
Xiao Liu
Da Yin
Yansong Feng
Dongyan Zhao
LRM
66
46
0
15 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLMObjD
157
880
0
07 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
555
4,413
0
28 Jan 2022
GLIDE: Towards Photorealistic Image Generation and Editing with
  Text-Guided Diffusion Models
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
364
3,630
0
20 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
  for Zero-shot and Few-shot Tasks
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
119
133
0
02 Dec 2021
Design Guidelines for Prompt Engineering Text-to-Image Generative Models
Design Guidelines for Prompt Engineering Text-to-Image Generative Models
Vivian Liu
Lydia B. Chilton
65
502
0
14 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
256
422
0
10 Sep 2021
Multimodal Few-Shot Learning with Frozen Language Models
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
185
789
0
25 Jun 2021
CogView: Mastering Text-to-Image Generation via Transformers
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViTVLM
127
782
0
26 May 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
169
1,588
0
18 Apr 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
420
5,000
0
24 Feb 2021
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,316
0
27 Aug 2019
Generative Modeling by Estimating Gradients of the Data Distribution
Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song
Stefano Ermon
SyDaDiffM
258
3,961
0
12 Jul 2019
Procedure Planning in Instructional Videos
Procedure Planning in Instructional Videos
C. Chang
De-An Huang
Danfei Xu
Ehsan Adeli
Li Fei-Fei
Juan Carlos Niebles
77
103
0
02 Jul 2019
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image
  Synthesis
DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Minfeng Zhu
Pingbo Pan
Wei Chen
Yi Yang
GAN
59
583
0
02 Apr 2019
AttnGAN: Fine-Grained Text to Image Generation with Attentional
  Generative Adversarial Networks
AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
Tao Xu
Pengchuan Zhang
Qiuyuan Huang
Han Zhang
Zhe Gan
Xiaolei Huang
Xiaodong He
GANViT
118
1,722
0
28 Nov 2017
Neural Discrete Representation Learning
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDLSSLOCL
235
5,079
0
02 Nov 2017
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein
Eric A. Weiss
Niru Maheswaranathan
Surya Ganguli
SyDaDiffM
312
7,031
0
12 Mar 2015
1