Vision + Language Applications: A Survey

24 May 2023
Yutong Zhou
N. Shimada
    VLM
arXiv 2305.14598 (abs, PDF, HTML), GitHub (2346★)

Papers citing "Vision + Language Applications: A Survey"

50 / 111 papers shown
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li
Xuewen Liu
Dongrong Fu
Jianquan Li
Qingyi Gu
Kurt Keutzer
Zhen Dong
EGVM, VGen, DiffM
156
2
0
26 Aug 2024
DreamBooth3D: Subject-Driven Text-to-3D Generation
Amit Raj
S. Kaza
Ben Poole
Michael Niemeyer
Nataniel Ruiz
...
Kfir Aberman
Michael Rubinstein
Jonathan T. Barron
Yuanzhen Li
Varun Jampani
DiffM
90
228
0
23 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
100
548
0
07 Mar 2023
Modulating Pretrained Diffusion Models for Multimodal Image Synthesis
Cusuh Ham
James Hays
Jingwan Lu
Krishna Kumar Singh
Zhifei Zhang
Tobias Hinz
DiffM
86
24
0
24 Feb 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
182
4,175
1
10 Feb 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
114
515
0
31 Jan 2023
Text-To-4D Dynamic Scene Generation
Uriel Singer
Shelly Sheynin
Adam Polyak
Oron Ashual
Iurii Makarov
...
Naman Goyal
Andrea Vedaldi
Devi Parikh
Justin Johnson
Yaniv Taigman
DiffM
91
156
0
26 Jan 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
241
165
0
28 Dec 2022
Optimizing Prompts for Text-to-Image Generation
Y. Hao
Zewen Chi
Li Dong
Furu Wei
107
151
0
19 Dec 2022
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Weixi Feng
Xuehai He
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
Xinze Wang
William Yang Wang
CoGe
130
318
0
09 Dec 2022
Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari
Bin Zhang
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
165
875
0
08 Dec 2022
Ensuring Visual Commonsense Morality for Text-to-Image Generation
Seong-Oak Park
Suhong Moon
Jinkyu Kim
60
2
0
07 Dec 2022
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
Haochen Wang
Xiaodan Du
Jiahao Li
Raymond A. Yeh
Gregory Shakhnarovich
DiffM
146
550
0
01 Dec 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-Jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
115
38
0
23 Nov 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
211
1,830
0
17 Nov 2022
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Federico Bianchi
Pratyusha Kalluri
Esin Durmus
Faisal Ladhak
Myra Cheng
Debora Nozza
Tatsunori Hashimoto
Dan Jurafsky
James Zou
Aylin Caliskan
DiffM, VLM
116
315
0
07 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM, MoE
177
828
0
02 Nov 2022
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance
Wei Li
Xue Xu
Xinyan Xiao
Jiacheng Liu
Hu Yang
...
Zhanpeng Wang
Zhifan Feng
Qiaoqiao She
Yajuan Lyu
Hua Wu
187
30
0
28 Oct 2022
DiffEdit: Diffusion-based semantic image editing with mask guidance
Guillaume Couairon
Jakob Verbeek
Holger Schwenk
Matthieu Cord
DiffM
145
511
0
20 Oct 2022
Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation
Rui Li
Weihua Li
Yi Yang
Hanyu Wei
Jianhua Jiang
Quan-wei Bai
DiffM
122
11
0
18 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
200
3,500
0
16 Oct 2022
One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations
Yi-Chun Zhu
Hongyu Liu
Yibing Song
Ziyang Yuan
Xintong Han
Chun Yuan
Qifeng Chen
Jue Wang
VLM, DiffM
92
32
0
14 Oct 2022
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
Ivan Kapelyukh
Vitalis Vosylius
Edward Johns
LM&Ro, DiffM
203
148
0
05 Oct 2022
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM, VGen
136
395
0
05 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
171
1,542
0
05 Oct 2022
Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings
Tenglong Ao
Qingzhe Gao
Yuke Lou
Baoquan Chen
Libin Liu
SLR
63
63
0
04 Oct 2022
DreamFusion: Text-to-3D using 2D Diffusion
Ben Poole
Ajay Jain
Jonathan T. Barron
B. Mildenhall
174
2,433
0
29 Sep 2022
Human Motion Diffusion Model
Guy Tevet
Sigal Raab
Brian Gordon
Yonatan Shafir
Daniel Cohen-Or
Amit H. Bermano
DiffM, VGen
274
767
0
29 Sep 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM, VGen
83
1,428
0
29 Sep 2022
Best Prompts for Text-to-Image Models and How to Find Them
Nikita Pavlichenko
Dmitry Ustalov
DiffM
74
60
0
23 Sep 2022
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Mingyuan Zhang
Zhongang Cai
Liang Pan
Fangzhou Hong
Xinying Guo
Lei Yang
Ziwei Liu
DiffM, VGen
112
579
0
31 Aug 2022
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
72
7
0
30 Aug 2022
Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis
Wanshu Fan
Yen-Chun Chen
Dongdong Chen
Yu Cheng
Lu Yuan
Yu-Chiang Frank Wang
DiffM
78
96
0
29 Aug 2022
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Yael Pritch
Michael Rubinstein
Kfir Aberman
279
2,891
0
25 Aug 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
206
1,789
0
02 Aug 2022
Text-Guided Synthesis of Artistic Images with Retrieval-Augmented Diffusion Models
Robin Rombach
A. Blattmann
Bjorn Ommer
DiffM
73
71
0
26 Jul 2022
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu
Jian Liang
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Lijuan Wang
Zicheng Liu
Yuejian Fang
Nan Duan
VGen
75
74
0
20 Jul 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
197
1,133
0
22 Jun 2022
Compositional Visual Generation with Composable Diffusion Models
Nan Liu
Shuang Li
Yilun Du
Antonio Torralba
J. Tenenbaum
DiffM, CoGe
198
529
0
03 Jun 2022
Text2Human: Text-Driven Controllable Human Image Generation
Yuming Jiang
Shuai Yang
Haonan Qiu
Wayne Wu
Chen Change Loy
Ziwei Liu
DiffM
158
46
0
31 May 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
316
631
0
29 May 2022
The Creativity of Text-to-Image Generation
J. Oppenlaender
67
199
0
13 May 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
117
334
0
28 Apr 2022
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
128
390
0
25 Apr 2022
A Taxonomy of Prompt Modifiers for Text-To-Image Generation
J. Oppenlaender
87
106
0
20 Apr 2022
DR-GAN: Distribution Regularization for Text-to-Image Generation
Hongchen Tan
Xiuping Liu
Baocai Yin
Xin Li
GAN
77
39
0
17 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM, DiffM
413
6,916
0
13 Apr 2022
Text2LIVE: Text-Driven Layered Image and Video Editing
Omer Bar-Tal
Dolev Ofri-Amar
Rafail Fridman
Yoni Kasten
Tali Dekel
VGen, DiffM
97
317
0
05 Apr 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
89
524
0
24 Mar 2022
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
Lian Siyao
Weijiang Yu
Tianpei Gu
Chunze Lin
Quan Wang
Chao Qian
Chen Change Loy
Ziwei Liu
SLR
131
194
0
24 Mar 2022