ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Instruct-Imagen: Image Generation with Multi-modal Instruction

3 January 2024
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
Kihyuk Sohn
Yang Zhao
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
arXiv: 2401.01952

Papers citing "Instruct-Imagen: Image Generation with Multi-modal Instruction"

37 / 37 papers shown
Title
Step1X-Edit: A Practical Framework for General Image Editing
S. Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
X. Zhang
Gang Yu
Daxin Jiang
DiffM
102
3
0
24 Apr 2025
InstructEngine: Instruction-driven Text-to-Image Alignment
Xingyu Lu
Y. Hu
Y. Zhang
Kaiyu Jiang
Changyi Liu
...
Bin Wen
C. Yuan
Fan Yang
Tingting Gao
Di Zhang
46
0
0
14 Apr 2025
COP-GEN-Beta: Unified Generative Modelling of COPernicus Imagery Thumbnails
Miguel Espinosa
V. Marsocci
Yuru Jia
Elliot J. Crowley
Mikolaj Czerkawski
DiffM
49
0
0
11 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
46
6
0
08 Apr 2025
Multi-party Collaborative Attention Control for Image Customization
Han Yang
Chuanguang Yang
Qiuli Wang
Zhulin An
Weilun Feng
Libo Huang
Y. Xu
DiffM
35
0
0
02 Apr 2025
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam
Soowon Son
Zhan Xu
Jing Shi
Difan Liu
Feng Liu
Aashish Misraa
Seungryong Kim
Yang Zhou
DiffM
49
0
0
19 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
63
0
0
18 Mar 2025
InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images
Jiun Tian Hoe
Weipeng Hu
Wei Zhou
Chao Xie
Ziwei Wang
Chee Seng Chan
Xudong Jiang
Y. Tan
61
0
0
12 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
W. Wang
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
79
2
0
04 Mar 2025
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics
Xi Chen
Zhifei Zhang
He Zhang
Yuqian Zhou
S. Kim
...
Nanxuan Zhao
Yilin Wang
Hui Ding
Zhe Lin
Hengshuang Zhao
VGen
DiffM
123
21
0
10 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
S. Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
101
14
0
03 Dec 2024
KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities
Hsin-Ping Huang
X. Wang
Yonatan Bitton
Hagai Taitelbaum
Gaurav Singh Tomar
...
Xuhui Jia
Kelvin Chan
Hexiang Hu
Yu-Chuan Su
Ming Yang
EGVM
69
4
0
15 Oct 2024
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng
Faria Huq
Yue Jiang
Jason Wu
Amanda Li
Jeffrey P. Bigham
Amy Pavel
DiffM
27
4
0
30 Sep 2024
AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status
Jinghao Zhang
Wen Qian
Hao Luo
Fan Wang
Feng Zhao
DiffM
28
0
0
26 Sep 2024
Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation
Yulin Wang
Honglin Xiong
Kaicong Sun
Shuwei Bai
Ling Dai
Zhongxiang Ding
Jiameng Liu
Qian Wang
Qian Liu
Dinggang Shen
MedIm
DiffM
34
1
0
25 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
57
10
0
23 Sep 2024
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage
Denis Zavadski
Damjan Kalšan
Carsten Rother
DiffM
MDE
22
5
0
13 Sep 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye
Jinxiu Liu
Ruotian Peng
Jinjin Cao
Zhiyang Chen
...
Mingyuan Zhou
Xiaoqian Shen
Mohamed Elhoseiny
Qi Liu
Guo-Jun Qi
VGen
VLM
32
1
0
07 Aug 2024
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
40
14
0
29 Jul 2024
VIMI: Grounding Video Generation through Multi-modal Instruction
Yuwei Fang
Willi Menapace
Aliaksandr Siarohin
Tsai-Shien Chen
Kuan-Chien Wang
Ivan Skorokhodov
Graham Neubig
Sergey Tulyakov
VGen
63
2
0
08 Jul 2024
Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations
Zhiyang Xu
Minqian Liu
Ying Shen
Joy Rimchala
Jiaxin Zhang
Qifan Wang
Yu Cheng
Lifu Huang
VLM
39
2
0
04 Jul 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
William Berman
A. Peysakhovich
33
4
0
26 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
75
31
0
24 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
46
41
0
11 Jun 2024
Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt
Zhicheng Ding
Panfeng Li
Qikai Yang
Siyang Li
VLM
MLLM
40
18
0
04 Jun 2024
MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation
Yuxiang Wei
Zhilong Ji
Jinfeng Bai
Hongzhi Zhang
Lei Zhang
W. Zuo
DiffM
49
0
0
09 May 2024
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C. K. Chan
Yang Zhao
Xuhui Jia
Ming-Hsuan Yang
Huisheng Wang
22
3
0
02 May 2024
RoboDreamer: Learning Compositional World Models for Robot Imagination
Siyuan Zhou
Yilun Du
Jiaben Chen
Yandong Li
Dit-Yan Yeung
Chuang Gan
VGen
LM&Ro
76
29
0
18 Apr 2024
Denoising Monte Carlo Renders With Diffusion Models
Vaibhav Vavilala
R. Vasanth
David A. Forsyth
DiffM
23
1
0
30 Mar 2024
Reward Guided Latent Consistency Distillation
Jiachen Li
Weixi Feng
Wenhu Chen
William Yang Wang
EGVM
23
11
0
16 Mar 2024
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
Xinchen Zhang
Ling Yang
Yaqi Cai
Zhaochen Yu
Kai-Ni Wang
...
Ye Tian
Minkai Xu
Yong Tang
Yujiu Yang
Bin Cui
DiffM
31
5
0
20 Feb 2024
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Yixiao Zhang
Yukara Ikemiya
Gus Xia
Naoki Murata
Marco A. Martínez Ramírez
Wei-Hsiang Liao
Yuki Mitsufuji
Simon Dixon
44
20
0
09 Feb 2024
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
111
237
0
16 Jun 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
45
55
0
22 Feb 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
519
0
02 Jan 2023
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
Wenhu Chen
Hexiang Hu
Chitwan Saharia
William W. Cohen
VLM
125
161
0
29 Sep 2022
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
294
75,800
0
18 May 2015