Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.09800
Cited By
v1
v2 (latest)
InstructPix2Pix: Learning to Follow Image Editing Instructions
17 November 2022
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"InstructPix2Pix: Learning to Follow Image Editing Instructions"
50 / 1,418 papers shown
Title
DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer
Junpeng Jiang
Gangyi Hong
Miao Zhang
Hengtong Hu
Kun Zhan
Rui Shao
Liqiang Nie
VGen
92
3
0
28 Apr 2025
SynergyAmodal: Deocclude Anything with Text Control
Xinyang Li
Chengjie Yi
Jiawei Lai
Mingbao Lin
Yansong Qu
Shengchuan Zhang
Liujuan Cao
DiffM
135
0
0
28 Apr 2025
CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes
Tuan Nguyen
Naseem Khan
Issa Khalil
AAML
167
0
0
27 Apr 2025
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos
Yuan Li
Ziqian Bai
Feitong Tan
Zhaopeng Cui
S. Fanello
Yinda Zhang
DiffM
VGen
138
0
0
27 Apr 2025
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models
Gal Almog
Ariel Shamir
Ohad Fried
DiffM
75
0
0
26 Apr 2025
We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback
Minkyu Choi
S P Sharan
Harsh Goel
Sahil Shah
Sandeep Chinchali
DiffM
VGen
150
1
0
24 Apr 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
Nitzan Bitton-Guetta
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
EGVM
VGen
110
0
0
24 Apr 2025
DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing
Aniruddha Bala
Rohit Chowdhury
Rohan Jaiswal
Siddharth Roheda
DiffM
AAML
105
0
0
24 Apr 2025
Step1X-Edit: A Practical Framework for General Image Editing
Shixuan Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
Wei Wei
Gang Yu
Daxin Jiang
DiffM
246
24
0
24 Apr 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo
Liangbing Zhao
Sayak Paul
Yue Liao
Renrui Zhang
Yi Xin
Peng Gao
Mohamed Elhoseiny
Haoyang Li
VLM
142
3
0
22 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
237
0
0
22 Apr 2025
"I Know It When I See It": Mood Spaces for Connecting and Expressing Visual Concepts
Huzheng Yang
Katherine Xu
Michael D. Grossberg
Yutong Bai
Jianbo Shi
78
0
0
21 Apr 2025
A Controllable Appearance Representation for Flexible Transfer and Editing
Santiago Jimenez-Navarro
Julia Guerrero-Viu
B. Masiá
DiffM
87
0
0
21 Apr 2025
Insert Anything: Image Insertion via In-Context Editing in DiT
Wensong Song
Hong Jiang
Zongxing Yang
Ruijie Quan
Yi Yang
DiffM
124
4
0
21 Apr 2025
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan
Wang Lin
Zhongqi Yue
Tenglong Ao
Liyu Jia
Wei Zhao
Juncheng Billy Li
Siliang Tang
Hanwang Zhang
104
8
0
20 Apr 2025
PRISM: A Unified Framework for Photorealistic Reconstruction and Intrinsic Scene Modeling
Alara Dirik
Tuanfeng Y. Wang
Duygu Ceylan
Stefanos Zafeiriou
Anna Frühstück
DiffM
83
0
0
19 Apr 2025
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing
Joowon Kim
Ziseok Lee
Donghyeon Cho
Sanghyun Jo
Y. Jung
Kyungsu Kim
Eunho Yang
DiffM
110
0
0
18 Apr 2025
Point-Driven Interactive Text and Image Layer Editing Using Diffusion Models
Zhenyu Yu
Mohd Yamani Idna Idris
Pei Wang
Yuelong Xia
DiffM
72
1
0
18 Apr 2025
Physical Reservoir Computing in Hook-Shaped Rover Wheel Spokes for Real-Time Terrain Identification
Xiao Jin
Zihan Wang
Zhenhua Yu
Changrak Choi
Kalind Carpenter
T. Nanayakkara
76
2
0
17 Apr 2025
ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior
Xiao Han
RunZe Tian
Yifei Tong
Fenggen Yu
Dingyao Liu
Yan Zhang
3DGS
62
0
0
17 Apr 2025
Complex-Edit
\texttt{Complex-Edit}
Complex-Edit
: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark
S. Yang
Mude Hui
Bingchen Zhao
Yuyin Zhou
Nataniel Ruiz
Cihang Xie
CoGe
178
3
0
17 Apr 2025
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models
Guanlong Jiao
Biqing Huang
Kuan-Chieh Wang
Renjie Liao
DiffM
141
0
0
17 Apr 2025
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
Qianqian Sun
Jixiang Luo
Dell Zhang
Xuelong Li
DiffM
80
0
0
17 Apr 2025
Cobra: Efficient Line Art COlorization with BRoAder References
Junhao Zhuang
Lingen Li
Xuan Ju
Zhaoyang Zhang
Chun Yuan
Ying Shan
DiffM
151
0
0
16 Apr 2025
ACE: Attentional Concept Erasure in Diffusion Models
Finn Carter
DiffM
115
1
0
16 Apr 2025
PCDiff: Proactive Control for Ownership Protection in Diffusion Models with Watermark Compatibility
Keke Gai
Ziyue Shen
Jiahao Yu
Liehuang Zhu
Qi Wu
WIGM
118
0
0
16 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
195
0
0
15 Apr 2025
ViMo: A Generative Visual GUI World Model for App Agents
Dezhao Luo
Bohan Tang
Kang Li
Georgios Papoudakis
Jifei Song
S. Gong
Haifeng Zhang
Jun Wang
Kun Shao
LM&Ro
VGen
183
1
0
15 Apr 2025
Omni
2
^2
2
: Unifying Omnidirectional Image Generation and Editing in an Omni Model
Liu Yang
Huiyu Duan
Yucheng Zhu
Xiaohong Liu
Lu Liu
Zitong Xu
Guangji Ma
Xiongkuo Min
Guangtao Zhai
P. Callet
VLM
VGen
441
2
0
15 Apr 2025
Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes
Huijie Liu
Bingcan Wang
Jie Hu
Xiaoming Wei
Guoliang Kang
138
0
0
14 Apr 2025
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu
Linxuan Li
Kai Wang
Yaxing Wang
Jian Yang
Ming-Ming Cheng
DiffM
VGen
97
0
0
14 Apr 2025
SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow
Kenan Tang
Yanhong Li
Yao Qin
DiffM
93
0
0
13 Apr 2025
Towards Explainable Partial-AIGC Image Quality Assessment
Jiaying Qian
Ziheng Jia
Zicheng Zhang
Zeyu Zhang
Guangtao Zhai
Xiongkuo Min
71
0
0
12 Apr 2025
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
Linyan Huang
Haonan Lin
Yanning Zhou
Kaiwen Xiao
105
1
0
10 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
Hongru Wang
Jie Cao
Huaibo Huang
Ran He
DiffM
114
0
0
10 Apr 2025
POEM: Precise Object-level Editing via MLLM control
Marco Schouten
Mehmet Onurcan Kaya
Serge Belongie
Dim P. Papadopoulos
DiffM
103
0
0
10 Apr 2025
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu
Jaskirat Singh
Zhaoyuan Yang
Peter Tu
Jing Zhang
Hongdong Li
Richard Hartley
Dylan Campbell
DiffM
143
1
0
09 Apr 2025
IGG: Image Generation Informed by Geodesic Dynamics in Deformation Spaces
Nian Wu
Nivetha Jayakumar
Jiarui Xing
Miaomiao Zhang
100
0
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yansen Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
133
2
0
09 Apr 2025
A Training-Free Style-aligned Image Generation with Scale-wise Autoregressive Model
Jihun Park
Jongmin Gim
Kyoungmin Lee
Minseok Oh
Minwoo Choi
Jaeyeul Kim
Woo Chool Park
Sunghoon Im
DiffM
75
0
0
08 Apr 2025
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
Qi Mao
Lawrence Yunliang Chen
Yuchao Gu
Mike Zheng Shou
Ming-Hsuan Yang
DiffM
80
0
0
08 Apr 2025
D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition
Rupayan Mallick
Sibo Dong
Nataniel Ruiz
Sarah Adel Bargal
DiffM
249
0
0
08 Apr 2025
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen
Junhao Dong
Xiaohua Xie
89
0
0
08 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
102
21
0
08 Apr 2025
CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models
Kavana Venkatesh
Connor Dunlop
Pinar Yanardag
DiffM
82
2
0
07 Apr 2025
PartStickers: Generating Parts of Objects for Rapid Prototyping
Mo Zhou
Josh Myers-Dean
Danna Gurari
102
0
0
07 Apr 2025
Disentangling Instruction Influence in Diffusion Transformers for Parallel Multi-Instruction-Guided Image Editing
Hui Liu
Bin Zou
Suiyun Zhang
Kecheng Chen
Rui Liu
Haoliang Li
DiffM
136
0
0
07 Apr 2025
TactileNet: Bridging the Accessibility Gap with AI-Generated Tactile Graphics for Individuals with Vision Impairment
Adnan Khan
Alireza Choubineh
Mai A. Shaaban
Abbas Akkasi
Majid Komeili
DiffM
125
0
0
07 Apr 2025
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision
Yuandong Pu
Le Zhuo
Kaiwen Zhu
Liangbin Xie
Wenlong Zhang
Xiangyu Chen
Peng Gao
Yu Qiao
Chao Dong
Yihao Liu
MLLM
99
2
0
07 Apr 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
Wulin Xie
Yize Zhang
Chaoyou Fu
Yang Shi
Bingyan Nie
Hongkai Chen
Zheng Zhang
Liang Wang
Tieniu Tan
98
3
0
04 Apr 2025
Previous
1
2
3
4
5
6
...
27
28
29
Next