Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.03206
Cited By
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 808 papers shown
Title
DiTPainter: Efficient Video Inpainting with Diffusion Transformers
Xian Wu
Chang Liu
DiffM
26
0
0
20 May 2025
DynamiCtrl: Rethinking the Basic Structure and the Role of Text for High-quality Human Image Animation
Haoyu Zhao
Zhongang Qi
Cong Wang
Qingping Zheng
Guansong Lu
Fei Chen
Hang Xu
Zuxuan Wu
DiffM
VGen
46
0
0
20 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
51
0
0
20 May 2025
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen
Shuchen Xue
Yuyang Zhao
Jincheng Yu
Sayak Paul
Junyu Chen
Han Cai
E. Xie
Song Han
VLM
66
2
0
20 May 2025
InstanceGen: Image Generation with Instance-level Instructions
Etai Sella
Yanir Kleiman
Hadar Averbuch-Elor
33
0
0
20 May 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
84
2
0
20 May 2025
DRAGON: A Large-Scale Dataset of Realistic Images Generated by Diffusion Models
Giulia Bertazzini
Daniele Baracchi
D. Shullani
Isao Echizen
A. Piva
17
0
0
16 May 2025
PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment
Dingbang Huang
Wenbo Li
Yifei Zhao
Xinyu Pan
Yanhong Zeng
Bo Dai
DiffM
7
0
0
16 May 2025
CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks
Christoph Leiter
Yuki M. Asano
M. Keuper
Steffen Eger
12
0
0
16 May 2025
FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation
Jun Guo
Xiaojian Ma
Yikai Wang
Min Yang
Huaping Liu
Qing Li
VGen
32
0
0
15 May 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
29
0
0
15 May 2025
Path Gradients after Flow Matching
Lorenz Vaitl
Leon Klein
14
0
0
15 May 2025
MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation
Yanbo Ding
Xirui Hu
Zhizhi Guo
Y. Wang
DiffM
VGen
31
0
0
15 May 2025
Fast Text-to-Audio Generation with Adversarial Post-Training
Zachary Novack
Zach Evans
Zack Zukowski
Josiah Taylor
CJ Carr
...
Adnan Al-Sinan
Gian Marco Iodice
Julian McAuley
Taylor Berg-Kirkpatrick
Jordi Pons
30
0
0
13 May 2025
H
3
^{\mathbf{3}}
3
DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
X. Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
31
0
0
12 May 2025
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
Ozgur Kara
Krishna Kumar Singh
Feng Liu
Duygu Ceylan
James M. Rehg
Tobias Hinz
DiffM
VGen
41
0
0
12 May 2025
FLUXSynID: A Framework for Identity-Controlled Synthetic Face Generation with Document and Live Images
Raul Ismayilov
Dzemila Sero
Luuk Spreeuwers
29
0
0
12 May 2025
Addressing degeneracies in latent interpolation for diffusion models
Erik Landolsi
Fredrik Kahl
DiffM
45
0
0
12 May 2025
Improving Trajectory Stitching with Flow Models
Reece O'Mahoney
Wanming Yu
Ioannis Havoutis
33
0
0
12 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
52
0
0
12 May 2025
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou
Zeyu Li
Xingyu Jiang
H. Li
Lijun Yang
Wen Yao
Yue Deng
DiffM
38
0
0
12 May 2025
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen
Keyi Li
Yifan Jia
Le Ye
Yufei Ma
DiffM
35
0
0
09 May 2025
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung
Jiachen Liu
Jeff J. Ma
Ruofan Wu
Oh Jun Kweon
Yuxuan Xia
Zhiyu Wu
Mosharaf Chowdhury
28
0
0
09 May 2025
From Pixels to Perception: Interpretable Predictions via Instance-wise Grouped Feature Selection
Moritz Vandenhirtz
Julia E. Vogt
38
0
0
09 May 2025
Does CLIP perceive art the same way we do?
Andrea Asperti
Leonardo Dessì
Maria Chiara Tonetti
Nico Wu
48
0
0
08 May 2025
Flow-GRPO: Training Flow Matching Models via Online RL
Jie Liu
Gongye Liu
Jiajun Liang
Y. Li
Jiaheng Liu
X. Wang
Pengfei Wan
Di Zhang
Wanli Ouyang
AI4CE
68
0
0
08 May 2025
Defining and Quantifying Creative Behavior in Popular Image Generators
Aditi Ramaswamy
Hana Chockler
Melane Navaratnarajah
31
0
0
07 May 2025
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen
Yikai Wang
Wenqiang Sun
Feng Wang
Yiwen Chen
Huaping Liu
34
0
0
07 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
34
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
57
0
0
07 May 2025
FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
Shiyi Zhang
Junhao Zhuang
Zhaoyang Zhang
Ying Shan
Yansong Tang
VGen
102
0
0
06 May 2025
FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing
Rui Lan
Y. Bai
Xu Duan
M. Li
Lei Sun
X. Chu
DiffM
134
0
0
06 May 2025
Distribution-Conditional Generation: From Class Distribution to Creative Generation
Fu Feng
Yucheng Xie
Xu Yang
Jing Wang
Xin Geng
DiffM
31
0
0
06 May 2025
Multimodal Benchmarking and Recommendation of Text-to-Image Generation Models
Kapil Wanaskar
Gaytri Jena
Magdalini Eirinaki
EGVM
31
0
0
06 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing
Zinan Guo
Pengze Zhang
Yanze Wu
Chong Mou
Songtao Zhao
Qian He
30
0
0
05 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
C. L. P. Chen
Sijie Zhu
DiffM
81
1
0
05 May 2025
Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Biao Gong
Cheng Zou
Dandan Zheng
Hu Yu
Jingdong Chen
...
Qingpei Guo
Rui Liu
Weilong Chai
Xinyu Xiao
Ziyuan Huang
MLLM
79
1
0
05 May 2025
T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models
Yunfeng Ge
Jiawei Li
Yiji Zhao
Haomin Wen
Zhao Li
M. Qiu
H. Li
Ming Jin
Shirui Pan
DiffM
134
0
0
05 May 2025
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Huu Dat
Nam Hyeonu
Po Yuan Mao
Tae-Hyun Oh
DiffM
CoGe
64
0
0
02 May 2025
Improving Editability in Image Generation with Layer-wise Memory
Daneul Kim
Jaeah Lee
Jaesik Park
DiffM
KELM
60
0
0
02 May 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki
Qi Dai
Lee Hyoseok
Chong Luo
Tae-Hyun Oh
71
0
0
01 May 2025
Multi-Modal Language Models as Text-to-Image Model Evaluators
Jiahui Chen
Candace Ross
Reyhane Askari Hemmat
Koustuv Sinha
Melissa Hall
M. Drozdzal
Adriana Romero-Soriano
EGVM
60
0
0
01 May 2025
Why Compress What You Can Generate? When GPT-4o Generation Ushers in Image Compression Fields
Yixin Gao
Xiaohan Pan
X. Li
Zhibo Chen
51
0
0
30 Apr 2025
ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
Qihao Liu
Ju He
Qihang Yu
Liang-Chieh Chen
Alan Yuille
DiffM
VGen
78
0
0
30 Apr 2025
Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing
Hong Zhang
Zhongjie Duan
Xingjun Wang
Yuze Zhao
Weiyi Lu
Zhipeng Di
Y. Xu
Yingda Chen
Yu Zhang
MLLM
94
1
0
30 Apr 2025
AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images
Yunhao Li
Sijing Wu
Wei Sun
Zhichao Zhang
Yucheng Zhu
Zicheng Zhang
Huiyu Duan
Xiongkuo Min
Guangtao Zhai
EGVM
90
0
0
30 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
X. Li
MLLM
71
0
0
29 Apr 2025
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks
S. Dell’Anna
Andrea Montibeller
Giulia Boato
57
0
0
29 Apr 2025
ADiff4TPP: Asynchronous Diffusion Models for Temporal Point Processes
Amartya Mukherjee
Ruizhi Deng
He Zhao
Yuzhen Mao
Leonid Sigal
Frederick Tung
DiffM
AI4TS
53
0
0
29 Apr 2025
1
2
3
4
...
15
16
17
Next