Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.03206
Cited By
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"
50 / 826 papers shown
Title
MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation
Yi Wang
Mushui Liu
Wanggui He
Longxiang Zhang
Z. Huang
...
Yiming Li
Weilong Dai
Mingli Song
Jie Song
Hao Jiang
MLLM
MoE
LRM
86
1
0
03 Mar 2025
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
Boyong He
Yuxiang Ji
Qianwen Ye
Zhuoyue Tan
Liaoni Wu
DiffM
77
0
0
03 Mar 2025
How simple can you go? An off-the-shelf transformer approach to molecular dynamics
Max Eissler
Tim Korjakow
Stefan Ganscha
Oliver T. Unke
Klaus-Robert Müller
Stefan Gugler
63
1
0
03 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
73
0
0
03 Mar 2025
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng
Yongxin Chen
Huayu Chen
Guande He
Xuan Li
Jun Zhu
Qinsheng Zhang
DiffM
49
0
0
03 Mar 2025
Proteina: Scaling Flow-based Protein Structure Generative Models
Tomas Geffner
Kieran Didi
Zuobai Zhang
Danny Reidenbach
Zhonglin Cao
...
Mario Geiger
Christian Dallago
E. Küçükbenli
Arash Vahdat
Karsten Kreis
DiffM
AI4CE
49
4
0
02 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
55
0
0
02 Mar 2025
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions
Haoxin Li
Yingchen Yu
Qilong Wu
Hanwang Zhang
Boyang Li
Song Bai
3DH
VGen
183
0
0
01 Mar 2025
Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs
Zhantong Zhu
Hongou Li
Wenjie Ren
Meng Wu
Le Ye
Ru Huang
Tianyu Jia
46
0
0
01 Mar 2025
Spatial Reasoning with Denoising Models
Christopher Wewer
Bart Pogodzinski
Bernt Schiele
J. E. Lenssen
DiffM
LRM
43
0
0
28 Feb 2025
Generative Uncertainty in Diffusion Models
Metod Jazbec
Eliot Wong-Toi
Guoxuan Xia
Dan Zhang
Eric T. Nalisnick
Stephan Mandt
DiffM
49
0
0
28 Feb 2025
Diffusion Restoration Adapter for Real-World Image Restoration
Hanbang Liang
Zhen Wang
Weihui Deng
DiffM
41
0
0
28 Feb 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
92
0
0
27 Feb 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
83
6
0
27 Feb 2025
SubZero: Composing Subject, Style, and Action via Zero-Shot Personalization
Shubhankar Borse
K. Bhardwaj
Mohammad Reza Karimi Dastjerdi
Hyojin Park
Shreya Kadambi
...
Prathamesh Mandke
Ankita Nayak
Harris Teague
Munawar Hayat
Fatih Porikli
DiffM
84
1
0
27 Feb 2025
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
L. Chen
S. Bai
Wenhao Chai
Weichu Xie
Haozhe Zhao
Leon Vinci
Junyang Lin
Baobao Chang
DiffM
92
4
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
90
3
0
26 Feb 2025
FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance
Mintong Kang
Vinayshekhar Bannihatti Kumar
Shamik Roy
Abhishek Kumar
Sopan Khosla
Balakrishnan Narayanaswamy
Rashmi Gangadharaiah
50
0
0
25 Feb 2025
SYNTHIA: Novel Concept Design with Affordance Composition
Xiaomeng Jin
Hyeonjeong Ha
Jeonghwan Kim
Jiaheng Liu
Zhenhailong Wang
Khanh Duy Nguyen
Ansel Blume
Nanyun Peng
Kai-Wei Chang
Heng Ji
DiffM
195
0
0
25 Feb 2025
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
Pengzhi Li
Pengfei Yu
Zide Liu
Wei He
Xuhao Pan
Xudong Rao
Tao Wei
Wei Chen
VLM
60
0
0
25 Feb 2025
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu
Yiming Zhao
Zhicong Tang
Ruihong Yin
Haoxing Ye
...
Ji Li
Xiu Li
Zheng Lian
Gao Huang
Baining Guo
DiffM
62
2
0
25 Feb 2025
Contrastive Visual Data Augmentation
Yu Zhou
B. Li
Mohan Tang
Xiaomeng Jin
Te-Lin Wu
Kuan-Hao Huang
Heng Ji
Kai-Wei Chang
Nanyun Peng
59
0
0
24 Feb 2025
TraFlow: Trajectory Distillation on Pre-Trained Rectified Flow
Zhangkai Wu
Xuhui Fan
Hongyu Wu
Longbing Cao
44
0
0
24 Feb 2025
BundleFlow: Deep Menus for Combinatorial Auctions by Diffusion-Based Optimization
Tonghan Wang
Yanchen Jiang
David C. Parkes
84
0
0
24 Feb 2025
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Guillaume Jeanneret
Loïc Simon
F. Jurie
ViT
58
0
0
24 Feb 2025
CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models
Shunchang Liu
Zhuan Shi
Lingjuan Lyu
Yaochu Jin
Boi Faltings
66
2
0
24 Feb 2025
On Computational Limits of FlowAR Models: Expressivity and Efficiency
Chengyue Gong
Yekun Ke
Xiaoyu Li
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
74
3
0
23 Feb 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li
Zhelun Shi
Xuhao Hu
Bowen Dong
Yiran Qin
Xihui Liu
Lu Sheng
Jing Shao
114
1
0
21 Feb 2025
A Critical Assessment of Modern Generative Models' Ability to Replicate Artistic Styles
Andrea Asperti
Franky George
Tiberio Marras
Razvan Ciprian Stricescu
Fabio Zanotti
EGVM
49
0
0
21 Feb 2025
CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers
D. She
Mushui Liu
Jingxuan Pang
Jin Wang
Zhen Yang
...
Yi Wang
Qihan Huang
Haobin Tang
YunLong Yu
Siming Fu
VGen
96
4
0
21 Feb 2025
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang
Cheng Chen
Xulei Yang
Fayao Liu
Guosheng Lin
DiffM
73
7
0
21 Feb 2025
Generative Video Semantic Communication via Multimodal Semantic Fusion with Large Model
Hang Yin
Li Qiao
Yu Ma
Shuo Sun
Kan Li
Zhen Gao
Dusit Niyato
DiffM
VGen
213
0
0
20 Feb 2025
Architect of the Bits World: Masked Autoregressive Modeling for Circuit Generation Guided by Truth Table
Haoyuan Wu
Haisheng Zheng
Shoubo Hu
Zhuolun He
Bei Yu
53
0
0
18 Feb 2025
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation
Xinlong Chen
Yang Zhang
Chongling Rao
Yushuo Guan
Jiaheng Liu
Fuzheng Zhang
Chengru Song
Qiang Liu
Di Zhang
Tieniu Tan
17
0
0
18 Feb 2025
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni
Yuxin Guo
Yichen Liu
Rui Chen
Lewei Lu
Z. Wu
DiffM
VGen
73
3
0
17 Feb 2025
Precise Parameter Localization for Textual Generation in Diffusion Models
Łukasz Staniszewski
Bartosz Cywiñski
Franziska Boenisch
Kamil Deja
Adam Dziedzic
DiffM
208
0
0
17 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
80
5
0
17 Feb 2025
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun
Dinghuai Zhang
Jinkyoo Park
Ling Pan
DiffM
84
2
0
17 Feb 2025
ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations
Bowen Jiang
Yuan Yuan
Xinyi Bai
Zhuoqun Hao
Alyson Yin
Yaojie Hu
Wenyu Liao
Lyle Ungar
Camillo J Taylor
DiffM
53
1
0
16 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
Qian He
Xinglong Wu
DiffM
VGen
52
5
0
16 Feb 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
Guoqing Ma
Haoyang Huang
K. Yan
L. Chen
Nan Duan
...
Yansen Wang
Yuanwei Lu
Yu-Cheng Chen
Yu-Juan Luo
Y. Luo
DiffM
VGen
175
18
0
14 Feb 2025
Automatic Evaluation Metrics for Artificially Generated Scientific Research
Niklas Höpner
Leon Eshuijs
Dimitrios Alivanistos
Giacomo Zamprogno
Ilaria Tiddi
54
0
0
14 Feb 2025
Diffusion Models Through a Global Lens: Are They Culturally Inclusive?
Zahra Bayramli
Ayhan Suleymanzade
Na Min An
Huzama Ahmad
Eunsu Kim
Junyeong Park
James Thorne
Alice H. Oh
91
0
0
13 Feb 2025
Designing a Conditional Prior Distribution for Flow-Based Generative Models
Noam Issachar
Mohammad Salama
Raanan Fattal
Sagie Benaim
91
0
0
13 Feb 2025
E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization
T. Pham
Zhang Kang
Ji Woo Hong
Xuran Zheng
Chang D. Yoo
82
0
0
13 Feb 2025
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
H. Seo
Wongi Jeong
Jae-sun Seo
Se Young Chun
62
0
0
12 Feb 2025
MatSwap: Light-aware material transfers in images
Ivan Lopes
Valentin Deschaintre
Yannick Hold-Geoffroy
Raoul de Charette
DiffM
87
0
0
11 Feb 2025
Understanding Classifier-Free Guidance: High-Dimensional Theory and Non-Linear Generalizations
Krunoslav Lehman Pavasovic
Jakob Verbeek
Giulio Biroli
Marc Mézard
64
0
0
11 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhengyuan Yang
Mike Zheng Shou
MoE
78
0
0
10 Feb 2025
Dual Caption Preference Optimization for Diffusion Models
Amir Saeidi
Yiran Luo
Agneet Chatterjee
Shamanthak Hegde
Bimsara Pathiraja
Yezhou Yang
Chitta Baral
DiffM
63
0
0
09 Feb 2025
Previous
1
2
3
...
7
8
9
...
15
16
17
Next