ResearchTrend.AI
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
    DiffM

Papers citing "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"

50 / 818 papers shown
Title
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
47
9
0
22 Jul 2024
DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving
Jiahang Tu
Wei Ji
Han Zhao
Chao Zhang
Roger Zimmermann
Hui Qian
38
5
0
22 Jul 2024
Discrete Flow Matching
Itai Gat
Tal Remez
Neta Shaul
Felix Kreuk
Ricky T. Q. Chen
Gabriel Synnaeve
Yossi Adi
Y. Lipman
DiffM
52
57
0
22 Jul 2024
Stable Audio Open
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
75
38
0
19 Jul 2024
I2AM: Interpreting Image-to-Image Latent Diffusion Models via Bi-Attribution Maps
Junseo Park
Hyeryung Jang
81
0
0
17 Jul 2024
Scaling Diffusion Transformers to 16 Billion Parameters
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Junshi Huang
DiffM
MoE
65
16
0
16 Jul 2024
DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training
Guillermo Jiménez-Pérez
Pedro Osório
Josef Cersovsky
Javier Montalt-Tordera
Jens Hooge
Steffen Vogler
Sadegh Mohammadi
MedIm
43
2
0
16 Jul 2024
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Phillip Mueller
Lars Mikelsons
AI4CE
41
1
0
15 Jul 2024
Several questions of visual generation in 2024
Shuyang Gu
32
1
0
11 Jul 2024
Generative Image as Action Models
Mohit Shridhar
Yat Long Lo
Stephen James
43
9
0
10 Jul 2024
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Wanggui He
Siming Fu
Mushui Liu
Xierui Wang
Wenyi Xiao
...
Zhelun Yu
Haoyuan Li
Ziwei Huang
Leilei Gan
Hao Jiang
DiffM
24
23
0
10 Jul 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang
Zhenglin Cheng
Yuanyu He
Mengna Wang
Yongliang Shen
...
Guiyang Hou
Mingqian He
Yanna Ma
Weiming Lu
Yueting Zhuang
SyDa
74
9
0
09 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
77
39
0
08 Jul 2024
LaSe-E2V: Towards Language-guided Semantic-Aware Event-to-Video Reconstruction
Kanghao Chen
Hangyu Li
Jiazhou Zhou
Zeyu Wang
Lin Wang
DiffM
VGen
41
2
0
08 Jul 2024
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
Haozhe Zhao
Xiaojian Ma
Liang Chen
Shuzheng Si
Rujie Wu
Kaikai An
Peiyu Yu
Minjia Zhang
Qing Li
Baobao Chang
36
43
0
07 Jul 2024
Replication in Visual Diffusion Models: A Survey and Outlook
Wenhao Wang
Yifan Sun
Zongxin Yang
Zhengdong Hu
Zhentao Tan
Yi Yang
86
7
0
07 Jul 2024
Improved Noise Schedule for Diffusion Training
Tiankai Hang
Shuyang Gu
DiffM
18
9
0
03 Jul 2024
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
Ling Yang
Zixiang Zhang
Zhilong Zhang
Xingchao Liu
Minkai Xu
Wentao Zhang
Chenlin Meng
Stefano Ermon
Bin Cui
44
18
0
02 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
83
70
0
02 Jul 2024
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
Jian Ma
Yonglin Deng
Chen Chen
H. Lu
Zhenyu Yang
VLM
DiffM
97
6
0
02 Jul 2024
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
William Berman
A. Peysakhovich
36
4
0
26 Jun 2024
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers
Lei Chen
Yuan Meng
Chen Tang
Xinzhu Ma
Jingyan Jiang
Xin Wang
Zhi Wang
Wenwu Zhu
MQ
31
23
0
25 Jun 2024
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model
Min Zhao
Hongzhou Zhu
Chendong Xiang
Kaiwen Zheng
Chongxuan Li
Jun Zhu
69
8
0
22 Jun 2024
Fantastic Copyrighted Beasts and How (Not) to Generate Them
Luxi He
Yangsibo Huang
Weijia Shi
Tinghao Xie
Haotian Liu
Yue Wang
Luke Zettlemoyer
Chiyuan Zhang
Danqi Chen
Peter Henderson
46
9
0
20 Jun 2024
Conditional score-based diffusion models for solving inverse problems in mechanics
Agnimitra Dasgupta
Harisankar Ramaswamy
Javier Murgoitio-Esandi
Ken Foo
Runze Li
Qifa Zhou
Brendan Kennedy
Assad A. Oberai
DiffM
MedIm
45
2
0
19 Jun 2024
GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models
Yongtao Ge
Guangkai Xu
Zhiyue Zhao
Libo Sun
Zheng Huang
Yanlong Sun
Hao Chen
Chunhua Shen
MDE
42
3
0
18 Jun 2024
Learning Diffusion at Lightspeed
Antonio Terpin
Nicolas Lanzetti
Florian Dorfler
DiffM
43
7
0
18 Jun 2024
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Bingqi Ma
Zhuofan Zong
Guanglu Song
Hongsheng Li
Yu Liu
38
21
0
17 Jun 2024
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
Fanqing Meng
Wenqi Shao
Lixin Luo
Yahong Wang
Yiran Chen
...
Yue Yang
Tianshuo Yang
Kaipeng Zhang
Yu Qiao
Ping Luo
EGVM
44
8
0
17 Jun 2024
Diffusion Models in Low-Level Vision: A Survey
Chunming He
Yuqi Shen
Chengyu Fang
Fengyang Xiao
Longxiang Tang
Yulun Zhang
W. Zuo
Zhenhua Guo
Xiu Li
VLM
DiffM
MedIm
82
33
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
48
3
0
17 Jun 2024
Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry
Jing Jiang
Yiran Ling
Binzhu Li
Pengxiang Li
Junming Piao
Yu Zhang
EGVM
DiffM
37
1
0
15 Jun 2024
Consistency-diversity-realism Pareto fronts of conditional image generative models
Pietro Astolfi
Marlene Careil
Melissa Hall
Oscar Manas
Matthew Muckley
Jakob Verbeek
Adriana Romero Soriano
M. Drozdzal
51
10
0
14 Jun 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions
Vasu Singla
Kaiyu Yue
Sukriti Paul
Reza Shirkavand
Mayuka Jayawardhana
Alireza Ganjdanesh
Heng Huang
A. Bhatele
Gowthami Somepalli
Tom Goldstein
3DV
VLM
36
22
0
14 Jun 2024
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
Desai Xie
Sai Bi
Zhixin Shu
Kai Zhang
Zexiang Xu
Yi Zhou
Soren Pirk
Arie E. Kaufman
Xin Sun
Hao Tan
SyDa
56
14
0
13 Jun 2024
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts
Yucheng Han
Rui Wang
Chi Zhang
Juntao Hu
Pei Cheng
Bin-Bin Fu
Hanwang Zhang
75
6
0
13 Jun 2024
FouRA: Fourier Low Rank Adaptation
Shubhankar Borse
Shreya Kadambi
N. Pandey
Kartikeya Bhardwaj
Viswanath Ganapathy
Sweta Priyadarshi
Risheek Garrepalli
Rafael Esteves
Munawar Hayat
Fatih Porikli
42
6
0
13 Jun 2024
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang
Shijian Deng
Jing Shi
Dimitrios Hatzinakos
Yapeng Tian
VGen
80
10
0
11 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM
LRM
31
15
0
11 Jun 2024
Flow Map Matching
Nicholas M. Boffi
M. S. Albergo
Eric Vanden-Eijnden
34
4
0
11 Jun 2024
Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models
Athanasios Tragakis
Marco Aversa
Chaitanya Kaul
Roderick Murray-Smith
Daniele Faccio
57
2
0
11 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
48
41
0
11 Jun 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong
Sayak Paul
Noah Lee
Kashif Rasul
James Thorne
Jongheon Jeong
43
13
0
10 Jun 2024
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
L. Eyring
Shyamgopal Karthik
Karsten Roth
Alexey Dosovitskiy
Zeynep Akata
83
17
0
06 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Ningyu Zhang
Kai-Wei Chang
Aditya Grover
EGVM
VGen
40
37
0
05 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Ping Luo
Hongsheng Li
Peng Gao
52
45
0
05 Jun 2024
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Philip Anastassiou
Jiawei Chen
J. Chen
Yuanzhe Chen
Zhuo Chen
...
Wenjie Zhang
Wenjie Qu
Zilin Zhao
Dejian Zhong
Xiaobin Zhuang
49
79
0
04 Jun 2024
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clement Chadebec
O. Tasar
Eyal Benaroche
Benjamin Aubin
VLM
60
9
0
04 Jun 2024
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
Yuanhao Ban
Ruochen Wang
Tianyi Zhou
Boqing Gong
Cho-Jui Hsieh
Minhao Cheng
DiffM
67
4
0
04 Jun 2024