Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

5 March 2024
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
Harry Saini
Yam Levi
Dominik Lorenz
Axel Sauer
Frederic Boesel
Dustin Podell
Tim Dockhorn
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
    DiffM

Papers citing "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis"

50 / 829 papers shown
Generative Modeling of Molecular Dynamics Trajectories
Bowen Jing
Hannes Stärk
Tommi Jaakkola
Bonnie Berger
AI4CE
47
16
0
26 Sep 2024
Pixel-Space Post-Training of Latent Diffusion Models
Christina Zhang
Simran Motwani
Matthew Yu
Ji Hou
Felix Juefei-Xu
Sam S. Tsai
Peter Vajda
Zijian He
Jialiang Wang
34
2
0
26 Sep 2024
FlowMAC: Conditional Flow Matching for Audio Coding at Low Bit Rates
N. Pia
Martin Strauss
M. Multrus
B. Edler
42
0
0
26 Sep 2024
JoyType: A Robust Design for Multilingual Visual Text Creation
Chao Li
Chen Jiang
Xiaolong Liu
Jun Zhao
Guoxin Wang
DiffM
59
6
0
26 Sep 2024
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Yukun Huang
Jianan Wang
Ailing Zeng
Zheng-Jun Zha
Lei Zhang
Xihui Liu
3DGS
47
5
0
25 Sep 2024
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
Aiping Zhang
Zongsheng Yue
Renjing Pei
Wenqi Ren
Xiaochun Cao
42
7
0
25 Sep 2024
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis
Zhiyong Chen
Xinnuo Li
Zhiqi Ai
Shugong Xu
DiffM
39
1
0
24 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
66
10
0
23 Sep 2024
Imagine yourself: Tuning-Free Personalized Image Generation
Zecheng He
Bo Sun
Felix Juefei-Xu
Haoyu Ma
Ankit Ramchandani
...
Ning Zhang
Peizhao Zhang
Roshan Sumbaly
Peter Vajda
Animesh Sinha
DiffM
37
17
0
20 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Yishuo Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
45
2
0
19 Sep 2024
Understanding Implosion in Text-to-Image Generative Models
Wenxin Ding
Cathy Y. Li
Shawn Shan
Ben Y. Zhao
Haitao Zheng
36
0
0
18 Sep 2024
Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation
Dimitrios Christodoulou
Mads Kuhlmann-Jørgensen
EGVM
40
6
0
18 Sep 2024
ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images
Abhinaw Jagtap
Nachiket Tapas
R. G. Brajesh
EGVM
30
0
0
18 Sep 2024
OmniGen: Unified Image Generation
Shitao Xiao
Yueze Wang
Huaying Yuan
Xingrun Xing
Ruiran Yan
Shuting Wang
Tiejun Huang
Zheng Liu
DiffM
VLM
SyDa
64
65
0
17 Sep 2024
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization
Haoyuan Sun
Bo Xia
Yongzhe Chang
Xueqian Wang
EGVM
35
2
0
15 Sep 2024
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Ye Bai
Haonan Chen
Jitong Chen
Zhuo Chen
Yi Deng
...
Hang Zhao
Ziyi Zhao
Dejian Zhong
Shicen Zhou
Pei Zou
DiffM
63
6
0
13 Sep 2024
Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning
Hanyang Zhao
Haoxian Chen
Ji Zhang
David D. Yao
Wenpin Tang
55
3
0
12 Sep 2024
Token Turing Machines are Efficient Vision Models
Purvish Jajal
Nick Eliopoulos
Benjamin Shiue-Hal Chou
George K. Thiruvathukal
James C. Davis
Yung-Hsiang Lu
98
0
0
11 Sep 2024
Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching
Eugenio Chisari
Nick Heppert
Max Argus
Tim Welschehold
Thomas Brox
Abhinav Valada
3DPC
63
13
0
11 Sep 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
49
12
0
11 Sep 2024
Quantifying and Enabling the Interpretability of CLIP-like Models
Avinash Madasu
Yossi Gandelsman
Vasudev Lal
Phillip Howard
VLM
56
2
0
10 Sep 2024
Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition
Junzheng Zhang
Weijia Guo
Bochao Liu
Ruixin Shi
Yong Li
Shiming Ge
CVBM
51
0
0
10 Sep 2024
ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching
Niklas Funk
Julen Urain
João Carvalho
V. Prasad
Georgia Chalvatzaki
Jan Peters
58
5
0
06 Sep 2024
LinFusion: 1 GPU, 1 Minute, 16K Image
Songhua Liu
Weihao Yu
Zhenxiong Tan
Xinchao Wang
48
13
0
03 Sep 2024
Differentially Private Kernel Density Estimation
Erzhi Liu
Jerry Yao-Chieh Hu
Alex Reneau
Zhao Song
Han Liu
69
3
0
03 Sep 2024
Affordance-based Robot Manipulation with Flow Matching
Fan Zhang
Michael Gienger
60
6
0
02 Sep 2024
SPDiffusion: Semantic Protection Diffusion Models for Multi-concept Text-to-image Generation
Yang Zhang
Rui Zhang
Xuecheng Nie
Haochen Li
Jikun Chen
Yifan Hao
Xin Zhang
Luoqi Liu
Ling Li
50
0
0
02 Sep 2024
Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
Vage Egiazarian
Denis Kuznedelev
Anton Voronov
Ruslan Svirschevski
Michael Goin
Daniil Pavlov
Dan Alistarh
Dmitry Baranchuk
MQ
43
0
0
31 Aug 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
44
9
0
29 Aug 2024
Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators
Nikita Kister
István Sárándi
Anna Khoreva
Gerard Pons-Moll
53
0
0
28 Aug 2024
Hand1000: Generating Realistic Hands from Text with Only 1,000 Images
Haozhuo Zhang
B. Zhu
Yu Cao
Y. Hao
VLM
37
2
0
28 Aug 2024
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
Xu He
Xiaoyu Li
Di Kang
Jiangnan Ye
Chaopeng Zhang
Liyang Chen
Xiangjun Gao
Han Zhang
Zhiyong Wu
Haolin Zhuang
DiffM
38
7
0
26 Aug 2024
SurGen: Text-Guided Diffusion Model for Surgical Video Generation
Joseph Cho
Samuel Schmidgall
C. Zakka
Mrudang Mathur
Dhamanpreet Kaur
R. Shad
W. Hiesinger
VGen
MedIm
33
7
0
26 Aug 2024
Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing
Zitao Shuai
Chenwei Wu
Zhengxu Tang
Bowen Song
Liyue Shen
35
0
0
23 Aug 2024
What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
Yilun Liu
Minggui He
Feiyu Yao
Yuhe Ji
Shimin Tao
...
Jian Gao
Li Zhang
Hao Yang
Boxing Chen
Osamu Yoshie
48
5
0
23 Aug 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
59
166
0
22 Aug 2024
MeTTA: Single-View to 3D Textured Mesh Reconstruction with Test-Time Adaptation
Kim Yu-Ji
Hyunwoo Ha
Kim Youwang
Jaeheung Surh
Hyowon Ha
Tae-Hyun Oh
54
0
0
21 Aug 2024
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou
Lili Yu
Arun Babu
Kushal Tirumala
Michihiro Yasunaga
Leonid Shamis
Jacob Kahn
Xuezhe Ma
Luke Zettlemoyer
Omer Levy
DiffM
42
153
0
20 Aug 2024
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Haoning Wu
Shaocheng Shen
Qiang Hu
Xiaoyun Zhang
Ya Zhang
Yanfeng Wang
40
10
0
20 Aug 2024
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding
Shaobin Zhuang
Kunchang Li
Zhengrong Yue
Yu Qiao
Yali Wang
VGen
37
2
0
20 Aug 2024
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Tao Yang
Yangming Shi
Yunwen Huang
Feng Chen
Yin Zheng
Lei Zhang
DiffM
VGen
70
0
0
19 Aug 2024
Detecting the Undetectable: Combining Kolmogorov-Arnold Networks and MLP for AI-Generated Image Detection
Taharim Rahman Anon
Jakaria Islam Emon
45
3
0
18 Aug 2024
Are CLIP features all you need for Universal Synthetic Image Origin Attribution?
Dario Cioni
Christos Tzelepis
Lorenzo Seidenari
Ioannis Patras
48
2
0
17 Aug 2024
An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation
Peiming Guo
Sinuo Liu
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
Hao Fei
DiffM
50
1
0
16 Aug 2024
MVInpainter: Learning Multi-View Consistent Inpainting to Bridge 2D and 3D Editing
Chenjie Cao
Chaohui Yu
Yanwei Fu
Fan Wang
Xiangyang Xue
VGen
53
7
0
15 Aug 2024
Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models
Chenqian Yan
Songwei Liu
Hongjian Liu
Xurui Peng
Xiaojian Wang
Fangming Chen
Lean Fu
Xing Mei
31
6
0
13 Aug 2024
Music2Latent: Consistency Autoencoders for Latent Audio Compression
Marco Pasini
Stefan Lattner
George Fazekas
24
7
0
12 Aug 2024
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Junjie He
Yifeng Geng
Liefeng Bo
DiffM
56
20
0
12 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
104
405
0
12 Aug 2024