Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.06940
Cited By
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
9 October 2024
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think"
49 / 49 papers shown
Title
DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning
Weilai Xiang
Hongyu Yang
Di Huang
Yunhong Wang
12
0
0
16 May 2025
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Yuang Ai
Qihang Fan
Xuefeng Hu
Zhenheng Yang
Ran He
Huaibo Huang
DiffM
12
0
0
16 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
VGen
40
1
0
12 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
51
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Xuzhi Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
49
0
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
77
0
0
29 Apr 2025
Enhancing Variational Autoencoders with Smooth Robust Latent Encoding
Hyomin Lee
Minseon Kim
Sangwon Jang
Jongheon Jeong
Sung Ju Hwang
DiffM
AAML
39
0
0
24 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
38
0
0
22 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Elucidating the Design Space of Multimodal Protein Language Models
Cheng-Yen Hsieh
Qing Guo
Daiheng Zhang
Dongyu Xue
Fei Ye
Shujian Huang
Zaixiang Zheng
Quanquan Gu
29
1
0
15 Apr 2025
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
Yanrui Bin
Wenbo Hu
Haoyuan Wang
Xinya Chen
Bing Wang
DiffM
45
0
0
15 Apr 2025
Efficient Generative Model Training via Embedded Representation Warmup
Deyuan Liu
Peng Sun
Xufeng Li
Tao Lin
33
0
0
14 Apr 2025
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
Xingjian Leng
Jaskirat Singh
Yunzhong Hou
Zhenchang Xing
Saining Xie
Liang Zheng
39
0
0
14 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
36
0
0
11 Apr 2025
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton
Ji Woo Hong
Chang D. Yoo
VGen
24
0
0
08 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
48
0
0
01 Apr 2025
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang
Xiaofeng Wang
Chaojun Ni
Guosheng Zhao
Zhiqin Yang
...
Yukun Zhou
Xinze Chen
Guan Huang
Lihong Liu
Xingang Wang
VGen
57
2
0
31 Mar 2025
MMGen: Unified Multi-modal Image Generation and Understanding in One Go
Jiepeng Wang
Zhaoqing Wang
H. Pan
Yuan Liu
Dongdong Yu
Changhu Wang
Wenping Wang
DiffM
80
0
0
26 Mar 2025
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian
Hanting Chen
Mengyu Zheng
Yuchen Liang
Chao Xu
Yunhe Wang
56
0
0
24 Mar 2025
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
Zhengyuan Li
Kai Cheng
Anindita Ghosh
Uttaran Bhattacharya
Liangyan Gui
Aniket Bera
DiffM
VGen
44
0
0
23 Mar 2025
Deeply Supervised Flow-Based Generative Models
Inkyu Shin
Chenglin Yang
Liang-Chieh Chen
63
0
0
18 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
68
0
0
13 Mar 2025
Unified Dense Prediction of Video Diffusion
Lehan Yang
Lu Qi
Xianrui Li
Sheng Li
Varun Jampani
Ming Yang
MDE
VOS
VGen
60
0
0
12 Mar 2025
"Principal Components" Enable A New Language of Images
Xin Wen
Bingchen Zhao
Ismail Elezi
Jiankang Deng
Xiaojuan Qi
66
0
0
11 Mar 2025
SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models
Hesen Chen
Junyan Wang
Zhiyu Tan
Hao Li
58
0
0
11 Mar 2025
Aligning Text to Image in Diffusion Models is Easier Than You Think
J. Lee
Byunghee Cha
Jeongsol Kim
Jong Chul Ye
52
0
0
11 Mar 2025
Keeping Representation Similarity in Finetuning for Medical Image Analysis
Wenqiang Zu
Shenghao Xie
Hao Chen
Yiming Liang
Lei Ma
MedIm
OOD
48
0
0
10 Mar 2025
Effective and Efficient Masked Image Generation Models
Zebin You
Jingyang Ou
Xiaolu Zhang
Jun Hu
Jun Zhou
Chongxuan Li
DiffM
VLM
64
1
0
10 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
65
0
0
08 Mar 2025
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng
Yongxin Chen
Huayu Chen
Guande He
Xuan Li
Jun Zhu
Qinsheng Zhang
DiffM
49
0
0
03 Mar 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew
Hubert Baniecki
P. Biecek
51
0
0
27 Feb 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
83
6
0
27 Feb 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
56
0
0
23 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
77
0
0
18 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
78
5
0
17 Feb 2025
Diffusion Models without Classifier-free Guidance
Zhicong Tang
Jianmin Bao
Dong Chen
Baining Guo
VLM
57
2
0
17 Feb 2025
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Xiao Li
Zekai Zhang
Xiang Li
Siyi Chen
Zhihui Zhu
Peng Wang
Qing Qu
DiffM
51
0
0
09 Feb 2025
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
Junan Zhang
Jing Yang
Zihao Fang
Yixuan Wang
Zehua Zhang
Zhuo Wang
Fan Fan
Zhikai Wu
41
2
0
26 Jan 2025
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
Anand Kumar
Jiteng Mu
Nuno Vasconcelos
DiffM
88
0
0
19 Dec 2024
E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling
Zhihang Yuan
Yuzhang Shang
H. Zhang
Tongcheng Fang
Rui Xie
Bingxin Xu
Yan Yan
Shengen Yan
Guohao Dai
Yu Wang
DiffM
100
1
0
18 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
H. Chen
Z. Wang
Xianrui Li
Xingchen Sun
Fangyi Chen
Jiang Liu
J. Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
114
6
0
14 Dec 2024
[MASK] is All You Need
Vincent Tao Hu
Bjorn Ommer
DiffM
137
2
0
09 Dec 2024
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
Kefan Chen
Chaerin Min
Linguang Zhang
Shreyas Hampali
Cem Keskin
Srinath Sridhar
77
0
0
03 Dec 2024
XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Xianrui Li
Kai Qiu
H. Chen
Jason Kuen
Jiuxiang Gu
J. Wang
Zhe-nan Lin
Bhiksha Raj
VLM
125
3
0
02 Dec 2024
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao
S. Sang
Tiancheng Zhi
Jing Liu
Qing Yan
Linjie Luo
Bo Yuan
Bo Yuan
VLM
86
1
0
26 Nov 2024
Factorized Visual Tokenization and Generation
Zechen Bai
Jianxiong Gao
Ziteng Gao
Pichao Wang
Zheng Zhang
Tong He
Mike Zheng Shou
75
3
0
25 Nov 2024
Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation
Zhenbin Wang
Lei Zhang
Lituan Wang
Minjuan Zhu
Zhenwei Zhang
VGen
MedIm
57
1
0
03 Nov 2024
Movie Gen: A Cast of Media Foundation Models
Adam Polyak
Amit Zohar
Andrew Brown
Andros Tjandra
Animesh Sinha
...
Simone Parmeggiani
Steve Fine
Tara Fowler
Vladan Petrovic
Yuming Du
VGen
DiffM
61
163
0
17 Oct 2024
1