Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.00426
Cited By
PixArt-
α
α
α
: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
30 September 2023
Junsong Chen
Jincheng Yu
Chongjian Ge
Lewei Yao
Enze Xie
Yue Wu
Zhongdao Wang
James T. Kwok
Ping Luo
Huchuan Lu
Zhenguo Li
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis"
36 / 86 papers shown
Title
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo
Shaobin Zhuang
Kunchang Li
Yu Qiao
Yali Wang
VLM
CLIP
32
0
0
16 Oct 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie
Junsong Chen
Junyu Chen
Han Cai
Haotian Tang
...
Zhekai Zhang
Muyang Li
Ligeng Zhu
Yaojie Lu
Song Han
VLM
46
49
0
14 Oct 2024
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Xiangru Zhu
Penglei Sun
Yaoxian Song
Yanghua Xiao
Zhixu Li
Chengyu Wang
Jun Huang
Bei Yang
Xiaoxiao Xu
EGVM
185
1
0
14 Oct 2024
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM
VGen
42
0
0
10 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
76
64
0
09 Oct 2024
Revealing Directions for Text-guided 3D Face Editing
Zhuo Chen
Yichao Yan
Sehngqi Liu
Yuhao Cheng
Weiming Zhao
Lincheng Li
Mengxiao Bi
Xiaokang Yang
DiffM
37
0
0
07 Oct 2024
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun
Tao Lei
Bowen Zhang
Yanghao Li
Haoshuo Huang
Ruoming Pang
Bo Dai
Nan Du
DiffM
MoE
81
5
0
02 Oct 2024
Denoising with a Joint-Embedding Predictive Architecture
Dengsheng Chen
Jie Hu
Xiaoming Wei
Enhua Wu
DiffM
52
2
0
02 Oct 2024
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He
Haodong Li
Wei Yin
Yixun Liang
Leheng Li
Kaiqiang Zhou
Hongbo Zhang
Bingbing Liu
Ying-Cong Chen
DiffM
VLM
49
40
0
26 Sep 2024
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Zheng Chong
Xiao Dong
Haoxiang Li
Shiyue Zhang
Wenqing Zhang
Xujie Zhang
Hanqing Zhao
D. Jiang
Xiaodan Liang
DiffM
57
17
0
21 Jul 2024
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He
Huan Yang
Zixi Tuo
Yuan Zhou
Qiuyue Wang
Yuhang Zhang
Zeyu Liu
Wenhao Huang
Hongyang Chao
Jian Yin
DiffM
VGen
62
12
0
17 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
46
25
0
08 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
83
68
0
02 Jul 2024
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Jie Ren
Kangrui Chen
Yingqian Cui
Shenglai Zeng
Hui Liu
Yue Xing
Jiliang Tang
Lingjuan Lyu
53
1
0
21 Jun 2024
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Alireza Ganjdanesh
Reza Shirkavand
Shangqian Gao
Heng Huang
DiffM
VLM
56
4
0
17 Jun 2024
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
X. Wang
Siming Fu
Qihan Huang
Wanggui He
Hao Jiang
DiffM
48
41
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
66
227
0
10 Jun 2024
Evaluating Durability: Benchmark Insights into Multimodal Watermarking
Jielin Qiu
William Jongwon Han
Xuandong Zhao
Shangbang Long
Christos Faloutsos
Lei Li
65
1
0
06 Jun 2024
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang
H. Wu
Chunyi Li
Yingjie Zhou
Wei Sun
Xiongkuo Min
Zijian Chen
Xiaohong Liu
Weisi Lin
Guangtao Zhai
EGVM
72
16
0
05 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Xuefei Ning
Yu Wang
MQ
VGen
112
25
0
04 Jun 2024
TerDiT: Ternary Diffusion Models with Transformers
Xudong Lu
Aojun Zhou
Ziyi Lin
Qi Liu
Yuhui Xu
Renrui Zhang
Yafei Wen
Shuai Ren
Peng Gao
Junchi Yan
MQ
53
2
0
23 May 2024
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
Katherine Xu
Lingzhi Zhang
Jianbo Shi
43
12
0
23 May 2024
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models
Qinghe Wang
Baolu Li
Xiaomin Li
Bing Cao
Liqian Ma
Huchuan Lu
Xu Jia
DiffM
42
6
0
24 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
75
19
0
03 Apr 2024
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
S. A. Baumann
Felix Krause
Michael Neumayr
Nick Stracke
Vincent Tao Hu
Bjorn Ommer
Björn Ommer
DiffM
LM&Ro
70
11
0
25 Mar 2024
Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
Ruicheng Wang
Jianfeng Xiang
Jiaolong Yang
Xin Tong
DiffM
37
4
0
18 Mar 2024
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li
Hangyu Guo
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
61
39
0
14 Mar 2024
Diffusion Model-Based Image Editing: A Survey
Yi Huang
Jiancheng Huang
Yifan Liu
Mingfu Yan
Jiaxi Lv
Jianzhuang Liu
Wei Xiong
He Zhang
Liangliang Cao
Liangliang Cao
EGVM
66
85
0
27 Feb 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
125
239
0
05 Jan 2024
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang
Chaojie Mao
Yulin Pan
Zhen Han
Jingfeng Zhang
29
28
0
18 Dec 2023
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu
Zhuang Li
Yefei He
Mike Zheng Shou
Chunhua Shen
Lele Cheng
Yan Li
Tingting Gao
Di Zhang
VLM
141
24
0
24 Nov 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
168
351
0
02 May 2023
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
308
7,443
0
11 Nov 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
289
1,524
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
289
3,623
0
24 Feb 2021
TransTrack: Multiple Object Tracking with Transformer
Pei Sun
Jinkun Cao
Yi-Xin Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo
ViT
VOT
264
566
0
31 Dec 2020
Previous
1
2