ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.03283
  4. Cited By
Diffusion Models as Masked Autoencoders

Diffusion Models as Masked Autoencoders

6 April 2023
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
    DiffM
    SyDa
ArXivPDFHTML

Papers citing "Diffusion Models as Masked Autoencoders"

50 / 60 papers shown
Title
TULIP: Towards Unified Language-Image Pretraining
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
148
5
0
19 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
215
1
0
08 Mar 2025
Masked Image Modeling: A Survey
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
113
8
0
13 Aug 2024
Exploring Long-Sequence Masked Autoencoders
Exploring Long-Sequence Masked Autoencoders
Ronghang Hu
Shoubhik Debnath
Saining Xie
Xinlei Chen
43
18
0
13 Oct 2022
Phenaki: Variable Length Video Generation From Open Domain Textual
  Description
Phenaki: Variable Length Video Generation From Open Domain Textual Description
Ruben Villegas
Mohammad Babaeizadeh
Pieter-Jan Kindermans
Hernan Moraldo
Han Zhang
M. Saffar
Santiago Castro
Julius Kunze
D. Erhan
DiffM
VGen
117
389
0
05 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
133
1,528
0
05 Oct 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data
Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer
Adam Polyak
Thomas Hayes
Xiaoyue Yin
Jie An
...
Oron Ashual
Oran Gafni
Devi Parikh
Sonal Gupta
Yaniv Taigman
DiffM
VGen
74
1,408
0
29 Sep 2022
Exploring Target Representations for Masked Autoencoders
Exploring Target Representations for Masked Autoencoders
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
133
51
0
08 Sep 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
60
319
0
12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
63
68
0
11 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
99
325
0
04 Aug 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
377
6,859
0
13 Apr 2022
Video Diffusion Models
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
190
1,610
0
07 Apr 2022
MVP: Multimodality-guided Visual Pre-training
MVP: Multimodality-guided Visual Pre-training
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
49
107
0
10 Mar 2022
Generative Adversarial Networks
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
267
30,123
0
01 Mar 2022
data2vec: A General Framework for Self-supervised Learning in Speech,
  Vision and Language
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
97
855
0
07 Feb 2022
RePaint: Inpainting using Denoising Diffusion Probabilistic Models
RePaint: Inpainting using Denoising Diffusion Probabilistic Models
Andreas Lugmayr
Martin Danelljan
Andrés Romero
Feng Yu
Radu Timofte
Luc Van Gool
DiffM
342
1,405
0
24 Jan 2022
A ConvNet for the 2020s
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chaozheng Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
159
5,167
0
10 Jan 2022
GLIDE: Towards Photorealistic Image Generation and Editing with
  Text-Guided Diffusion Models
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
331
3,594
0
20 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
141
668
0
16 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
144
689
0
02 Dec 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
98
242
0
24 Nov 2021
iBOT: Image BERT Pre-Training with Online Tokenizer
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
81
734
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
451
7,739
0
11 Nov 2021
BEiT: BERT Pre-Training of Image Transformers
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
258
2,819
0
15 Jun 2021
Diffusion Models Beat GANs on Image Synthesis
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
213
7,831
0
11 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation
  Learning
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Christoph Feichtenhofer
Haoqi Fan
Bo Xiong
Ross B. Girshick
Kaiming He
SSL
AI4TS
90
262
0
29 Apr 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
127
1,258
0
22 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
154
1,862
0
05 Apr 2021
Going deeper with Image Transformers
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
152
1,014
0
31 Mar 2021
Generating Diverse Structure for Image Inpainting With Hierarchical
  VQ-VAE
Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE
Jialun Peng
Dong Liu
Songcen Xu
Houqiang Li
DiffM
46
192
0
18 Mar 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
391
4,941
0
24 Feb 2021
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
Xinlei Chen
Kaiming He
SSL
253
4,052
0
20 Nov 2020
Unsupervised Learning of Visual Features by Contrasting Cluster
  Assignments
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL
SSL
227
4,074
0
17 Jun 2020
Bootstrap your own latent: A new approach to self-supervised Learning
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
360
6,797
0
13 Jun 2020
X3D: Expanding Architectures for Efficient Video Recognition
X3D: Expanding Architectures for Efficient Video Recognition
Christoph Feichtenhofer
128
1,019
0
09 Apr 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
196
12,074
0
13 Nov 2019
RandAugment: Practical automated data augmentation with a reduced search
  space
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
221
3,485
0
30 Sep 2019
Large Scale Adversarial Representation Learning
Large Scale Adversarial Representation Learning
Jeff Donahue
Karen Simonyan
SSL
124
543
0
04 Jul 2019
CutMix: Regularization Strategy to Train Strong Classifiers with
  Localizable Features
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
609
4,778
0
13 May 2019
SlowFast Networks for Video Recognition
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
164
3,273
0
10 Dec 2018
Iterative Reorganization with Weak Spatial Constraints: Solving
  Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning
Iterative Reorganization with Weak Spatial Constraints: Solving Arbitrary Jigsaw Puzzles for Unsupervised Representation Learning
Chen Wei
Lingxi Xie
Xutong Ren
Yingda Xia
Chi Su
Jiaying Liu
Qi Tian
Alan Yuille
SSL
60
131
0
02 Dec 2018
Unsupervised Feature Learning via Non-Parametric Instance-level
  Discrimination
Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
Zhirong Wu
Yuanjun Xiong
Stella X. Yu
Dahua Lin
SSL
170
3,452
0
05 May 2018
Unsupervised Representation Learning by Predicting Image Rotations
Unsupervised Representation Learning by Predicting Image Rotations
Spyros Gidaris
Praveer Singh
N. Komodakis
OOD
SSL
DRL
252
3,288
0
21 Mar 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
376
11,790
0
11 Jan 2018
Neural Discrete Representation Learning
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
226
5,008
0
02 Nov 2017
mixup: Beyond Empirical Risk Minimization
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
276
9,760
0
25 Oct 2017
Mask R-CNN
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
350
27,181
0
20 Mar 2017
SGDR: Stochastic Gradient Descent with Warm Restarts
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
327
8,116
0
13 Aug 2016
Layer Normalization
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
410
10,482
0
21 Jul 2016
12
Next