SiT: Self-supervised vIsion Transformer


8 April 2021
Sara Atito Ali Ahmed
Muhammad Awais
J. Kittler
    ViT
arXiv: 2104.03602

Papers citing "SiT: Self-supervised vIsion Transformer"

46 papers
The Moon's Many Faces: A Single Unified Transformer for Multimodal Lunar Reconstruction
Tom Sander
Moritz Tenthoff
Kay Wohlfarth
Christian Wöhler
31
0
0
08 May 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
68
0
0
13 Mar 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
83
6
0
27 Feb 2025
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
Marcin Przewięźlikowski
Randall Balestriero
Wojciech Jasiński
Marek Śmieja
Bartosz Zieliński
69
0
0
04 Dec 2024
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu
Peijin Wang
Hanbo Bi
Boyuan Tong
Zhi Wang
...
Ziqi Zhang
QiXiang Ye
Kun Fu
Xian Sun
100
0
0
27 Nov 2024
Behavioral Cloning Models Reality Check for Autonomous Driving
M. Yildirim
Barkin Dagda
Vinal Asodia
Saber Fallah
OffRL
36
1
0
11 Sep 2024
Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification
Peng Gao
Yujian Lee
Hui Zhang
Xubo Liu
Yiyang Hu
Guquan Jing
32
1
0
21 May 2024
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
44
63
0
11 Dec 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
36
0
0
24 Aug 2023
Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding
Jiantao Wu
Shentong Mo
Muhammad Awais
Sara Atito
Zhenhua Feng
J. Kittler
VLM
27
4
0
22 Aug 2023
DPPMask: Masked Image Modeling with Determinantal Point Processes
Junde Xu
Zikai Lin
Donghao Zhou
Yao-Cheng Yang
Xiangyun Liao
Bian Wu
Guangyong Chen
Pheng-Ann Heng
23
1
0
13 Mar 2023
Knowledge Graph Completion Method Combined With Adaptive Enhanced Semantic Information
Weidong Ji
Zengxiang Yin
Guohui Zhou
Yuqi Yue
Xinru Zhang
Chenghong Sun
16
0
0
04 Feb 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
24
2
0
15 Jan 2023
A New Perspective to Boost Vision Transformer for Medical Image Classification
Yuexiang Li
Yawen Huang
Nanjun He
Kai Ma
Yefeng Zheng
ViT
MedIm
21
3
0
03 Jan 2023
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
23
3
0
21 Dec 2022
SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain Specific Foundation Model
Syed Muhammad Anwar
Abhijeet Parida
Sara Atito
Muhammad Awais
G. Nino
Josef Kittler
M. Linguraru
ViT
SSL
OOD
29
6
0
23 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
20
81
0
18 Nov 2022
CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion
Philippe Weinzaepfel
Vincent Leroy
Thomas Lucas
Romain Brégier
Yohann Cabon
Vaibhav Arora
L. Antsfeld
Boris Chidlovskii
G. Csurka
Jérôme Revaud
SSL
42
64
0
19 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
37
19
0
09 Oct 2022
Transformer based Fingerprint Feature Extraction
Saraansh Tandon
A. Namboodiri
ViT
39
8
0
08 Sep 2022
SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI
Sara Atito
Syed Muhammad Anwar
Muhammad Awais
Josef Kittler
ViT
MedIm
29
12
0
29 Aug 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
35
97
0
16 Jun 2022
Rethinking Generalization in Few-Shot Classification
Markus Hiller
Rongkai Ma
Mehrtash Harandi
Tom Drummond
OCL
VLM
27
55
0
15 Jun 2022
GMML is All you Need
Sara Atito
Muhammad Awais
J. Kittler
ViT
VLM
46
18
0
30 May 2022
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Didolkar
Kshitij Gupta
Anirudh Goyal
Nitesh B. Gundavarapu
Alex Lamb
Nan Rosemary Ke
Yoshua Bengio
AI4CE
112
17
0
30 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
35
265
0
04 Apr 2022
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
Aishik Konwer
Xuan Xu
Joseph Bae
Chaoyu Chen
Prateek Prasanna
MedIm
30
15
0
02 Mar 2022
Unsupervised Anomaly Detection from Time-of-Flight Depth Images
Pascal Schneider
J. Rambach
B. Mirbach
D. Stricker
29
7
0
02 Mar 2022
Training Vision Transformers with Only 2040 Images
Yunhao Cao
Hao Yu
Jianxin Wu
ViT
110
42
0
26 Jan 2022
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Alaaeldin El-Nouby
Gautier Izacard
Hugo Touvron
Ivan Laptev
Hervé Jégou
Edouard Grave
SSL
27
148
0
20 Dec 2021
MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
Sara Atito
Muhammad Awais
Ammarah Farooq
Zhenhua Feng
J. Kittler
17
17
0
30 Nov 2021
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
Yucheng Tang
Dong Yang
Wenqi Li
H. Roth
Bennett Landman
Daguang Xu
V. Nath
Ali Hatamizadeh
ViT
MedIm
42
517
0
29 Nov 2021
Sparse Fusion for Multimodal Transformers
Yi Ding
Alex Rich
Mason Wang
Noah Stier
M. Turk
P. Sen
Tobias Höllerer
ViT
27
7
0
23 Nov 2021
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
21
710
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
38
268
0
19 Oct 2021
Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
Chongjian Ge
Youwei Liang
Yibing Song
Jianbo Jiao
Jue Wang
Ping Luo
ViT
21
36
0
11 Oct 2021
PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
Yu Fu
Tianyang Xu
Xiaojun Wu
J. Kittler
ViT
27
37
0
29 Jul 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
39
614
0
18 Jun 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Jiangning Zhang
Chao Xu
Jian Li
Wenzhou Chen
Yabiao Wang
Ying Tai
Shuo Chen
Chengjie Wang
Feiyue Huang
Yong Liu
29
22
0
31 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,785
0
29 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
248
577
0
22 Apr 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,430
0
04 Jan 2021
Self-Supervised Feature Learning by Learning to Spot Artifacts
Simon Jenni
Paolo Favaro
SSL
150
127
0
13 Jun 2018