2111.11429
Cited By
Benchmarking Detection Transfer Learning with Vision Transformers
22 November 2021
Yanghao Li
Saining Xie
Xinlei Chen
Piotr Dollár
Kaiming He
Ross B. Girshick
Papers citing
"Benchmarking Detection Transfer Learning with Vision Transformers"
42 / 42 papers shown
Title
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
139
0
0
06 May 2025
A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design
Jie Tian
Martin Taylor Sobczak
Dhanush Patil
Jixin Hou
Lin Pang
...
Yuval Golan
Xiaoming Zhai
Hongyue Sun
Kenan Song
X. U. Wang
LLMAG
AI4CE
53
0
0
25 Mar 2025
Perception of Visual Content: Differences Between Humans and Foundation Models
Nardiena A. Pratama
Shaoyang Fan
Gianluca Demartini
VLM
97
0
0
28 Nov 2024
TACO: Adversarial Camouflage Optimization on Trucks to Fool Object Detectors
Adonisz Dimitriu
Tamás Michaletzky
Viktor Remeli
AAML
133
0
0
28 Oct 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan
Hongbo Liu
Mading Li
Muyi Sun
Ming-hui Sun
Jiachao Gong
Jinhua Hao
Chao Zhou
Yansong Tang
ViT
51
5
0
28 May 2024
Instance-Level Safety-Aware Fidelity of Synthetic Data and Its Calibration
Chih-Hong Cheng
Paul Stöckel
Xingyu Zhao
22
2
0
10 Feb 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
21
2
0
11 Jan 2024
Perceptual MAE for Image Manipulation Localization: A High-level Vision Learner Focusing on Low-level Features
Xiaochen Ma
Jizhe Zhou
Xiong Xu
Zhuohang Jiang
Chi-Man Pun
29
0
0
10 Oct 2023
Controllable Chest X-Ray Report Generation from Longitudinal Representations
F. Serra
Chaoyang Wang
F. Deligianni
Jeffrey Stephen Dalton
Alison Q. O'Neil
MedIm
28
13
0
09 Oct 2023
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao
Pichao Wang
Yuyang Zhao
Hao Luo
F. Wang
Mike Zheng Shou
ViT
31
14
0
15 Sep 2023
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
Xiaochen Ma
Bo Du
Zhuohang Jiang
Ahmed Y. Al Hammadi
Jizhe Zhou
16
7
0
27 Jul 2023
Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards
A. V. Meekeren
Maya Aghaei
K. Dijkstra
DiffM
21
1
0
20 Jun 2023
Token Sparsification for Faster Medical Image Segmentation
Lei Zhou
Huidong Liu
Joseph Bae
Junjun He
Dimitris Samaras
Prateek Prasanna
MedIm
22
3
0
11 Mar 2023
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
Maureen Daum
Enhao Zhang
Dong He
Stephen Mussmann
Brandon Haynes
Ranjay Krishna
Magdalena Balazinska
27
4
0
07 Mar 2023
Applying Plain Transformers to Real-World Point Clouds
Lanxiao Li
M. Heizmann
3DPC
ViT
23
3
0
28 Feb 2023
Universal Guidance for Diffusion Models
Arpit Bansal
Hong-Min Chu
Avi Schwarzschild
Soumyadip Sengupta
Micah Goldblum
Jonas Geiping
Tom Goldstein
VLM
37
242
0
14 Feb 2023
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
11
10
0
02 Jan 2023
A Close Look at Spatial Modeling: From Attention to Convolution
Xu Ma
Huan Wang
Can Qin
Kunpeng Li
Xing Zhao
Jie Fu
Yun Fu
ViT
3DPC
17
11
0
23 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
X. Wang
ViT
30
21
0
13 Dec 2022
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
30
9
0
13 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
27
13
0
03 Dec 2022
Rethinking Hierarchies in Pre-trained Plain Vision Transformer
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
13
1
0
03 Nov 2022
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild
Weiyao Wang
Byung-Hak Kim
Varun Ganapathi
SSL
LMTD
27
1
0
02 Nov 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Xiaogang Xu
Lei Wang
Zaiyan Dai
Jun Yang
ViT
27
23
0
22 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
22
25
0
03 Oct 2022
How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image Domain? An Empirical Study Involving Art Classification
Vincent Tonkes
M. Sabatelli
ViT
25
6
0
09 Aug 2022
Contrastive Masked Autoencoders are Stronger Vision Learners
Zhicheng Huang
Xiaojie Jin
Cheng Lu
Qibin Hou
Ming-Ming Cheng
Dongmei Fu
Xiaohui Shen
Jiashi Feng
31
147
0
27 Jul 2022
Towards Efficient 3D Object Detection with Knowledge Distillation
Jihan Yang
Shaoshuai Shi
Runyu Ding
Zhe Wang
Xiaojuan Qi
107
45
0
30 May 2022
A Closer Look at Self-Supervised Lightweight Vision Transformers
Shaoru Wang
Jin Gao
Zeming Li
Jian-jun Sun
Weiming Hu
ViT
67
41
0
28 May 2022
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang
Shan You
Mingkai Zheng
Fei Wang
Chao Qian
T. Yamasaki
27
68
0
26 May 2022
Masked Image Modeling with Denoising Contrast
Kun Yi
Yixiao Ge
Xiaotong Li
Shusheng Yang
Dian Li
Jianping Wu
Ying Shan
Xiaohu Qie
VLM
30
51
0
19 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
23
55
0
06 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
32
265
0
04 Apr 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
37
38
0
29 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Cha Zhang
Furu Wei
ViT
VLM
35
159
0
04 Mar 2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
46
677
0
02 Dec 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
308
5,773
0
29 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,774
0
24 Feb 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Yin Cui
A. Srinivas
Rui Qian
Tsung-Yi Lin
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
228
968
0
13 Dec 2020