ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.09854
  4. Cited By
Transformer-Based Visual Segmentation: A Survey
v1v2v3 (latest)

Transformer-Based Visual Segmentation: A Survey

19 April 2023
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
    ViTMedIm
ArXiv (abs)PDFHTMLGithub (741★)

Papers citing "Transformer-Based Visual Segmentation: A Survey"

50 / 216 papers shown
Title
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud
  Pre-training
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Renrui Zhang
Ziyu Guo
Rongyao Fang
Bingyan Zhao
Dong Wang
Yu Qiao
Hongsheng Li
Peng Gao
3DPC
250
257
0
28 May 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
84
49
0
28 May 2022
Vision Transformer Adapter for Dense Predictions
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
132
564
0
17 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
77
126
0
08 May 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
Temporally Efficient Vision Transformer for Video Instance Segmentation
Shusheng Yang
Xinggang Wang
Yu Li
Yuxin Fang
Jiemin Fang
Wenyu Liu
Xun Zhao
Ying Shan
ViT
51
66
0
18 Apr 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang
Zilong Huang
Guozhong Luo
Tao Chen
Xinggang Wang
Wenyu Liu
Gang Yu
Chunhua Shen
ViT
106
208
0
12 Apr 2022
Panoptic-PartFormer: Learning a Unified Model for Panoptic Part
  Segmentation
Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation
Xiangtai Li
Shilin Xu
Yibo Yang
Guangliang Cheng
Yunhai Tong
Dacheng Tao
ViT
48
46
0
10 Apr 2022
AdaMixer: A Fast-Converging Query-Based Object Detector
AdaMixer: A Fast-Converging Query-Based Object Detector
Ziteng Gao
Limin Wang
Bing Han
Sheng Guo
ObjD
92
110
0
30 Mar 2022
MatteFormer: Transformer-Based Image Matting via Prior-Tokens
MatteFormer: Transformer-Based Image Matting via Prior-Tokens
Gyutae Park
S. Son
Jaeyoung Yoo
Seho Kim
Nojun Kwak
ViT
63
66
0
29 Mar 2022
Stratified Transformer for 3D Point Cloud Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
Xin Lai
Jianhui Liu
Li Jiang
Liwei Wang
Hengshuang Zhao
Shu Liu
Xiaojuan Qi
Jiaya Jia
3DPCViT
114
274
0
28 Mar 2022
Sparse Instance Activation for Real-Time Instance Segmentation
Sparse Instance Activation for Real-Time Instance Segmentation
Tianheng Cheng
Xinggang Wang
Shaoyu Chen
Wenqiang Zhang
Qian Zhang
Chang Huang
Zhaoxiang Zhang
Wenyu Liu
ISeg
80
133
0
24 Mar 2022
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene
  Understanding
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
Hanrong Ye
Dan Xu
ViT
57
88
0
15 Mar 2022
Masked Autoencoders for Point Cloud Self-supervised Learning
Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang
Wenxiao Wang
Francis E. H. Tay
Wen Liu
Yonghong Tian
Liuliang Yuan
3DPCViT
86
475
0
13 Mar 2022
Conditional Prompt Learning for Vision-Language Models
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLMCLIPVPVLM
139
1,349
0
10 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object
  Detection
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
167
1,451
0
07 Mar 2022
Multi-class Token Transformer for Weakly Supervised Semantic
  Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Lian Xu
Wanli Ouyang
Bennamoun
F. Boussaïd
Dan Xu
ViT
82
213
0
06 Mar 2022
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Feng Li
Hao Zhang
Shi-guang Liu
Jian Guo
L. Ni
Lei Zhang
ViT
130
680
0
02 Mar 2022
Context Autoencoder for Self-Supervised Representation Learning
Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
105
396
0
07 Feb 2022
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Shilong Liu
Feng Li
Hao Zhang
Xiaohu Yang
Xianbiao Qi
Hang Su
Jun Zhu
Lei Zhang
ViT
294
761
0
28 Jan 2022
TransVOD: End-to-End Video Object Detection with Spatial-Temporal
  Transformers
TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers
Qianyu Zhou
Hefei Ling
Lu He
Li Niu
Guangliang Cheng
Yunhai Tong
Lizhuang Ma
Liqing Zhang
ViT
85
137
0
13 Jan 2022
A ConvNet for the 2020s
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chaozheng Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
171
5,192
0
10 Jan 2022
Detecting Twenty-thousand Classes using Image-level Supervision
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIPVLM
106
617
0
07 Jan 2022
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
John Lambert
Zhuang Liu
Ozan Sener
James Hays
V. Koltun
VLM
76
202
0
27 Dec 2021
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
124
382
0
22 Dec 2021
SOIT: Segmenting Objects with Instance-Aware Transformers
SOIT: Segmenting Objects with Instance-Aware Transformers
Xiaodong Yu
Dahu Shi
Xing Wei
Ye Ren
Ting Ye
Wenming Tan
ViT
99
27
0
21 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
85
250
0
21 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
149
670
0
16 Dec 2021
Slot-VPS: Object-centric Representation Learning for Video Panoptic
  Segmentation
Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation
Yi Zhou
Hui Zhang
Hana Lee
Shuyang Sun
Pingjun Li
Yangguang Zhu
ByungIn Yoo
Xiaojuan Qi
Jae-Joon Han
VOS
67
28
0
16 Dec 2021
Decoupling Zero-Shot Semantic Segmentation
Decoupling Zero-Shot Semantic Segmentation
Jian Ding
Nan Xue
Guisong Xia
Dengxin Dai
VLM
104
195
0
15 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
253
2,374
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
153
690
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLMCLIP
208
578
0
02 Dec 2021
CRIS: CLIP-Driven Referring Image Segmentation
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
111
371
0
30 Nov 2021
DAFormer: Improving Network Architectures and Training Strategies for
  Domain-Adaptive Semantic Segmentation
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Lukas Hoyer
Dengxin Dai
Luc Van Gool
AI4CE
102
458
0
29 Nov 2021
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point
  Modeling
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu
Lulu Tang
Yongming Rao
Tiejun Huang
Jie Zhou
Jiwen Lu
3DPC
136
682
0
29 Nov 2021
Sparse DETR: Efficient End-to-End Object Detection with Learnable
  Sparsity
Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity
Byungseok Roh
Jaewoong Shin
Wuhyun Shin
Saehoon Kim
ViT
52
145
0
29 Nov 2021
High Quality Segmentation for Ultra High-resolution Images
High Quality Segmentation for Ultra High-resolution Images
Tiancheng Shen
Yuechen Zhang
Lu Qi
Jason Kuen
Xingyu Xie
Jianlong Wu
Zhe Lin
Jiaya Jia
147
42
0
29 Nov 2021
Mask Transfiner for High-Quality Instance Segmentation
Mask Transfiner for High-Quality Instance Segmentation
Lei Ke
Martin Danelljan
Xia Li
Yu-Wing Tai
Chi-Keung Tang
Feng Yu
ISeg
62
117
0
26 Nov 2021
Open-Vocabulary Instance Segmentation via Robust Cross-Modal
  Pseudo-Labeling
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Dat T. Huynh
Jason Kuen
Zhe Lin
Jiuxiang Gu
Ehsan Elhamifar
ISegVLM
56
86
0
24 Nov 2021
MetaFormer Is Actually What You Need for Vision
MetaFormer Is Actually What You Need for Vision
Weihao Yu
Mi Luo
Pan Zhou
Chenyang Si
Yichen Zhou
Xinchao Wang
Jiashi Feng
Shuicheng Yan
170
911
0
22 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng Zhang
Li Dong
Furu Wei
B. Guo
ViT
217
1,822
0
18 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViTTPM
467
7,814
0
11 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language
  Modeling
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
265
401
0
06 Nov 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision
  Transformer
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
285
1,274
0
05 Oct 2021
Localizing Objects with Self-Supervised Transformers and no Labels
Localizing Objects with Self-Supervised Transformers and no Labels
Oriane Siméoni
Gilles Puy
Huy V. Vo
Simon Roburin
Spyros Gidaris
Andrei Bursuc
P. Pérez
Renaud Marlet
Jean Ponce
ViT
230
202
0
29 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
79
105
0
30 Aug 2021
Conditional DETR for Fast Training Convergence
Conditional DETR for Fast Training Convergence
Depu Meng
Xiaokang Chen
Zejia Fan
Gang Zeng
Houqiang Li
Yuhui Yuan
Lei-huan Sun
Jingdong Wang
ViT
88
619
0
13 Aug 2021
Vision-Language Transformer and Query Generation for Referring
  Segmentation
Vision-Language Transformer and Query Generation for Referring Segmentation
Henghui Ding
Chang-rui Liu
Suchen Wang
Xudong Jiang
78
266
0
12 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
Fast Convergence of DETR with Spatially Modulated Co-Attention
Peng Gao
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
72
307
0
05 Aug 2021
Global Aggregation then Local Distribution for Scene Parsing
Global Aggregation then Local Distribution for Scene Parsing
Xiangtai Li
Li Zhang
Guangliang Cheng
Kuiyuan Yang
Yunhai Tong
Xiatian Zhu
Tao Xiang
60
16
0
28 Jul 2021
Previous
12345
Next