TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

11 December 2023
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao

Papers citing "TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation"

50 / 51 papers shown
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
89
261
0
13 Jul 2023
AIMS: All-Inclusive Multi-Level Segmentation
Lu Qi
Jason Kuen
Weidong Guo
Jiuxiang Gu
Zhe Lin
Bo Du
Yu-Syuan Xu
Ming-Hsuan Yang
VLM
63
6
0
28 May 2023
VideoChat: Chat-Centric Video Understanding
Kunchang Li
Yinan He
Yi Wang
Yizhuo Li
Wen Wang
Ping Luo
Yali Wang
Limin Wang
Yu Qiao
MLLM
92
568
0
10 May 2023
UniNeXt: Exploring A Unified Architecture for Vision Recognition
Fangjian Lin
Jianlong Yuan
Sitong Wu
Fan Wang
Zhibin Wang
ViT
51
14
0
26 Apr 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
306
7,274
0
05 Apr 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
85
351
0
29 Mar 2023
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
110
326
0
06 Dec 2022
A Generalized Framework for Video Instance Segmentation
Miran Heo
Sukjun Hwang
Jeongseok Hyun
Hanju Kim
Seoung Wug Oh
Joon-Young Lee
Seon Joo Kim
VLM
62
42
0
16 Nov 2022
Multi-dataset Training of Transformers for Robust Action Recognition
Junwei Liang
Enwei Zhang
Jun Zhang
Chunhua Shen
ViT
89
11
0
26 Sep 2022
OmDet: Large-scale vision-language multi-dataset pre-training with multimodal detection network
Tiancheng Zhao
Peng Liu
Kyusong Lee
VLM
MLLM
ObjD
31
5
0
10 Sep 2022
MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
De-An Huang
Zhiding Yu
Anima Anandkumar
VLM
88
80
0
03 Aug 2022
In Defense of Online Models for Video Instance Segmentation
Junfeng Wu
Qihao Liu
Yi Jiang
S. Bai
Alan Yuille
Xiang Bai
65
109
0
21 Jul 2022
VITA: Video Instance Segmentation via Object Token Association
Miran Heo
Sukjun Hwang
Seoung Wug Oh
Joon-Young Lee
Seon Joo Kim
VOS
49
92
0
09 Jun 2022
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
Lingchen Meng
Xiyang Dai
Yinpeng Chen
Pengchuan Zhang
Dongdong Chen
Mengchen Liu
Jianfeng Wang
Zuxuan Wu
Lu Yuan
Yu-Gang Jiang
ObjD
66
24
0
07 Jun 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
Shusheng Yang
Xinggang Wang
Yu Li
Yuxin Fang
Jiemin Fang
Wenyu Liu
Xun Zhao
Ying Shan
ViT
42
65
0
18 Apr 2022
Global Tracking Transformers
Xingyi Zhou
Tianwei Yin
V. Koltun
Philipp Krahenbuhl
VOT
65
137
0
24 Mar 2022
BigDetection: A Large-scale Benchmark for Improved Object Detector Pre-training
Likun Cai
Zhi-Li Zhang
Yi Zhu
Li Zhang
Mu Li
Xiangyang Xue
VLM
ObjD
74
41
0
24 Mar 2022
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Jialian Wu
Sudhir Yarram
Hui Liang
Tian Lan
Junsong Yuan
J. Eledath
Gérard Medioni
49
37
0
03 Mar 2022
Mask2Former for Video Instance Segmentation
Bowen Cheng
Anwesa Choudhuri
Ishan Misra
Alexander Kirillov
Rohit Girdhar
Alex Schwing
VOS
94
169
0
20 Dec 2021
SeqFormer: Sequential Transformer for Video Instance Segmentation
Junfeng Wu
Yi Jiang
S. Bai
Wenqing Zhang
Xiang Bai
ViT
77
103
0
15 Dec 2021
VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation
Sunggeun Han
Sukjun Hwang
Seoung Wug Oh
Yeonchool Park
Hyunwoo J. Kim
Minjung Kim
Seon Joo Kim
38
29
0
08 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
202
2,355
0
02 Dec 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
55
75
0
25 Nov 2021
Video Instance Segmentation using Inter-Frame Communication Transformers
Sukjun Hwang
Miran Heo
Seoung Wug Oh
Seon Joo Kim
ViT
107
137
0
07 Jun 2021
Crossover Learning for Fast Online Video Instance Segmentation
Shusheng Yang
Yuxin Fang
Xinggang Wang
Yu Li
Chen Fang
Ying Shan
Bin Feng
Wenyu Liu
66
104
0
13 Apr 2021
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation
Weiyao Wang
Matt Feiszli
Heng Wang
Du Tran
VOS
57
125
0
10 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
423
21,347
0
25 Mar 2021
Video Instance Segmentation with a Propose-Reduce Paradigm
Huaijia Lin
Ruizheng Wu
Shu Liu
Jiangbo Lu
Jiaya Jia
VLM
55
97
0
25 Mar 2021
SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation
Dongfang Liu
Yiming Cui
Wenbo Tan
Yingjie Chen
64
132
0
18 Mar 2021
Simple multi-dataset detection
Xingyi Zhou
V. Koltun
Philipp Krahenbuhl
ObjD
270
117
0
25 Feb 2021
Occluded Video Instance Segmentation: A Benchmark
Jiyang Qi
Yan Gao
Yao Hu
Xinggang Wang
Xiaoyu Liu
Xiang Bai
Serge Belongie
Alan Yuille
Philip Torr
S. Bai
VOS
VLM
56
139
0
02 Feb 2021
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
263
764
0
07 Jan 2021
CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation
Yang Fu
Linjie Yang
Ding Liu
Thomas S. Huang
Humphrey Shi
VOS
63
71
0
07 Dec 2020
End-to-End Video Instance Segmentation with Transformers
Yuqing Wang
Zhaoliang Xu
Xinlong Wang
Chunhua Shen
Baoshan Cheng
Hao Shen
Huaxia Xia
ViT
69
690
0
30 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
557
40,961
0
22 Oct 2020
MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking
Patrick Dendorfer
Aljosa Osep
Anton Milan
Konrad Schindler
Daniel Cremers
Ian Reid
Stefan Roth
Laura Leal-Taixé
VOT
63
264
0
15 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
200
5,068
0
08 Oct 2020
SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation
Jiale Cao
Rao Muhammad Anwer
Hisham Cholakkal
Fahad Shahbaz Khan
Yanwei Pang
Ling Shao
ISeg
50
171
0
29 Jul 2020
Quasi-Dense Similarity Learning for Multiple Object Tracking
Jiangmiao Pang
Linlu Qiu
Xia Li
Haofeng Chen
Qi Li
Trevor Darrell
Feng Yu
VOT
125
371
0
11 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
365
13,002
0
26 May 2020
FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking
Yifu Zhang
Chunyu Wang
Xinggang Wang
Wenjun Zeng
Wenyu Liu
VOT
86
1,335
0
04 Apr 2020
STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos
A. Athar
Sabarinath Mahadevan
Aljosa Osep
Laura Leal-Taixé
Bastian Leibe
VOS
84
170
0
18 Mar 2020
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Jonathan Munro
Dima Damen
EgoV
50
194
0
27 Jan 2020
Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation
Gedas Bertasius
Lorenzo Torresani
48
178
0
10 Dec 2019
Video Instance Segmentation
Linjie Yang
Yuchen Fan
N. Xu
VOS
ISeg
74
506
0
12 May 2019
Video Object Segmentation using Space-Time Memory Networks
Seoung Wug Oh
Joon-Young Lee
N. Xu
Seon Joo Kim
VOS
72
709
0
01 Apr 2019
Tracking without bells and whistles
Philipp Bergmann
Tim Meinhardt
Laura Leal-Taixe
VOT
107
910
0
13 Mar 2019
MOTS: Multi-Object Tracking and Segmentation
P. Voigtlaender
Michael Krause
Aljosa Osep
Jonathon Luiten
Berin Balachandar Gnana Sekar
Andreas Geiger
Bastian Leibe
VOT
72
578
0
10 Feb 2019
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
346
27,174
0
20 Mar 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.1K
193,814
0
10 Dec 2015