Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.00719
Cited By
Video Transformer Network
1 February 2021
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Video Transformer Network"
50 / 86 papers shown
Title
F
3
^3
3
Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhé Hóu
Yun Lin
J. Dong
37
0
0
11 Apr 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
42
0
0
11 Feb 2025
Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
30
16
0
03 Jan 2025
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Yunshan Zhong
Yuyao Zhou
Yuxin Zhang
Shen Li
Yong Li
Fei Chao
Zhanpeng Zeng
Rongrong Ji
MQ
94
0
0
31 Dec 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
96
0
0
20 Nov 2024
On Learning Multi-Modal Forgery Representation for Diffusion Generated Video Detection
Xiufeng Song
Xiao Guo
J. Zhang
Qirui Li
Lei Bai
Xiaoming Liu
Guangtao Zhai
Xiaohong Liu
DiffM
VGen
69
9
0
31 Oct 2024
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro
LLMAG
60
35
0
16 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Z. Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
123
233
0
05 Jan 2024
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
28
3
0
21 Dec 2023
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Xiao Wang
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
76
3
0
18 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
37
4
0
05 Dec 2023
Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation
A. Dasgupta
C. V. Jawahar
Karteek Alahari
TTA
VLM
16
10
0
30 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
21
1
0
28 Nov 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
117
0
16 Oct 2023
RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches
Shawn Mathew
Saad Nadeem
Alvin C. Goh
Arie Kaufman
MedIm
47
0
0
02 Oct 2023
Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition
Dimitrios Daskalakis
Nikolaos Gkalelis
Vasileios Mezaris
34
0
0
24 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
18
3
0
23 Aug 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
38
8
0
18 Jul 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
F. Khan
M. Shah
VLM
VPVLM
28
73
0
06 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
28
30
0
03 Apr 2023
Multi-view knowledge distillation transformer for human action recognition
Yi Lin
Vincent S. Tseng
ViT
18
1
0
25 Mar 2023
Learning Spatial-Temporal Implicit Neural Representations for Event-Guided Video Super-Resolution
Yunfan Lu
Zipeng Wang
Minjie Liu
Hongjian Wang
Lin Wang
SupR
21
31
0
24 Mar 2023
Graph Transformer GANs for Graph-Constrained House Generation
H. Tang
Zhenyu Zhang
Humphrey Shi
Bo-wen Li
Lin Shao
N. Sebe
Radu Timofte
Luc Van Gool
41
19
0
14 Mar 2023
Fruit Ripeness Classification: a Survey
Matteo Rizzo
Matteo Marcuzzo
A. Zangari
A. Gasparetto
A. Albarelli
25
62
0
29 Dec 2022
A Survey on Human Action Recognition
Zhou Shuchang
29
0
0
20 Dec 2022
Cross-Modal Learning with 3D Deformable Attention for Action Recognition
Sangwon Kim
Dasom Ahn
ByoungChul Ko
ViT
3DPC
25
24
0
12 Dec 2022
Video Test-Time Adaptation for Action Recognition
Wei Lin
M. Jehanzeb Mirza
Mateusz Koziñski
Horst Possegger
Hilde Kuehne
Horst Bischof
TTA
37
31
0
24 Nov 2022
SVFormer: Semi-supervised Video Transformer for Action Recognition
Zhen Xing
Qi Dai
Hang-Rui Hu
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
ViT
22
69
0
23 Nov 2022
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training
Guoxi Huang
A. Bors
19
1
0
23 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
25
106
0
17 Nov 2022
Focal and Global Spatial-Temporal Transformer for Skeleton-based Action Recognition
Zhimin Gao
Peitao Wang
Pei Lv
Xiaoheng Jiang
Qi-dong Liu
Pichao Wang
Mingliang Xu
Wanqing Li
ViT
49
27
0
06 Oct 2022
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers
Zhikai Li
Mengjuan Chen
Junrui Xiao
Qingyi Gu
ViT
MQ
43
33
0
13 Sep 2022
Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography
Thomas Z. Li
Kaiwen Xu
Riqiang Gao
Yucheng Tang
Thomas A. Lasko
Fabien Maldonado
K. Sandler
Bennett A. Landman
ViT
MedIm
19
11
0
04 Sep 2022
A Circular Window-based Cascade Transformer for Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
34
6
0
30 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
23
312
0
04 Aug 2022
MAR: Masked Autoencoders for Efficient Action Recognition
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Xiang Wang
Yuehuang Wang
Yiliang Lv
Changxin Gao
Nong Sang
19
42
0
24 Jul 2022
Time Is MattEr: Temporal Self-supervision for Video Transformers
Sukmin Yun
Jaehyung Kim
Dongyoon Han
Hwanjun Song
Jung-Woo Ha
Jinwoo Shin
ViT
15
12
0
19 Jul 2022
Eliminating Gradient Conflict in Reference-based Line-Art Colorization
Zekun Li
Zhengyang Geng
Zhao Kang
Wenyu Chen
Yibo Yang
18
35
0
13 Jul 2022
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Zhihan Gao
Xingjian Shi
Hao Wang
Yi Zhu
Yuyang Wang
Mu Li
Dit-Yan Yeung
AI4TS
39
148
0
12 Jul 2022
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
24
3
0
08 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Y. S. Rawat
AAML
24
24
0
04 Jul 2022
CTrGAN: Cycle Transformers GAN for Gait Transfer
Shahar Mahpod
Noam Gaash
Hay Hoffman
Gil Ben-Artzi
ViT
23
1
0
30 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
30
32
0
19 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
22
15
0
13 Jun 2022
SimVP: Simpler yet Better Video Prediction
Zhangyang Gao
Cheng Tan
Lirong Wu
Stan Z. Li
33
211
0
09 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
24
52
0
02 Jun 2022
Cross-Architecture Self-supervised Video Representation Learning
Sheng Guo
Zihua Xiong
Yujie Zhong
Limin Wang
Xiaobo Guo
Bing Han
Weilin Huang
SSL
AI4TS
66
24
0
26 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
31
45
0
05 May 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
22
0
0
25 Apr 2022
1
2
Next