Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.16795
Cited By
Deformable Video Transformer
31 March 2022
Jue Wang
Lorenzo Torresani
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Deformable Video Transformer"
21 / 21 papers shown
Title
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and
O
(
T
)
\mathcal{O}(T)
O
(
T
)
Complexity
Shihao Zou
Qingfeng Li
Wei Ji
Jingjing Li
Yongkui Yang
Guoqi Li
Chao Dong
27
0
0
15 May 2025
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
48
5
0
31 Oct 2024
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
Manuel Benavent-Lledo
David Mulero-Pérez
David Ortiz-Perez
José García Rodríguez
Antonis Argyros
24
0
0
28 Oct 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
40
4
0
03 Jul 2024
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
Ju He
Qihang Yu
Inkyu Shin
XueQing Deng
Alan L. Yuille
Xiaohui Shen
Liang-Chieh Chen
VOS
40
2
0
30 Nov 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
37
19
0
24 Aug 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
26
0
0
01 Apr 2023
Selective Structured State-Spaces for Long-Form Video Understanding
Jue Wang
Wenjie Zhu
Pichao Wang
Xiang Yu
Linda Liu
Mohamed Omar
Raffay Hamid
41
94
0
25 Mar 2023
Towards Robust Video Instance Segmentation with Temporal-Aware Transformer
Zhenghao Zhang
Fang Shao
Zuozhuo Dai
Siyu Zhu
ViT
17
1
0
20 Jan 2023
Semantic-Aware Local-Global Vision Transformer
Jiatong Zhang
Zengwei Yao
Fanglin Chen
Guangming Lu
Wenjie Pei
ViT
25
0
0
27 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
27
0
0
11 Nov 2022
Linear Video Transformer with Feature Fixation
Kaiyue Lu
Zexia Liu
Jianyuan Wang
Weixuan Sun
Zhen Qin
...
Xuyang Shen
Huizhong Deng
Xiaodong Han
Yuchao Dai
Yiran Zhong
30
4
0
15 Oct 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
19
44
0
13 Sep 2022
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation
Bolin Lai
Miao Liu
Fiona Ryan
James M. Rehg
ViT
40
33
0
08 Aug 2022
Towards Real-World Video Denosing: A Practical Video Denosing Dataset and Network
Xiaogang Xu
Yi-Bin Yu
Nianjuan Jiang
Jiangbo Lu
Bei Yu
Jiaya Jia
40
0
0
04 Jul 2022
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
277
2,606
0
04 May 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Huayu Chen
Boqing Gong
ViT
251
577
0
22 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,984
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
204
422
0
01 Feb 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
285
2,017
0
28 Jul 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
252
580
0
12 Mar 2020
1