Deformable Video Transformer

31 March 2022

Papers citing "Deformable Video Transformer"

SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and O(T) Complexity. Shihao Zou, Qingfeng Li, Wei Ji, Jingjing Li, Yongkui Yang, Guoqi Li, Chao Dong. 27 0 0. 15 May 2025.
Video Token Merging for Long-form Video Understanding. Seon-Ho Lee, Jue Wang, Zhikang Zhang, D. Fan, Xinyu Li. 48 5 0. 31 Oct 2024.
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context. Manuel Benavent-Lledo, David Mulero-Pérez, David Ortiz-Perez, José García Rodríguez, Antonis Argyros. 24 0 0. 28 Oct 2024.
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition. Y. Hao, Diansong Zhou, Zhicai Wang, Chong-Wah Ngo, Meng Wang. ViT. 40 4 0. 03 Jul 2024.
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories. Ju He, Qihang Yu, Inkyu Shin, XueQing Deng, Alan L. Yuille, Xiaohui Shen, Liang-Chieh Chen. VOS. 40 2 0. 30 Nov 2023.
Motion-Guided Masking for Spatiotemporal Representation Learning. D. Fan, Jue Wang, Shuai Liao, Yi Zhu, Vimal Bhat, H. Santos-Villalobos, M. Rohith, Xinyu Li. VGen. 37 19 0. 24 Aug 2023.
SVT: Supertoken Video Transformer for Efficient Video Understanding. Chen-Ming Pan, Rui Hou, Hanchao Yu, Qifan Wang, Senem Velipasalar, Madian Khabsa. ViT. 26 0 0. 01 Apr 2023.
Selective Structured State-Spaces for Long-Form Video Understanding. Jue Wang, Wenjie Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid. 41 94 0. 25 Mar 2023.
Towards Robust Video Instance Segmentation with Temporal-Aware Transformer. Zhenghao Zhang, Fang Shao, Zuozhuo Dai, Siyu Zhu. ViT. 17 1 0. 20 Jan 2023.
Semantic-Aware Local-Global Vision Transformer. Jiatong Zhang, Zengwei Yao, Fanglin Chen, Guangming Lu, Wenjie Pei. ViT. 25 0 0. 27 Nov 2022.
PatchBlender: A Motion Prior for Video Transformers. Gabriele Prato, Yale Song, Janarthanan Rajendran, R. Devon Hjelm, Neel Joshi, Sarath Chandar. ViT. 27 0 0. 11 Nov 2022.
Linear Video Transformer with Feature Fixation. Kaiyue Lu, Zexia Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, ..., Xuyang Shen, Huizhong Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong. 30 4 0. 15 Oct 2022.
Vision Transformers for Action Recognition: A Survey. Anwaar Ulhaq, Naveed Akhtar, Ganna Pogrebna, Ajmal Mian. ViT. 19 44 0. 13 Sep 2022.
In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation. Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg. ViT. 40 33 0. 08 Aug 2022.
Towards Real-World Video Denosing: A Practical Video Denosing Dataset and Network. Xiaogang Xu, Yi-Bin Yu, Nianjuan Jiang, Jiangbo Lu, Bei Yu, Jiaya Jia. 40 0 0. 04 Jul 2022.
MLP-Mixer: An all-MLP Architecture for Vision. Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. 277 2,606 0. 04 May 2021.
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text. Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Huayu Chen, Boqing Gong. ViT. 251 577 0. 22 Apr 2021.
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius, Heng Wang, Lorenzo Torresani. ViT. 283 1,984 0. 09 Feb 2021.
Video Transformer Network. Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann. ViT. 204 422 0. 01 Feb 2021.
Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. VLM. 285 2,017 0. 28 Jul 2020.
Efficient Content-Based Sparse Attention with Routing Transformers. Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. MoE. 252 580 0. 12 Mar 2020.