SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

25 November 2021

Kevin Lin

Chung-Ching Lin

Zicheng Liu

Papers cited by "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

8 / 58 papers shown

Title | Authors | Tags | Metrics | Date
Temporal Segment Networks for Action Recognition in Videos | Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool | ViT | 81, 807, 0 | 08 May 2017
Towards Automatic Learning of Procedures from Web Instructional Videos | Luowei Zhou, Chenliang Xu, Jason J. Corso | EgoV | 57, 812, 0 | 28 Mar 2017
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander A. Alemi | - | 268, 14,196, 0 | 23 Feb 2016
Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | MedIm | 1.2K, 192,638, 0 | 10 Dec 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks | Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond J. Mooney, Kate Saenko | - | 83, 951, 0 | 15 Dec 2014
CIDEr: Consensus-based Image Description Evaluation | Ramakrishna Vedantam, C. L. Zitnick, Devi Parikh | - | 214, 4,451, 0 | 20 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, S. Guadarrama, Kate Saenko, Trevor Darrell | VLM | 117, 6,037, 0 | 17 Nov 2014
ImageNet Large Scale Visual Recognition Challenge | Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei | VLM, ObjD | 978, 39,383, 0 | 01 Sep 2014