Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.13196
Cited By
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
25 November 2021
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
8 / 58 papers shown
Title
Temporal Segment Networks for Action Recognition in Videos
Limin Wang
Yuanjun Xiong
Zhe Wang
Yu Qiao
Dahua Lin
Xiaoou Tang
Luc Van Gool
ViT
81
807
0
08 May 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
57
812
0
28 Mar 2017
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Christian Szegedy
Sergey Ioffe
Vincent Vanhoucke
Alexander A. Alemi
268
14,196
0
23 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.2K
192,638
0
10 Dec 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
83
951
0
15 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
214
4,451
0
20 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
117
6,037
0
17 Nov 2014
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
978
39,383
0
01 Sep 2014
Previous
1
2