Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2208.09266
Cited By
Diverse Video Captioning by Adaptive Spatio-temporal Attention
19 August 2022
Zohreh Ghaderi
Leonard Salewski
Hendrik P. A. Lensch
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Diverse Video Captioning by Adaptive Spatio-temporal Attention"
13 / 13 papers shown
Title
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
72
1,458
0
24 Jun 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
106
2,119
0
29 Mar 2021
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
54
53
0
23 Jul 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
57
271
0
26 Feb 2020
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
64
544
0
06 Apr 2019
End-to-End Dense Video Captioning with Masked Transformer
Luowei Zhou
Yingbo Zhou
Jason J. Corso
R. Socher
Caiming Xiong
66
527
0
03 Apr 2018
Less Is More: Picking Informative Frames for Video Captioning
Yangyu Chen
Shuhui Wang
Wentao Zhang
Qingming Huang
39
200
0
05 Mar 2018
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
259
11,610
0
11 Jan 2018
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
146
2,461
0
01 Apr 2015
Describing Videos by Exploiting Temporal Structure
L. Yao
Atousa Torabi
Kyunghyun Cho
Nicolas Ballas
C. Pal
Hugo Larochelle
Aaron Courville
118
1,063
0
27 Feb 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
85
951
0
15 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
214
4,451
0
20 Nov 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
376
27,205
0
01 Sep 2014
1