Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.14785
Cited By
A Comprehensive Review of the Video-to-Text Problem
27 March 2021
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Comprehensive Review of the Video-to-Text Problem"
47 / 47 papers shown
Title
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling
Ville Heilala
Roberto Araya
Raija Hamalainen
38
4
0
24 Sep 2024
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
83
380
0
26 Jun 2020
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
Shizhe Chen
Yida Zhao
Qin Jin
Qi Wu
70
311
0
01 Mar 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
62
271
0
26 Feb 2020
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
85
163
0
27 Aug 2019
Visual Question Answering using Deep Learning: A Survey and Performance Analysis
Yash Srivastava
Vaishnav Murali
S. Dubey
Snehasis Mukherjee
43
48
0
27 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
91
1,192
0
07 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
75
67
0
03 Jun 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
40
103
0
03 May 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
75
550
0
06 Apr 2019
Learning Correspondence from the Cycle-Consistency of Time
Xinyu Wang
Allan Jabri
Alexei A. Efros
SSL
65
488
0
18 Mar 2019
Cycle-Consistency for Robust Visual Question Answering
Meet Shah
Xinlei Chen
Marcus Rohrbach
Devi Parikh
OOD
43
188
0
15 Feb 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning
Jingkuan Song
Xiangpeng Li
Lianli Gao
Heng Tao Shen
43
221
0
26 Dec 2018
Cross-Modal and Hierarchical Modeling of Video and Text
Bowen Zhang
Hexiang Hu
Fei Sha
BDL
AI4TS
36
189
0
16 Oct 2018
Human Action Recognition and Prediction: A Survey
Yu Kong
Y. Fu
70
618
0
28 Jun 2018
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
170
498
0
24 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
Antoine Miech
Ivan Laptev
Josef Sivic
45
234
0
07 Apr 2018
End-to-End Dense Video Captioning with Masked Transformer
Luowei Zhou
Yingbo Zhou
Jason J. Corso
R. Socher
Caiming Xiong
73
527
0
03 Apr 2018
Less Is More: Picking Informative Frames for Video Captioning
Yangyu Chen
Shuhui Wang
Wentao Zhang
Qingming Huang
42
200
0
05 Mar 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
133
1,317
0
13 Dec 2017
Video Captioning with Guidance of Multimodal Latent Topics
Shizhe Chen
Jia Chen
Qin Jin
Alexander G. Hauptmann
76
67
0
31 Aug 2017
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
91
940
0
04 Aug 2017
Supervising Neural Attention Models for Video Captioning by Human Gaze Data
Youngjae Yu
Jongwook Choi
Yeonhwa Kim
Kyung Yoo
Sang-Hun Lee
Gunhee Kim
43
69
0
19 Jul 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
199
7,961
0
22 May 2017
Generating Descriptions with Grounded and Co-Referenced People
Anna Rohrbach
Marcus Rohrbach
Siyu Tang
Seong Joon Oh
Bernt Schiele
355
72
0
05 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
Albert Gatt
E. Krahmer
LM&MA
ELM
60
814
0
29 Mar 2017
Hierarchical Boundary-Aware Neural Encoder for Video Captioning
Lorenzo Baraldi
C. Grana
Rita Cucchiara
41
191
0
28 Nov 2016
Video Captioning with Transferred Semantic Attributes
Yingwei Pan
Ting Yao
Houqiang Li
Tao Mei
47
329
0
23 Nov 2016
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
430
10,281
0
16 Nov 2016
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation
Rakshith Shetty
Jorma T. Laaksonen
36
94
0
17 Aug 2016
Learning Joint Representations of Videos and Sentences with Web Image Search
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
N. Yokoya
42
94
0
08 Aug 2016
R-FCN: Object Detection via Region-based Fully Convolutional Networks
Jifeng Dai
Yi Li
Kaiming He
Jian Sun
ObjD
110
5,627
0
20 May 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
77
1,238
0
06 Apr 2016
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning
Pingbo Pan
Zhongwen Xu
Yi Yang
Leilei Gan
Yueting Zhuang
38
385
0
11 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks
Haonan Yu
Jiang Wang
Zhiheng Huang
Yi Yang
Wenyuan Xu
72
560
0
26 Oct 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
153
2,461
0
01 Apr 2015
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff
Dmitry Kalenichenko
James Philbin
3DH
272
13,079
0
12 Mar 2015
Describing Videos by Exploiting Temporal Structure
L. Yao
Atousa Torabi
Kyunghyun Cho
Nicolas Ballas
C. Pal
Hugo Larochelle
Aaron Courville
122
1,063
0
27 Feb 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
VLM
171
18,534
0
06 Feb 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
91
951
0
15 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
224
4,451
0
20 Nov 2014
From Captions to Visual Concepts and Back
Hao Fang
Saurabh Gupta
F. Iandola
R. Srivastava
Li Deng
...
Xiaodong He
Margaret Mitchell
John C. Platt
C. L. Zitnick
Geoffrey Zweig
VLM
67
1,310
0
18 Nov 2014
How transferable are features in deep neural networks?
J. Yosinski
Jeff Clune
Yoshua Bengio
Hod Lipson
OOD
145
8,309
0
06 Nov 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
388
27,205
0
01 Sep 2014
Video In Sentences Out
Andrei Barbu
Alexander Bridge
Zachary Burchill
D. Coroian
Sven J. Dickinson
...
Jarrell W. Waggoner
Song Wang
Jinlian Wei
Yifan Yin
Zhiqi Zhang
35
155
0
09 Aug 2014
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
296
33,445
0
16 Oct 2013
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
Jeff Donahue
Yangqing Jia
Oriol Vinyals
Judy Hoffman
Ning Zhang
Eric Tzeng
Trevor Darrell
VLM
ObjD
153
4,946
0
06 Oct 2013
1