A Comprehensive Review of the Video-to-Text Problem

27 March 2021

Papers citing "A Comprehensive Review of the Video-to-Text Problem"

47 / 47 papers shown

Title
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling Ville Heilala Roberto Araya Raija Hamalainen 38 4 0 24 Sep 2024
Evaluation of Text Generation: A Survey Asli Celikyilmaz Elizabeth Clark Jianfeng Gao ELM LM&MA 83 380 0 26 Jun 2020
Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning Shizhe Chen Yida Zhao Qin Jin Qi Wu 70 311 0 01 Mar 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning Ziqi Zhang Yaya Shi Chunfen Yuan Bing Li Peijin Wang Weiming Hu Zhengjun Zha VLM 62 271 0 26 Feb 2020
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network Bairui Wang Lin Ma Wei Zhang Wenhao Jiang Jingwen Wang Wei Liu 85 163 0 27 Aug 2019
Visual Question Answering using Deep Learning: A Survey and Performance Analysis Yash Srivastava Vaishnav Murali S. Dubey Snehasis Mukherjee 43 48 0 27 Aug 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips Antoine Miech Dimitri Zhukov Jean-Baptiste Alayrac Makarand Tapaswi Ivan Laptev Josef Sivic VGen 91 1,192 0 07 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning Wei Zhang Bairui Wang Lin Ma Wei Liu 75 67 0 03 Jun 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning Jingwen Chen Yingwei Pan Yehao Li Ting Yao Hongyang Chao Tao Mei 40 103 0 03 May 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research Xin Eric Wang Jiawei Wu Junkun Chen Lei Li Yuan-fang Wang William Yang Wang 75 550 0 06 Apr 2019
Learning Correspondence from the Cycle-Consistency of Time Xinyu Wang Allan Jabri Alexei A. Efros SSL 65 488 0 18 Mar 2019
Cycle-Consistency for Robust Visual Question Answering Meet Shah Xinlei Chen Marcus Rohrbach Devi Parikh OOD 43 188 0 15 Feb 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning Jingkuan Song Xiangpeng Li Lianli Gao Heng Tao Shen 43 221 0 26 Dec 2018
Cross-Modal and Hierarchical Modeling of Video and Text Bowen Zhang Hexiang Hu Fei Sha BDL AI4TS 36 189 0 16 Oct 2018
Human Action Recognition and Prediction: A Survey Yu Kong Y. Fu 70 618 0 28 Jun 2018
ECO: Efficient Convolutional Network for Online Video Understanding Mohammadreza Zolfaghari Kamaljeet Singh Thomas Brox 170 498 0 24 Apr 2018
Learning a Text-Video Embedding from Incomplete and Heterogeneous Data Antoine Miech Ivan Laptev Josef Sivic 45 234 0 07 Apr 2018
End-to-End Dense Video Captioning with Masked Transformer Luowei Zhou Yingbo Zhou Jason J. Corso R. Socher Caiming Xiong 73 527 0 03 Apr 2018
Less Is More: Picking Informative Frames for Video Captioning Yangyu Chen Shuhui Wang Wentao Zhang Qingming Huang 42 200 0 05 Mar 2018
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification Saining Xie Chen Sun Jonathan Huang Zhuowen Tu Kevin Patrick Murphy 3DH 133 1,317 0 13 Dec 2017
Video Captioning with Guidance of Multimodal Latent Topics Shizhe Chen Jia Chen Qin Jin Alexander G. Hauptmann 76 67 0 31 Aug 2017
Localizing Moments in Video with Natural Language Lisa Anne Hendricks Oliver Wang Eli Shechtman Josef Sivic Trevor Darrell Bryan C. Russell 91 940 0 04 Aug 2017
Supervising Neural Attention Models for Video Captioning by Human Gaze Data Youngjae Yu Jongwook Choi Yeonhwa Kim Kyung Yoo Sang-Hun Lee Gunhee Kim 43 69 0 19 Jul 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset João Carreira Andrew Zisserman 199 7,961 0 22 May 2017
Generating Descriptions with Grounded and Co-Referenced People Anna Rohrbach Marcus Rohrbach Siyu Tang Seong Joon Oh Bernt Schiele 355 72 0 05 Apr 2017
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation Albert Gatt E. Krahmer LM&MA ELM 60 814 0 29 Mar 2017
Hierarchical Boundary-Aware Neural Encoder for Video Captioning Lorenzo Baraldi C. Grana Rita Cucchiara 41 191 0 28 Nov 2016
Video Captioning with Transferred Semantic Attributes Yingwei Pan Ting Yao Houqiang Li Tao Mei 47 329 0 23 Nov 2016
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Zhuowen Tu Kaiming He 430 10,281 0 16 Nov 2016
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation Rakshith Shetty Jorma T. Laaksonen 36 94 0 17 Aug 2016
Learning Joint Representations of Videos and Sentences with Web Image Search Mayu Otani Yuta Nakashima Esa Rahtu J. Heikkilä N. Yokoya 42 94 0 08 Aug 2016
R-FCN: Object Detection via Region-based Fully Convolutional Networks Jifeng Dai Yi Li Kaiming He Jian Sun ObjD 110 5,627 0 20 May 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding Gunnar Sigurdsson Gül Varol Xinyu Wang Ali Farhadi Ivan Laptev Abhinav Gupta VGen 77 1,238 0 06 Apr 2016
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning Pingbo Pan Zhongwen Xu Yi Yang Leilei Gan Yueting Zhuang 38 385 0 11 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks Haonan Yu Jiang Wang Zhiheng Huang Yi Yang Wenyuan Xu 72 560 0 26 Oct 2015
Microsoft COCO Captions: Data Collection and Evaluation Server Xinlei Chen Hao Fang Nayeon Lee Ramakrishna Vedantam Saurabh Gupta Piotr Dollar C. L. Zitnick 153 2,461 0 01 Apr 2015
FaceNet: A Unified Embedding for Face Recognition and Clustering Florian Schroff Dmitry Kalenichenko James Philbin 3DH 272 13,079 0 12 Mar 2015
Describing Videos by Exploiting Temporal Structure L. Yao Atousa Torabi Kyunghyun Cho Nicolas Ballas C. Pal Hugo Larochelle Aaron Courville 122 1,063 0 27 Feb 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He Xinming Zhang Shaoqing Ren Jian Sun VLM 171 18,534 0 06 Feb 2015
Translating Videos to Natural Language Using Deep Recurrent Neural Networks Subhashini Venugopalan Huijuan Xu Jeff Donahue Marcus Rohrbach Raymond J. Mooney Kate Saenko 91 951 0 15 Dec 2014
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 224 4,451 0 20 Nov 2014
From Captions to Visual Concepts and Back Hao Fang Saurabh Gupta F. Iandola R. Srivastava Li Deng ... Xiaodong He Margaret Mitchell John C. Platt C. L. Zitnick Geoffrey Zweig VLM 67 1,310 0 18 Nov 2014
How transferable are features in deep neural networks? J. Yosinski Jeff Clune Yoshua Bengio Hod Lipson OOD 145 8,309 0 06 Nov 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 388 27,205 0 01 Sep 2014
Video In Sentences Out Andrei Barbu Alexander Bridge Zachary Burchill D. Coroian Sven J. Dickinson ... Jarrell W. Waggoner Song Wang Jinlian Wei Yifan Yin Zhiqi Zhang 35 155 0 09 Aug 2014
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 296 33,445 0 16 Oct 2013
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition Jeff Donahue Yangqing Jia Oriol Vinyals Judy Hoffman Ning Zhang Eric Tzeng Trevor Darrell VLM ObjD 153 4,946 0 06 Oct 2013