Delving Deeper into the Decoder for Video Captioning

16 January 2020

Papers citing "Delving Deeper into the Decoder for Video Captioning"

15 / 15 papers shown

Title
Question-Answering Dense Video Events Hangyu Qin Junbin Xiao Angela Yao VLM 71 1 0 06 Sep 2024
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian Willy Fitra Hendria 29 2 0 20 Jun 2023
A Review of Deep Learning for Video Captioning Moloud Abdar Meenakshi Kollati Swaraja Kuraparthi Farhad Pourpanah Daniel J. McDuff ... Shuicheng Yan Abduallah A. Mohamed Abbas Khosravi Erik Cambria Fatih Porikli 3DV 27 20 0 22 Apr 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training Wei Li Linchao Zhu Longyin Wen Yi Yang VLM 42 86 0 06 Mar 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training Weihong Zhong Mao Zheng Duyu Tang Xuan Luo Heng Gong Xiaocheng Feng Bing Qin 27 8 0 20 Feb 2023
Zero-Shot Video Captioning with Evolving Pseudo-Tokens Yoad Tewel Yoav Shalev Roy Nadler Idan Schwartz Lior Wolf 29 28 0 22 Jul 2022
Learning Audio-Video Modalities from Image Captions Arsha Nagrani Paul Hongsuck Seo Bryan Seybold Anja Hauth Santiago Manén Chen Sun Cordelia Schmid CLIP 11 82 0 01 Apr 2022
End-to-end Generative Pretraining for Multimodal Video Captioning Paul Hongsuck Seo Arsha Nagrani Anurag Arnab Cordelia Schmid 27 164 0 20 Jan 2022
CLIP4Caption: CLIP for Video Caption Mingkang Tang Zhanyu Wang Zhenhua Liu Fengyun Rao Dian Li Xiu Li CLIP VLM 32 149 0 13 Oct 2021
A Comprehensive Review of the Video-to-Text Problem Jesus Perez-Martin B. Bustos S. Guimarães I. Sipiran Jorge A. Pérez Grethel Coello Said 13 17 0 27 Mar 2021
The MSR-Video to Text Dataset with Clean Annotations Haoran Chen Jianmin Li Simone Frintrop Xiaolin Hu 22 18 0 12 Feb 2021
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends Shagun Uppal Sarthak Bhagat Devamanyu Hazarika Navonil Majumdar Soujanya Poria Roger Zimmermann Amir Zadeh 23 6 0 19 Oct 2020
Video captioning with stacked attention and semantic hard pull Md. Mushfiqur Rahman Thasinul Abedin Khondokar S. S. Prottoy Ayana Moshruba Fazlul Hasan Siddiqui 19 2 0 15 Sep 2020
ECO: Efficient Convolutional Network for Online Video Understanding Mohammadreza Zolfaghari Kamaljeet Singh Thomas Brox 130 496 0 24 Apr 2018
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning Y. Gal Zoubin Ghahramani UQCV BDL 285 9,136 0 06 Jun 2015