Move Forward and Tell: A Progressive Generator of Video Descriptions

26 July 2018

Papers citing "Move Forward and Tell: A Progressive Generator of Video Descriptions"

14 / 14 papers shown

Title
Text with Knowledge Graph Augmented Transformer for Video Captioning Xin Gu G. Chen Yufei Wang Libo Zhang Tiejian Luo Longyin Wen 27 47 0 22 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning Antoine Yang Arsha Nagrani Paul Hongsuck Seo Antoine Miech Jordi Pont-Tuset Ivan Laptev Josef Sivic Cordelia Schmid AI4TS VLM 39 221 0 27 Feb 2023
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning Kashu Yamazaki Khoa T. Vo Sang Truong Bhiksha Raj Ngan Le 29 35 0 28 Nov 2022
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning Kashu Yamazaki Sang Truong Khoa T. Vo Michael Kidd Chase Rainwater Khoa Luu Ngan Le VLM CoGe 11 25 0 26 Jun 2022
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning Xu Yan Zhengcong Fei Shuhui Wang Qingming Huang Qi Tian VGen 40 4 0 19 Nov 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations Mohammadreza Zolfaghari Yi Zhu Peter V. Gehler Thomas Brox 135 127 0 30 Sep 2021
End-to-End Dense Video Captioning with Parallel Decoding Teng Wang Ruimao Zhang Zhichao Lu Feng Zheng Ran Cheng Ping Luo 3DV 47 179 0 17 Aug 2021
Optimizing Latency for Online Video CaptioningUsing Audio-Visual Transformers Chiori Hori Takaaki Hori Jonathan Le Roux 25 4 0 04 Aug 2021
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks Humam Alwassel Silvio Giancola Guohao Li 33 123 0 23 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Simon Ging Mohammadreza Zolfaghari Hamed Pirsiavash Thomas Brox ViT CLIP 20 168 0 01 Nov 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning Jie Lei Liwei Wang Yelong Shen Dong Yu Tamara L. Berg Joey Tianyi Zhou 21 186 0 11 May 2020
Multi-modal Dense Video Captioning Vladimir E. Iashin Esa Rahtu 22 164 0 17 Mar 2020
Recursive Visual Sound Separation Using Minus-Plus Net Xudong Xu Bo Dai Dahua Lin 35 91 0 30 Aug 2019
Grounded Video Description Luowei Zhou Yannis Kalantidis Xinlei Chen Jason J. Corso Marcus Rohrbach 27 190 0 17 Dec 2018