
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

26 June 2022

Ngan Le

Papers citing "VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning"


Title | Authors | Tags | Counts | Date
S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling | Minh-Triet Tran, Adrian de Luis, Haitao Liao, Ying Huang, Roy McCann, Alan Mantooth, Jack Cothren, Ngan Le | - | 216 0 0 | 07 May 2024
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation | Khoa T. Vo, Kevin Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le | EgoV | 101 18 0 | 21 Oct 2021
End-to-End Dense Video Captioning with Parallel Decoding | Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo | 3DV | 82 184 0 | 17 Aug 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery | Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski | CLIP VLM | 115 1,207 0 | 31 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision | Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever | CLIP VLM | 931 29,436 0 | 26 Feb 2021
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox | ViT CLIP | 73 172 0 | 01 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby | ViT | 657 41,103 0 | 22 Oct 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Joey Tianyi Zhou | - | 58 190 0 | 11 May 2020
Contrastive Multiview Coding | Yonglong Tian, Dilip Krishnan, Phillip Isola | SSL | 169 2,403 0 | 13 Jun 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov | VLM | 250 3,730 0 | 09 Jan 2019
Grounded Video Description | Luowei Zhou, Yannis Kalantidis, Xinlei Chen, Jason J. Corso, Marcus Rohrbach | - | 83 193 0 | 17 Dec 2018
Adversarial Inference for Multi-Sentence Video Description | J. S. Park, Marcus Rohrbach, Trevor Darrell, Anna Rohrbach | - | 53 80 0 | 13 Dec 2018
SlowFast Networks for Video Recognition | Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He | - | 166 3,274 0 | 10 Dec 2018
Learning deep representations by mutual information estimation and maximization | R. Devon Hjelm, A. Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Phil Bachman, Adam Trischler, Yoshua Bengio | SSL DRL | 330 2,662 0 | 20 Aug 2018
Move Forward and Tell: A Progressive Generator of Video Descriptions | Yilei Xiong, Bo Dai, Dahua Lin | - | 63 102 0 | 26 Jul 2018
End-to-End Dense Video Captioning with Masked Transformer | Luowei Zhou, Yingbo Zhou, Jason J. Corso, R. Socher, Caiming Xiong | - | 92 529 0 | 03 Apr 2018
Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 713 131,652 0 | 12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | João Carreira, Andrew Zisserman | - | 235 8,019 0 | 22 May 2017
Dense-Captioning Events in Videos | Ranjay Krishna, Kenji Hata, F. Ren, Li Fei-Fei, Juan Carlos Niebles | - | 139 1,248 0 | 02 May 2017
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training | Rakshith Shetty, Marcus Rohrbach, Lisa Anne Hendricks, Mario Fritz, Bernt Schiele | - | 54 144 0 | 30 Mar 2017
Towards Automatic Learning of Procedures from Web Instructional Videos | Luowei Zhou, Chenliang Xu, Jason J. Corso | EgoV | 75 827 0 | 28 Mar 2017
Convolutional Two-Stream Network Fusion for Video Action Recognition | Christoph Feichtenhofer, A. Pinz, Andrew Zisserman | - | 163 2,611 0 | 22 Apr 2016
CIDEr: Consensus-based Image Description Evaluation | Ramakrishna Vedantam, C. L. Zitnick, Devi Parikh | - | 295 4,488 0 | 20 Nov 2014
Two-Stream Convolutional Networks for Action Recognition in Videos | Karen Simonyan, Andrew Zisserman | - | 247 7,535 0 | 09 Jun 2014
On the difficulty of training Recurrent Neural Networks | Razvan Pascanu, Tomas Mikolov, Yoshua Bengio | ODL | 196 5,353 0 | 21 Nov 2012