ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.12972
  4. Cited By
VLCap: Vision-Language with Contrastive Learning for Coherent Video
  Paragraph Captioning
v1v2 (latest)

VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning

26 June 2022
Kashu Yamazaki
Sang Truong
Khoa T. Vo
Michael Kidd
Chase Rainwater
Khoa Luu
Ngan Le
    VLMCoGe
ArXiv (abs)PDFHTML

Papers citing "VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning"

25 / 25 papers shown
Title
S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling
S3Former: Self-supervised High-resolution Transformer for Solar PV Profiling
Minh-Triet Tran
Adrian de Luis
Haitao Liao
Ying Huang
Roy McCann
Alan Mantooth
Jack Cothren
Ngan Le
216
0
0
07 May 2024
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal
  Action Proposals Generation
AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation
Khoa T. Vo
Kevin Hyekang Joo
Kashu Yamazaki
Sang Truong
Kris Kitani
Minh-Triet Tran
Ngan Le
EgoV
101
18
0
21 Oct 2021
End-to-End Dense Video Captioning with Parallel Decoding
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
82
184
0
17 Aug 2021
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Or Patashnik
Zongze Wu
Eli Shechtman
Daniel Cohen-Or
Dani Lischinski
CLIPVLM
118
1,207
0
31 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
934
29,436
0
26 Feb 2021
COOT: Cooperative Hierarchical Transformer for Video-Text Representation
  Learning
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViTCLIP
73
172
0
01 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
657
41,103
0
22 Oct 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video
  Paragraph Captioning
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Joey Tianyi Zhou
58
190
0
11 May 2020
Contrastive Multiview Coding
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
169
2,403
0
13 Jun 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
250
3,730
0
09 Jan 2019
Grounded Video Description
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
83
193
0
17 Dec 2018
Adversarial Inference for Multi-Sentence Video Description
Adversarial Inference for Multi-Sentence Video Description
J. S. Park
Marcus Rohrbach
Trevor Darrell
Anna Rohrbach
53
80
0
13 Dec 2018
SlowFast Networks for Video Recognition
SlowFast Networks for Video Recognition
Christoph Feichtenhofer
Haoqi Fan
Jitendra Malik
Kaiming He
166
3,274
0
10 Dec 2018
Learning deep representations by mutual information estimation and
  maximization
Learning deep representations by mutual information estimation and maximization
R. Devon Hjelm
A. Fedorov
Samuel Lavoie-Marchildon
Karan Grewal
Phil Bachman
Adam Trischler
Yoshua Bengio
SSLDRL
330
2,662
0
20 Aug 2018
Move Forward and Tell: A Progressive Generator of Video Descriptions
Move Forward and Tell: A Progressive Generator of Video Descriptions
Yilei Xiong
Bo Dai
Dahua Lin
63
102
0
26 Jul 2018
End-to-End Dense Video Captioning with Masked Transformer
End-to-End Dense Video Captioning with Masked Transformer
Luowei Zhou
Yingbo Zhou
Jason J. Corso
R. Socher
Caiming Xiong
92
529
0
03 Apr 2018
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
713
131,652
0
12 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
235
8,019
0
22 May 2017
Dense-Captioning Events in Videos
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
139
1,248
0
02 May 2017
Speaking the Same Language: Matching Machine to Human Captions by
  Adversarial Training
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training
Rakshith Shetty
Marcus Rohrbach
Lisa Anne Hendricks
Mario Fritz
Bernt Schiele
54
144
0
30 Mar 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
75
827
0
28 Mar 2017
Convolutional Two-Stream Network Fusion for Video Action Recognition
Convolutional Two-Stream Network Fusion for Video Action Recognition
Christoph Feichtenhofer
A. Pinz
Andrew Zisserman
163
2,611
0
22 Apr 2016
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
295
4,488
0
20 Nov 2014
Two-Stream Convolutional Networks for Action Recognition in Videos
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
247
7,535
0
09 Jun 2014
On the difficulty of training Recurrent Neural Networks
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu
Tomas Mikolov
Yoshua Bengio
ODL
196
5,353
0
21 Nov 2012
1