Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.12867
Cited By
Accurate and Fast Compressed Video Captioning
22 September 2023
Yaojie Shen
Xin Gu
Kai Xu
Hengrui Fan
Longyin Wen
Libo Zhang
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Accurate and Fast Compressed Video Captioning"
21 / 21 papers shown
Title
Generative Modeling of Class Probability for Multi-Modal Representation Learning
Jungkyoo Shin
Bumsoo Kim
Eunwoo Kim
88
1
0
21 Mar 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
118
1
0
31 Dec 2024
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen
Pha Nguyen
J. Cothren
Alper Yilmaz
Khoa Luu
138
1
0
27 Nov 2024
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
68
168
0
20 Jan 2022
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Faisal Ahmed
Zhe Gan
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
78
243
0
25 Nov 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
65
68
0
24 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
223
1,428
0
03 Nov 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng Zhang
Stephen Lin
Han Hu
ViT
96
1,481
0
24 Jun 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
136
1,173
0
01 Apr 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
215
2,149
0
29 Mar 2021
Open-book Video Captioning with Retrieve-Copy-Generate Network
Ziqi Zhang
Zhongang Qi
Chun Yuan
Ying Shan
Bing Li
Ying Deng
Weiming Hu
52
94
0
09 Mar 2021
Semantic Grouping Network for Video Captioning
Hobin Ryu
Sunghun Kang
Haeyong Kang
Chang D. Yoo
73
137
0
01 Feb 2021
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
62
102
0
28 Jul 2020
The AVA-Kinetics Localized Human Actions Video Dataset
Ang Li
Meghana Thotakuri
David A. Ross
João Carreira
Alexander Vostrikov
Andrew Zisserman
VGen
59
137
0
01 May 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
56
236
0
31 Mar 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
78
272
0
26 Feb 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech
Dimitri Zhukov
Jean-Baptiste Alayrac
Makarand Tapaswi
Ivan Laptev
Josef Sivic
VGen
108
1,200
0
07 Jun 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
93
549
0
06 Apr 2019
Compressed Video Action Recognition
Chao-Yuan Wu
Manzil Zaheer
Hexiang Hu
R. Manmatha
Alex Smola
Philipp Krahenbuhl
135
325
0
02 Dec 2017
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
Kensho Hara
Hirokatsu Kataoka
Y. Satoh
3DPC
123
1,934
0
27 Nov 2017
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
286
4,484
0
20 Nov 2014
1