Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.00115
Cited By
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
30 November 2023
M. Gwilliam
Michael Cogswell
Meng Ye
Karan Sikka
Abhinav Shrivastava
Ajay Divakaran
3DV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval"
4 / 4 papers shown
Title
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim
A. Piergiovanni
Ganesh Mallya
A. Angelova
CoGe
44
0
0
04 Apr 2025
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
346
3,726
0
11 Feb 2021
CTRLsum: Towards Generic Controllable Text Summarization
Junxian He
Wojciech Kry'sciñski
Bryan McCann
Nazneen Rajani
Caiming Xiong
216
138
0
08 Dec 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
433
596
0
21 Jul 2020
1