1705.00754
Dense-Captioning Events in Videos
2 May 2017
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
Papers citing "Dense-Captioning Events in Videos"
50 / 280 papers shown
Title
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
329
782
0
18 Apr 2021
Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning
Soheyla Amirian
Khaled Rasheed
T. Taha
H. Arabnia
VLM
VGen
19
23
0
07 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
57
1,134
0
01 Apr 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
27
5
0
18 Mar 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
33
66
0
18 Mar 2021
Natural Language Video Localization: A Revisit in Span-based Question Answering Framework
Hao Zhang
Aixin Sun
Wei Jing
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
113
84
0
26 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
Jie Lei
Linjie Li
Luowei Zhou
Zhe Gan
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
CLIP
46
648
0
11 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
219
483
0
21 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
233
2,434
0
04 Jan 2021
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
31
8
0
13 Dec 2020
Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language
Songyang Zhang
Houwen Peng
Jianlong Fu
Yijuan Lu
Jiebo Luo
27
51
0
04 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
19
5
0
30 Nov 2020
Video Self-Stitching Graph Network for Temporal Action Localization
Chen Zhao
Ali K. Thabet
Guohao Li
26
138
0
30 Nov 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
33
123
0
23 Nov 2020
VLG-Net: Video-Language Graph Matching Network for Video Grounding
Mattia Soldan
Mengmeng Xu
Sisi Qu
Jesper N. Tegnér
Guohao Li
35
69
0
19 Nov 2020
Multimodal Pretraining for Dense Video Captioning
Gabriel Huang
Bo Pang
Zhenhai Zhu
Clara E. Rivera
Radu Soricut
21
81
0
10 Nov 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers
Zongheng Tang
Yue Liao
Si Liu
Guanbin Li
Xiaojie Jin
Hongxu Jiang
Qian Yu
Dong Xu
21
94
0
10 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
31
169
0
01 Nov 2020
Improved Actor Relation Graph based Group Activity Recognition
Zijian Kuang
Xinran Tie
23
5
0
24 Oct 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
23
6
0
21 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hongdong Li
Stephen Gould
142
11
0
13 Oct 2020
Making Mobile Augmented Reality Applications Accessible
Jaylin Herskovitz
Jason Wu
Samuel White
Amy Pavel
G. Reyes
Anhong Guo
Jeffrey P. Bigham
14
37
0
12 Oct 2020
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos
Jie Wu
Guanbin Li
Xiaoguang Han
Liang Lin
OffRL
AI4TS
27
56
0
18 Sep 2020
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
21
74
0
01 Sep 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Fei Wu
14
34
0
16 Aug 2020
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization
Daizong Liu
Xiaoye Qu
Xiao-Yang Liu
Jianfeng Dong
Pan Zhou
Zichuan Xu
33
129
0
04 Aug 2020
The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020)
Samuel Albanie
Yang Liu
Arsha Nagrani
Antoine Miech
Ernesto Coto
...
Kaixu Cui
Hui Liu
Chen Wang
Yudong Jiang
Xiaoshuai Hao
34
9
0
03 Aug 2020
Adversarial Bipartite Graph Learning for Video Domain Adaptation
Yadan Luo
Zi Huang
Zijian Wang
Zheng-Wei Zhang
Mahsa Baktashmotlagh
24
51
0
31 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
25
101
0
28 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
VLM
14
6
0
27 Jul 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
430
596
0
21 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions
Noa Garcia
Yuta Nakashima
26
32
0
17 Jul 2020
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
19
437
0
23 Jun 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Joey Tianyi Zhou
27
186
0
11 May 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings
Max Bain
Arsha Nagrani
A. Brown
Andrew Zisserman
39
100
0
08 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
57
494
0
01 May 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
32
313
0
29 Apr 2020
Local-Global Video-Text Interactions for Temporal Grounding
Jonghwan Mun
Minsu Cho
Bohyung Han
36
267
0
16 Apr 2020
Dense Regression Network for Video Grounding
Runhao Zeng
Haoming Xu
Wenbing Huang
Peihao Chen
Mingkui Tan
Chuang Gan
22
284
0
07 Apr 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
165
0
17 Mar 2020
Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos
Yijun Song
Jingwen Wang
Lin Ma
Zhou Yu
Jun Yu
21
61
0
16 Mar 2020
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
23
60
0
11 Mar 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
119
277
0
24 Jan 2020
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video
Jie Wu
Guanbin Li
Si Liu
Liang Lin
OffRL
23
104
0
18 Jan 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs
Jingwei Ji
Ranjay Krishna
Li Fei-Fei
Juan Carlos Niebles
39
336
0
15 Dec 2019
Action Modifiers: Learning from Adverbs in Instructional Videos
Hazel Doughty
Ivan Laptev
W. Mayol-Cuevas
Dima Damen
27
30
0
13 Dec 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Yitian Yuan
Lin Ma
Jingwen Wang
Wei Liu
Wenwu Zhu
30
242
0
31 Oct 2019
Cross-Lingual Vision-Language Navigation
An Yan
Junfeng Fang
Jiangtao Feng
Lei Li
William Yang Wang
LM&Ro
32
16
0
24 Oct 2019