2207.08625
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training
18 July 2022
Qi Zhang
Yuqing Song
Qin Jin
Github (11★)
Papers citing
"Unifying Event Detection and Captioning as Sequence Generation via Pre-Training"
16 / 16 papers shown
Title
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang
Brian Liang
Varad Dhat
Zander Brumbaugh
Nick Walker
Ranjay Krishna
Maya Cakmak
102
5
0
20 Nov 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
65
1
0
24 Jun 2024
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim
Hyeon Bae Kim
Jinyoung Moon
Jinwoo Choi
Seong Tae Kim
71
25
0
11 Apr 2024
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu
Huabin Liu
Yu Qiao
Xiao Sun
3DV
34
11
0
03 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
119
16
0
26 Mar 2024
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
101
200
0
04 Dec 2023
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols
Iqra Qasim
Alexander Horsch
Dilip K. Prasad
96
9
0
05 Nov 2023
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan
Xuange Zhang
Kun Liu
Bo Liu
Chen Chen
Jian Jin
Zhenzhen Jiao
AI4TS
105
19
0
25 Sep 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Qi Zhang
S. Zheng
Qin Jin
90
0
0
20 Jul 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
58
2
0
05 Jul 2023
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
Zeyi Liu
Arpit Bahety
Shuran Song
LRM
114
127
0
27 Jun 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Min Zhang
Fatih Porikli
3DV
121
22
0
22 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
77
34
0
10 Apr 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
67
2
0
14 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
82
11
0
11 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
173
241
0
27 Feb 2023