ResearchTrend.AI

Unifying Event Detection and Captioning as Sequence Generation via Pre-Training

18 July 2022
Qi Zhang
Yuqing Song
Qin Jin
arXiv: 2207.08625 | GitHub (11★)

Papers citing "Unifying Event Detection and Captioning as Sequence Generation via Pre-Training"

16 / 16 papers shown
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang
Brian Liang
Varad Dhat
Zander Brumbaugh
Nick Walker
Ranjay Krishna
Maya Cakmak
102
5
0
20 Nov 2024
UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei
Linli Yao
Qin Jin
65
1
0
24 Jun 2024
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim
Hyeon Bae Kim
Jinyoung Moon
Jinwoo Choi
Seong Tae Kim
71
25
0
11 Apr 2024
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu
Huabin Liu
Yu Qiao
Xiao Sun
3DV
34
11
0
03 Apr 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM, VGen
119
16
0
26 Mar 2024
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM, MLLM
101
200
0
04 Dec 2023
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols
Iqra Qasim
Alexander Horsch
Dilip K. Prasad
96
9
0
05 Nov 2023
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan
Xuange Zhang
Kun Liu
Bo Liu
Chen Chen
Jian Jin
Zhenzhen Jiao
AI4TS
105
19
0
25 Sep 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Qi Zhang
S. Zheng
Qin Jin
90
0
0
20 Jul 2023
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment
Yongrae Jo
Seongyun Lee
Aiden Seung Joon Lee
Hyunji Lee
Hanseok Oh
Minjoon Seo
58
2
0
05 Jul 2023
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
Zeyi Liu
Arpit Bahety
Shuran Song
LRM
114
127
0
27 Jun 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Min Zhang
Fatih Porikli
3DV
121
22
0
22 Apr 2023
SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries
Hassan Mkhallati
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
77
34
0
10 Apr 2023
Implicit and Explicit Commonsense for Multi-sentence Video Captioning
Shih-Han Chou
James J. Little
Leonid Sigal
67
2
0
14 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
82
11
0
11 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS, VLM
173
241
0
27 Feb 2023