ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1804.00819
  4. Cited By
End-to-End Dense Video Captioning with Masked Transformer

End-to-End Dense Video Captioning with Masked Transformer

3 April 2018
Luowei Zhou
Yingbo Zhou
Jason J. Corso
R. Socher
Caiming Xiong
ArXivPDFHTML

Papers citing "End-to-End Dense Video Captioning with Masked Transformer"

14 / 114 papers shown
Title
Multi-modal Dense Video Captioning
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
164
0
17 Mar 2020
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video
  Captioning
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
23
60
0
11 Mar 2020
Compressive Transformers for Long-Range Sequence Modelling
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
13
621
0
13 Nov 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event
  Captioning
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
30
78
0
22 Sep 2019
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
132
0
22 Jul 2019
Learning Video Representations using Contrastive Bidirectional
  Transformer
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
27
133
0
13 Jun 2019
Reconstruct and Represent Video Contents for Captioning via
  Reinforcement Learning
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
20
67
0
03 Jun 2019
COIN: A Large-scale Dataset for Comprehensive Instructional Video
  Analysis
COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
Yansong Tang
Dajun Ding
Yongming Rao
Yu Zheng
Danyang Zhang
Lili Zhao
Jiwen Lu
Jie Zhou
18
305
0
07 Mar 2019
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma
Jiasen Lu
Zuxuan Wu
G. Al-Regib
Z. Kira
R. Socher
Caiming Xiong
LM&Ro
6
274
0
10 Jan 2019
Grounded Video Description
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
29
191
0
17 Dec 2018
Weakly Supervised Dense Event Captioning in Videos
Weakly Supervised Dense Event Captioning in Videos
Xuguang Duan
Wen-bing Huang
Chuang Gan
Jingdong Wang
Wenwu Zhu
Junzhou Huang
33
148
0
10 Dec 2018
Zero-Shot Anticipation for Instructional Activities
Zero-Shot Anticipation for Instructional Activities
Fadime Sener
Angela Yao
LM&Ro
28
68
0
06 Dec 2018
Holistic Multi-modal Memory Network for Movie Question Answering
Holistic Multi-modal Memory Network for Movie Question Answering
Anran Wang
Anh Tuan Luu
Chuan-Sheng Foo
Erik Cambria
Yi Tay
V. Chandrasekhar
36
20
0
12 Nov 2018
Previous
123