arXiv:1804.05448
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
15 April 2018
Xin Eric Wang
Yuan-fang Wang
William Yang Wang
Papers citing
"Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning"
15 / 15 papers shown
Title
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
50
0
0
05 Dec 2022
End-to-end Generative Pretraining for Multimodal Video Captioning
Paul Hongsuck Seo
Arsha Nagrani
Anurag Arnab
Cordelia Schmid
27
164
0
20 Jan 2022
Contrastive Learning of Visual-Semantic Embeddings
Anurag Jain
Yashaswi Verma
SSL
33
1
0
17 Oct 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
21
88
0
05 Apr 2021
The MSR-Video to Text Dataset with Clean Annotations
Haoran Chen
Jianmin Li
Simone Frintrop
Xiaolin Hu
24
18
0
12 Feb 2021
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
34
180
0
21 Jul 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
165
0
17 Mar 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
Delving Deeper into the Decoder for Video Captioning
Haoran Chen
Jianmin Li
Xiaolin Hu
43
34
0
16 Jan 2020
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
30
78
0
22 Sep 2019
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaolin Hu
25
47
0
31 Aug 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
32
539
0
06 Apr 2019
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling
Xin Eric Wang
Wenhu Chen
Yuan-fang Wang
William Yang Wang
16
157
0
24 Apr 2018
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
Xin Eric Wang
Wenhan Xiong
Hongmin Wang
William Yang Wang
33
198
0
21 Mar 2018