Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1711.11135
Cited By
Video Captioning via Hierarchical Reinforcement Learning
29 November 2017
Xin Eric Wang
Wenhu Chen
Jiawei Wu
Yuan-fang Wang
William Yang Wang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Video Captioning via Hierarchical Reinforcement Learning"
37 / 37 papers shown
Title
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu
Chengkai Jin
Huanyu Wang
Zhenghao Chen
Sheng Jin
...
Zhenbang Sun
Bingni Zhang
Jiawei Wu
Hao Zhang
Qianru Sun
77
5
0
04 Oct 2024
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
80
1
0
06 Sep 2024
Multi-modal Instance Refinement for Cross-domain Action Recognition
Yuan Qing
Naixing Wu
Shaohua Wan
Lixin Duan
14
0
0
24 Nov 2023
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
P. Sermanet
Tianli Ding
Jeffrey Zhao
Fei Xia
Debidatta Dwibedi
...
Pannag R. Sanketi
Karol Hausman
Izhak Shafran
Brian Ichter
Yuan Cao
LM&Ro
33
50
0
01 Nov 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
39
221
0
27 Feb 2023
Planning Irregular Object Packing via Hierarchical Reinforcement Learning
Sichao Huang
Ziwei Wang
Jie Zhou
Jiwen Lu
18
19
0
17 Nov 2022
CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le
Yue Wang
Akhilesh Deepak Gotmare
Silvio Savarese
Guosheng Lin
SyDa
ALM
135
240
0
05 Jul 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Qing Guo
LM&Ro
CoGe
24
54
0
17 Jun 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
31
59
0
22 May 2022
An Integrated Approach for Video Captioning and Applications
Soheyla Amirian
T. Taha
Khaled Rasheed
H. Arabnia
31
1
0
23 Jan 2022
Relational Graph Learning for Grounded Video Description Generation
Wenqiao Zhang
Qing Guo
Siliang Tang
Haizhou Shi
Haochen Shi
Jun Xiao
Yueting Zhuang
Wei Wang
27
33
0
02 Dec 2021
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning
Xu Yan
Zhengcong Fei
Shuhui Wang
Qingming Huang
Qi Tian
VGen
40
4
0
19 Nov 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
76
48
0
05 Oct 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
21
37
0
30 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
18
55
0
24 May 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
28
7
0
01 Apr 2021
Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation
Julia Ive
A. Li
Yishu Miao
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
29
10
0
22 Feb 2021
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
19
5
0
30 Nov 2020
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks
Humam Alwassel
Silvio Giancola
Guohao Li
33
123
0
23 Nov 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru
Han Guo
Joey Tianyi Zhou
OffRL
32
6
0
15 Nov 2020
In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan
Tianhong Li
Yuan. Yuan
Dina Katabi
40
47
0
25 Aug 2020
Learning from Few Samples: A Survey
Nihar Bendre
Hugo Terashima-Marín
Peyman Najafirad
VLM
BDL
26
54
0
30 Jul 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
164
0
17 Mar 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
Delving Deeper into the Decoder for Video Captioning
Haoran Chen
Jianmin Li
Xiaolin Hu
43
34
0
16 Jan 2020
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
132
0
22 Jul 2019
Domain Adversarial Reinforcement Learning for Partial Domain Adaptation
Jin Chen
Xinxiao Wu
Lixin Duan
Shenghua Gao
OOD
38
43
0
10 May 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
32
540
0
06 Apr 2019
End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis
Lin Xu
Qixian Zhou
Ke Gong
Xiaodan Liang
Jianheng Tang
Liang Lin
MedIm
22
166
0
30 Jan 2019
Anomaly Detection in Road Traffic Using Visual Surveillance: A Survey
Santhosh Kelathodi Kumaran
D. P. Dogra
P. Roy
11
134
0
24 Jan 2019
Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos
Dongliang He
Xiang Zhao
Jizhou Huang
Fu Li
Xiao-Chang Liu
Shilei Wen
22
152
0
21 Jan 2019
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Mining for meaning: from vision to language through multiple networks consensus
Iulia Duta
Andrei Liviu Nicolicioiu
Simion-Vlad Bogolin
Marius Leordeanu
18
3
0
05 Jun 2018
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling
Xin Eric Wang
Wenhu Chen
Yuan-fang Wang
William Yang Wang
16
157
0
24 Apr 2018
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
Qing Guo
Yuan-fang Wang
William Yang Wang
13
76
0
15 Apr 2018
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation
Xin Eric Wang
Wenhan Xiong
Hongmin Wang
William Yang Wang
33
198
0
21 Mar 2018
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
30
202
0
12 Mar 2018
1