Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1803.11438
Cited By
Reconstruction Network for Video Captioning
30 March 2018
Bairui Wang
Lin Ma
Wei Zhang
Wen Liu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Reconstruction Network for Video Captioning"
50 / 87 papers shown
Title
PR-DETR: Injecting Position and Relation Prior for Dense Video Captioning
Yizhe Li
Sanping Zhou
Zheng Qin
Le Wang
ViT
22
0
0
19 Jun 2025
F
3
^3
3
Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhe Hou
Yun Lin
Jin Song Dong
79
0
0
11 Apr 2025
SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi
Mohammedreza Zarenejad
Ali-Asghar Beheshti Shirazi
72
1
0
04 Nov 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
142
9
0
30 Sep 2024
VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It
Xiaoxuan Zhu
Zhouhong Gu
Sihang Jiang
Zhixu Li
Hongwei Feng
Yanghua Xiao
94
0
0
15 Jun 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
69
4
0
19 Apr 2024
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
Keito Kudo
Haruki Nagasawa
Jun Suzuki
Nobuyuki Shimizu
75
2
0
04 Dec 2023
Multi Sentence Description of Complex Manipulation Action Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
Florentin Wörgötter
79
3
0
13 Nov 2023
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
126
7
0
16 Oct 2023
A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
Florentin Wörgötter
60
2
0
01 Oct 2023
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan
Xuange Zhang
Kun Liu
Bo Liu
Chen Chen
Jian Jin
Zhenzhen Jiao
AI4TS
107
19
0
25 Sep 2023
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee
Suhwan Cho
Dogyoon Lee
Chaewon Park
Jungho Lee
Sangyoun Lee
VOS
148
11
0
15 Mar 2023
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
125
81
0
09 Dec 2022
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
Xian Zhong
Zipeng Li
Shuqin Chen
Kui Jiang
Chen Chen
Mang Ye
DiffM
VGen
107
43
0
28 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
117
15
0
21 Nov 2022
Visual Commonsense-aware Representation Network for Video Captioning
Pengpeng Zeng
Haonan Zhang
Lianli Gao
Xiangpeng Li
Jin Qian
Hengtao Shen
85
18
0
17 Nov 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
90
7
0
26 Oct 2022
Thinking Hallucination for Video Captioning
Nasib Ullah
Partha Pratim Mohanta
VLM
84
6
0
28 Sep 2022
Multi-modal Video Chapter Generation
Xiao Cao
Zitan Chen
Canyu Le
Lei Meng
VGen
70
3
0
26 Sep 2022
CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval
Haoran Wang
Dongliang He
Wenhao Wu
Boyang Xia
Min Yang
Fu Li
YunLong Yu
Zhong Ji
Errui Ding
Jingdong Wang
64
23
0
21 Aug 2022
Automatic Concept Extraction for Concept Bottleneck-based Video Classification
J. Jeyakumar
Luke Dickens
L. Garcia
Yu Cheng
Diego Ramirez Echavarria
Joseph Noor
Alessandra Russo
Lance M. Kaplan
Erik P. Blasch
Mani B. Srivastava
79
9
0
21 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Kai Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Xinze Wang
LM&Ro
CoGe
95
63
0
17 Jun 2022
Egocentric Video-Language Pretraining
Kevin Qinghong Lin
Alex Jinpeng Wang
Mattia Soldan
Michael Wray
Rui Yan
...
Hongfa Wang
Dima Damen
Guohao Li
Wei Liu
Mike Zheng Shou
VLM
EgoV
104
207
0
03 Jun 2022
GL-RG: Global-Local Representation Granularity for Video Captioning
Liqi Yan
Qifan Wang
Yiming Cui
Fuli Feng
Xiaojun Quan
Xinming Zhang
Dongfang Liu
121
59
0
22 May 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
86
1
0
13 Mar 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
186
228
0
18 Feb 2022
An Integrated Approach for Video Captioning and Applications
Soheyla Amirian
T. Taha
Khaled Rasheed
H. Arabnia
64
1
0
23 Jan 2022
Bridging the Gap: Using Deep Acoustic Representations to Learn Grounded Language from Percepts and Raw Speech
Gaoussou Youssouf Kebe
Luke E. Richards
Edward Raff
Francis Ferraro
Cynthia Matuszek
SSL
92
5
0
27 Dec 2021
Syntax Customized Video Captioning by Imitating Exemplar Sentences
Yitian Yuan
Lin Ma
Wenwu Zhu
65
7
0
02 Dec 2021
Hierarchical Modular Network for Video Captioning
Hanhua Ye
Guorong Li
Yuankai Qi
Shuhui Wang
Qingming Huang
Ming-Hsuan Yang
129
70
0
24 Nov 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
91
155
0
13 Oct 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
101
188
0
17 Aug 2021
Cross-Modal Graph with Meta Concepts for Video Captioning
Hao Wang
Guosheng Lin
Guosheng Lin
Chunyan Miao
67
7
0
14 Aug 2021
Discriminative Latent Semantic Graph for Video Captioning
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
90
31
0
08 Aug 2021
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Bang-ju Yang
Shen Ge
Yuexian Zou
Xu Sun
83
32
0
05 Aug 2021
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization
Rui Qian
Yuxi Li
Huabin Liu
John See
Shuangrui Ding
Xian Liu
Dian Li
Weiyao Lin
82
42
0
04 Aug 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
66
38
0
30 May 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
245
59
0
24 May 2021
A Survey on Natural Language Video Localization
Xinfang Liu
Xiushan Nie
Zhifang Tan
Jie Guo
Yilong Yin
121
7
0
01 Apr 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
71
17
0
27 Mar 2021
Exploration of Visual Features and their weighted-additive fusion for Video Captioning
V. PraveenS.
Akhilesh Bharadwaj
Harsh Raj
Janhavi Dadhania
Ganesh Samarth C.A
Nikhil Pareek
S. M. I. S. R. Mahadeva Prasanna
50
1
0
14 Jan 2021
Guidance Module Network for Video Captioning
Xiao Zhang
Chunsheng Liu
F. Chang
43
4
0
20 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
47
5
0
30 Nov 2020
Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions
Jianan Wang
Boyang Albert Li
Xiangyu Fan
Jing-Hua Lin
Yanwei Fu
54
2
0
15 Nov 2020
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning
Xing Yan
Weizhong Zhang
Lin Ma
Wen Liu
Qi Wu
AI4TS
38
23
0
16 Oct 2020
Video captioning with stacked attention and semantic hard pull
Md. Mushfiqur Rahman
Thasinul Abedin
Khondokar S. S. Prottoy
Ayana Moshruba
Fazlul Hasan Siddiqui
67
2
0
15 Sep 2020
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
82
13
0
18 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Leilei Gan
73
36
0
16 Aug 2020
Self-supervised Video Representation Learning by Pace Prediction
Jiangliu Wang
Jianbo Jiao
Yunhui Liu
SSL
AI4TS
84
237
0
13 Aug 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
99
102
0
28 Jul 2020
1
2
Next