Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00487
Cited By
Sequence to Sequence -- Video to Text
3 May 2015
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Sequence to Sequence -- Video to Text"
50 / 170 papers shown
Title
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
26
168
0
01 Nov 2020
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
ELM
39
35
0
26 Oct 2020
Improved Actor Relation Graph based Group Activity Recognition
Zijian Kuang
Xinran Tie
23
5
0
24 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
23
95
0
22 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
47
11
0
12 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation
Junfu Pu
Wen-gang Zhou
Hezhen Hu
Houqiang Li
43
108
0
11 Oct 2020
In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan
Tianhong Li
Yuan. Yuan
Dina Katabi
35
47
0
25 Aug 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
30
3
0
29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
25
101
0
28 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
VLM
8
6
0
27 Jul 2020
Fully Convolutional Networks for Continuous Sign Language Recognition
Ka Leong Cheng
Zhaoyang Yang
Qifeng Chen
Yu-Wing Tai
SLR
44
143
0
24 Jul 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
32
52
0
23 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
25
73
0
17 Jul 2020
Visual Relation Grounding in Videos
Junbin Xiao
Xindi Shang
Xun Yang
Sheng Tang
Tat-Seng Chua
20
40
0
17 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
27
11
0
08 Jul 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Fei Wu
31
40
0
24 Jun 2020
Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances
Sijie He
Xinyan Li
T. DelSole
Pradeep Ravikumar
A. Banerjee
AI4Cl
29
43
0
14 Jun 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
43
493
0
01 May 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
31
235
0
31 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
135
189
0
19 Mar 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
164
0
17 Mar 2020
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
23
60
0
11 Mar 2020
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfen Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
37
271
0
26 Feb 2020
Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition
Hao Zhou
Wen-gang Zhou
Yun Zhou
Houqiang Li
NoLa
32
195
0
08 Feb 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
Actions as Moving Points
Yixuan Li
Zixu Wang
Limin Wang
Gangshan Wu
22
104
0
14 Jan 2020
Exploiting Event Cameras for Spatio-Temporal Prediction of Fast-Changing Trajectories
Marco Monforte
A. Arriandiaga
Arren J. Glover
Chiara Bartolozzi
18
10
0
05 Jan 2020
Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network
W. Ramos
M. Silva
Edson Roteia Araujo Junior
Alan C. Neves
Erickson R. Nascimento
22
6
0
29 Dec 2019
A Better Way to Attend: Attention with Trees for Video Question Answering
Hongyang Xue
Wenqing Chu
Zhou Zhao
Deng Cai
25
33
0
05 Sep 2019
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaoling Hu
25
47
0
31 Aug 2019
SF-Net: Structured Feature Network for Continuous Sign Language Recognition
Zhaoyang Yang
Zhenmei Shi
Xiaoyong Shen
Yu-Wing Tai
SLR
27
63
0
04 Aug 2019
Prediction and Description of Near-Future Activities in Video
T. Mahmud
Mohammad Billah
Mahmudul Hasan
A. Roy-Chowdhury
28
16
0
02 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Two-Stream Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu
Abir Das
Kate Saenko
3DPC
16
46
0
05 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
17
14
0
04 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
20
67
0
03 Jun 2019
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Jingwen Chen
Yingwei Pan
Yehao Li
Ting Yao
Hongyang Chao
Tao Mei
21
104
0
03 May 2019
Hierarchical Recurrent Neural Network for Video Summarization
Bin Zhao
Xuelong Li
Xiaoqiang Lu
23
174
0
28 Apr 2019
FishNet: A Camera Localizer using Deep Recurrent Networks
Hsin-I Chen
Sebastian Agethen
Chia-Min Wu
Winston H. Hsu
Bing-Yu Chen
11
0
0
22 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
32
540
0
06 Apr 2019
The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models
Yatri Modi
Natalie Parde
21
16
0
06 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
A. Roy-Chowdhury
30
193
0
05 Apr 2019
Spatiotemporal Pyramid Network for Video Action Recognition
Yunbo Wang
Mingsheng Long
Jianmin Wang
Philip S. Yu
29
227
0
04 Mar 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning
Nayyer Aafaq
Naveed Akhtar
Wei Liu
Syed Zulqarnain Gilani
Ajmal Mian
31
204
0
27 Feb 2019
Weakly Supervised Dense Event Captioning in Videos
Xuguang Duan
Wen-bing Huang
Chuang Gan
Jingdong Wang
Wenwu Zhu
Junzhou Huang
33
148
0
10 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Zero-Shot Anticipation for Instructional Activities
Fadime Sener
Angela Yao
LM&Ro
25
68
0
06 Dec 2018
A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart
Alvaro E. Ulloa
Linyuan Jing
Christopher W. Good
David P. vanMaanen
S. Raghunath
...
Aalpen A. Patel
H. Kirchner
Marios S. Pattichis
C. Haggerty
Brandon K. Fornwalt
19
3
0
26 Nov 2018
Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences
Simon Denman
Mingyang Shang
Sabesan Sivapalan
Yu-Shen Liu
Matthias Zwicker
3DV
14
53
0
07 Nov 2018
Previous
1
2
3
4
Next