Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1505.00487
Cited By
Sequence to Sequence -- Video to Text
3 May 2015
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Sequence to Sequence -- Video to Text"
50 / 459 papers shown
Title
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval
Ning Han
Jingjing Chen
Chuhao Shi
Yawen Zeng
Guangyi Xiao
Hao Chen
22
10
0
29 Oct 2021
Video and Text Matching with Conditioned Embeddings
Ameen Ali
Idan Schwartz
Tamir Hazan
Lior Wolf
94
13
0
21 Oct 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning
Zhixin Sun
Xian Zhong
Shuqin Chen
Lin Li
Luo Zhong
31
3
0
16 Oct 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
35
150
0
13 Oct 2021
Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding
Minnan Luo
Xiaojun Chang
Chen Gong
13
9
0
12 Oct 2021
Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo
Chouchang Yang
Giovanna Varni
Chloé Clavel
41
13
0
07 Oct 2021
Attention Gate in Traffic Forecasting
Anh-Phuong Lam
A. Nguyen
H. Le
25
0
0
27 Sep 2021
Scene Graph Generation for Better Image Captioning?
Maximilian Mozes
Martin Schmitt
Vladimir Golkov
Hinrich Schütze
Daniel Cremers
GNN
29
3
0
23 Sep 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
31
1
0
10 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
Target Adaptive Context Aggregation for Video Scene Graph Generation
Yao Teng
Limin Wang
Zhifeng Li
Gangshan Wu
37
62
0
18 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
47
179
0
17 Aug 2021
Discriminative Latent Semantic Graph for Video Captioning
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
46
31
0
08 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
167
154
0
07 Aug 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning
Weijiang Yu
Jian Liang
Lei Ji
Lu Li
Yuejian Fang
Nong Xiao
Nan Duan
19
10
0
05 Aug 2021
Optimizing Latency for Online Video CaptioningUsing Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
25
4
0
04 Aug 2021
PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution
Pan Xie
Mengyi Zhao
Xiaohui Hu
ViT
SLR
38
35
0
27 Jul 2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
Juncheng Li
Siliang Tang
Linchao Zhu
Haochen Shi
Xuanwen Huang
Fei Wu
Yi Yang
Yueting Zhuang
27
28
0
26 Jul 2021
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
30
1
0
25 Jul 2021
A comparison of LSTM and GRU networks for learning symbolic sequences
Roberto Cahuantzi
Xinye Chen
S. Güttel
28
136
0
05 Jul 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
21
37
0
30 May 2021
Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network
Guoqiang Liang
Yanbing Lv
Shucheng Li
Shizhou Zhang
Yanning Zhang
GAN
13
9
0
24 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
45
444
0
18 May 2021
Video Corpus Moment Retrieval with Contrastive Learning
Hao Zhang
Aixin Sun
Wei Jing
Guoshun Nan
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
44
81
0
13 May 2021
Reconstructive Sequence-Graph Network for Video Summarization
Bin Zhao
Haopeng Li
Xiaoqiang Lu
Xuelong Li
18
101
0
10 May 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
21
124
0
16 Apr 2021
Grounding Open-Domain Instructions to Automate Web Support Tasks
N. Xu
Sam Masling
Michael Du
Giovanni Campagna
Larry Heck
James A. Landay
M. Lam
LLMAG
AI4TS
8
41
0
30 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
On the hidden treasure of dialog in video question answering
Deniz Engin
Franccois Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
29
10
0
26 Mar 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
33
66
0
18 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
35
37
0
06 Mar 2021
RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization
Yizhou Wang
Zhongyu Jiang
Yudong Li
Lei Li
Guanbin Xing
Hui Liu
39
158
0
09 Feb 2021
The Role of the Input in Natural Language Video Description
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
15
5
0
09 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
219
482
0
21 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
35
31
0
05 Jan 2021
Video Captioning in Compressed Video
Mingjian Zhu
Chenrui Duan
Changbin (Brad) Yu
17
4
0
02 Jan 2021
Searching a Raw Video Database using Natural Language Queries
Sriram Krishna
Siddarth Vinay
S. SrinivasK.
18
0
0
31 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
28
8
0
13 Dec 2020
Driving Behavior Explanation with Multi-level Fusion
H. Ben-younes
Éloi Zablocki
Patrick Pérez
Matthieu Cord
27
30
0
09 Dec 2020
Understanding Action Sequences based on Video Captioning for Learning-from-Observation
Iori Yanokura
Naoki Wake
Kazuhiro Sasabuchi
Katsushi Ikeuchi
Masayuki Inaba
30
4
0
09 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
45
14
0
08 Dec 2020
Rethinking movie genre classification with fine-grained semantic clustering
Edward Fish
Jon Weinbren
Andrew Gilbert
VLM
34
7
0
04 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Jing Su
Qingyun Dai
Frank Guerin
Mian Zhou
30
24
0
03 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
19
5
0
30 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
32
3
0
18 Nov 2020
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
Bowen Zhang
Hexiang Hu
Joonseok Lee
Mingde Zhao
Sheide Chammas
Vihan Jain
Eugene Ie
Fei Sha
25
30
0
18 Nov 2020
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha
Gurneet Arora
Navpreet Kaloty
27
35
0
16 Nov 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru
Han Guo
Joey Tianyi Zhou
OffRL
32
6
0
15 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
31
168
0
01 Nov 2020
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
ELM
39
35
0
26 Oct 2020
Previous
1
2
3
4
5
6
...
8
9
10
Next