ResearchTrend.AI
Sequence to Sequence -- Video to Text

3 May 2015
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko

Papers citing "Sequence to Sequence -- Video to Text"

50 / 459 papers shown
Title
Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Ziqi Zhang
Yaya Shi
Chunfeng Yuan
Bing Li
Peijin Wang
Weiming Hu
Zhengjun Zha
VLM
37
271
0
26 Feb 2020
Spatial-Temporal Multi-Cue Network for Continuous Sign Language Recognition
Hao Zhou
Wen-gang Zhou
Yun Zhou
Houqiang Li
NoLa
32
195
0
08 Feb 2020
Multimodal Matching Transformer for Live Commenting
Chaoqun Duan
Lei Cui
Shuming Ma
Furu Wei
Conghui Zhu
Tiejun Zhao
6
12
0
07 Feb 2020
Convolutional Hierarchical Attention Network for Query-Focused Video Summarization
Shuwen Xiao
Zhou Zhao
Zijian Zhang
Ziyu Guan
Deng Cai
21
48
0
31 Jan 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
22
19
0
17 Jan 2020
Actions as Moving Points
Yixuan Li
Zixu Wang
Limin Wang
Gangshan Wu
22
104
0
14 Jan 2020
Exploiting Event Cameras for Spatio-Temporal Prediction of Fast-Changing Trajectories
Marco Monforte
A. Arriandiaga
Arren J. Glover
Chiara Bartolozzi
26
10
0
05 Jan 2020
Personalizing Fast-Forward Videos Based on Visual and Textual Features from Social Network
W. Ramos
M. Silva
Edson Roteia Araujo Junior
Alan C. Neves
Erickson R. Nascimento
22
6
0
29 Dec 2019
Vision and Language: from Visual Perception to Content Creation
Tao Mei
Wei Zhang
Ting Yao
VLM
17
8
0
26 Dec 2019
Meaning guided video captioning
Rushi J. Babariya
Toru Tamaki
30
3
0
12 Dec 2019
Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting
Yan Bin Ng
Basura Fernando
AI4TS
19
33
0
10 Dec 2019
Recurrent Neural Networks (RNNs): A gentle Introduction and Overview
Robin M. Schmidt
8
149
0
23 Nov 2019
Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models
Menatallh Hammad
May Hammad
Mohamed Elshenawy
24
2
0
22 Nov 2019
Crowd Video Captioning
Liqi Yan
Mingjian Zhu
Changbin (Brad) Yu
11
4
0
13 Nov 2019
Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning
Huanhou Xiao
Jinglun Shi
11
24
0
05 Nov 2019
On Compositionality in Neural Machine Translation
Vikas Raunak
Vaibhav Kumar
Florian Metze
13
17
0
04 Nov 2019
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Network
R. Yazdani
Olatunji Ruwase
Minjia Zhang
Yuxiong He
J. Arnau
Antonio González
27
4
0
04 Nov 2019
Diverse Video Captioning Through Latent Variable Expansion
Huanhou Xiao
Jinglun Shi
DiffM
35
15
0
26 Oct 2019
Imperial College London Submission to VATEX Video Captioning Task
Ozan Caglayan
Zixiu "Alex" Wu
Pranava Madhyastha
Josiah Wang
Lucia Specia
17
0
0
16 Oct 2019
Human Action Sequence Classification
Yan Bin Ng
Basura Fernando
30
4
0
07 Oct 2019
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Kexin Yi
Chuang Gan
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
43
457
0
03 Oct 2019
Translation, Sentiment and Voices: A Computational Model to Translate and Analyse Voices from Real-Time Video Calling
A. Roy
22
1
0
28 Sep 2019
Learning Actions from Human Demonstration Video for Robotic Manipulation
Shuo Yang
Wei Zhang
Weizhi Lu
Hesheng Wang
Yibin Li
14
26
0
10 Sep 2019
Visual Semantic Reasoning for Image-Text Matching
Kunpeng Li
Yulun Zhang
Keqin Li
Yuanyuan Li
Y. Fu
VLM
17
499
0
06 Sep 2019
A Better Way to Attend: Attention with Trees for Video Question Answering
Hongyang Xue
Wenqing Chu
Zhou Zhao
Deng Cai
25
33
0
05 Sep 2019
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Haoran Chen
Ke Lin
A. Maye
Jianmin Li
Xiaoling Hu
25
47
0
31 Aug 2019
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network
Bairui Wang
Lin Ma
Wei Zhang
Wenhao Jiang
Jingwen Wang
Wei Liu
74
163
0
27 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
27
27
0
17 Aug 2019
SF-Net: Structured Feature Network for Continuous Sign Language Recognition
Zhaoyang Yang
Zhenmei Shi
Xiaoyong Shen
Yu-Wing Tai
SLR
27
63
0
04 Aug 2019
Prediction and Description of Near-Future Activities in Video
T. Mahmud
Mohammad Billah
Mahmudul Hasan
A. Roy-Chowdhury
28
16
0
02 Aug 2019
Falls Prediction Based on Body Keypoints and Seq2Seq Architecture
Minjie Hua
Yibing Nan
Shiguo Lian
3DH
33
12
0
01 Aug 2019
ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences
Zhizhong Han
Chao Chen
Yu-Shen Liu
Matthias Zwicker
3DPC
27
46
0
31 Jul 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
36
387
0
31 Jul 2019
Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos
Sebastian Agethen
Winston H. Hsu
HAI
24
25
0
30 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
132
0
22 Jul 2019
Watch It Twice: Video Captioning with a Refocused Video Encoder
Xiangxi Shi
Jianfei Cai
Chenyu You
Jiuxiang Gu
21
29
0
21 Jul 2019
Activity2Vec: Learning ADL Embeddings from Sensor Data with a Sequence-to-Sequence Model
Alireza Ghods
D. Cook
HAI
AI4TS
26
17
0
12 Jul 2019
Video Question Generation via Cross-Modal Self-Attention Networks Learning
Yu-Siang Wang
Hung-Ting Su
Chen-Hsi Chang
Zhe-Yu Liu
Winston H. Hsu
32
9
0
05 Jul 2019
Expressing Visual Relationships via Language
Hao Tan
Franck Dernoncourt
Zhe-nan Lin
Trung Bui
Joey Tianyi Zhou
26
63
0
18 Jun 2019
Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning
Junchao Zhang
Yuxin Peng
24
170
0
11 Jun 2019
FASTER Recurrent Networks for Efficient Video Classification
Linchao Zhu
Laura Sevilla-Lara
Du Tran
Matt Feiszli
Yi Yang
Heng Wang
49
6
0
10 Jun 2019
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Manjot Bilkhu
Siyang Wang
Tushar Dobhal
ViT
11
15
0
06 Jun 2019
Two-Stream Region Convolutional 3D Network for Temporal Activity Detection
Huijuan Xu
Abir Das
Kate Saenko
3DPC
19
46
0
05 Jun 2019
Relational Reasoning using Prior Knowledge for Visual Captioning
Jingyi Hou
Xinxiao Wu
Yayun Qi
Wentian Zhao
Jiebo Luo
Yunde Jia
17
14
0
04 Jun 2019
Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning
Wei Zhang
Bairui Wang
Lin Ma
Wei Liu
20
67
0
03 Jun 2019
Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma
Yannis Kalantidis
Ghassan AlRegib
Peter Vajda
Marcus Rohrbach
Z. Kira
SSL
19
10
0
01 Jun 2019
AttentionRNN: A Structured Spatial Attention Mechanism
Siddhesh Khandelwal
Leonid Sigal
24
3
0
22 May 2019
Memory-Attended Recurrent Network for Video Captioning
Wenjie Pei
Jiyuan Zhang
Xiangrong Wang
Lei Ke
Xiaoyong Shen
Yu-Wing Tai
14
200
0
10 May 2019
Multimodal Semantic Attention Network for Video Captioning
Liang Sun
Bing Li
Chunfeng Yuan
Zhengjun Zha
Weiming Hu
29
11
0
08 May 2019
Towards More Realistic Human-Robot Conversation: A Seq2Seq-based Body Gesture Interaction System
Minjie Hua
Fuyuan Shi
Yibing Nan
Kai Wang
Hao Chen
Shiguo Lian
8
10
0
05 May 2019