ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1505.00487
  4. Cited By
Sequence to Sequence -- Video to Text

Sequence to Sequence -- Video to Text

3 May 2015
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko
ArXivPDFHTML

Papers citing "Sequence to Sequence -- Video to Text"

50 / 459 papers shown
Title
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video
  Retrieval
BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval
Ning Han
Jingjing Chen
Chuhao Shi
Yawen Zeng
Guangyi Xiao
Hao Chen
22
10
0
29 Oct 2021
Video and Text Matching with Conditioned Embeddings
Video and Text Matching with Conditioned Embeddings
Ameen Ali
Idan Schwartz
Tamir Hazan
Lior Wolf
94
13
0
21 Oct 2021
Visual-aware Attention Dual-stream Decoder for Video Captioning
Visual-aware Attention Dual-stream Decoder for Video Captioning
Zhixin Sun
Xian Zhong
Shuqin Chen
Lin Li
Luo Zhong
31
3
0
16 Oct 2021
CLIP4Caption: CLIP for Video Caption
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
35
150
0
13 Oct 2021
Reliable Shot Identification for Complex Event Detection via
  Visual-Semantic Embedding
Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding
Minnan Luo
Xiaojun Chang
Chen Gong
13
9
0
12 Oct 2021
Beam Search with Bidirectional Strategies for Neural Response Generation
Beam Search with Bidirectional Strategies for Neural Response Generation
Pierre Colombo
Chouchang Yang
Giovanna Varni
Chloé Clavel
41
13
0
07 Oct 2021
Attention Gate in Traffic Forecasting
Attention Gate in Traffic Forecasting
Anh-Phuong Lam
A. Nguyen
H. Le
25
0
0
27 Sep 2021
Scene Graph Generation for Better Image Captioning?
Scene Graph Generation for Better Image Captioning?
Maximilian Mozes
Martin Schmitt
Vladimir Golkov
Hinrich Schütze
Daniel Cremers
GNN
29
3
0
23 Sep 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery
  Generation
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
31
1
0
10 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal
  Attention
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
Target Adaptive Context Aggregation for Video Scene Graph Generation
Target Adaptive Context Aggregation for Video Scene Graph Generation
Yao Teng
Limin Wang
Zhifeng Li
Gangshan Wu
37
62
0
18 Aug 2021
End-to-End Dense Video Captioning with Parallel Decoding
End-to-End Dense Video Captioning with Parallel Decoding
Teng Wang
Ruimao Zhang
Zhichao Lu
Feng Zheng
Ran Cheng
Ping Luo
3DV
47
179
0
17 Aug 2021
Discriminative Latent Semantic Graph for Video Captioning
Discriminative Latent Semantic Graph for Video Captioning
Yang Bai
Junyan Wang
Yang Long
Bingzhang Hu
Yang Song
Maurice Pagnucco
Yu Guan
46
31
0
08 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
167
154
0
07 Aug 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning
Hybrid Reasoning Network for Video-based Commonsense Captioning
Weijiang Yu
Jian Liang
Lei Ji
Lu Li
Yuejian Fang
Nong Xiao
Nan Duan
19
10
0
05 Aug 2021
Optimizing Latency for Online Video CaptioningUsing Audio-Visual
  Transformers
Optimizing Latency for Online Video CaptioningUsing Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
25
4
0
04 Aug 2021
PiSLTRc: Position-informed Sign Language Transformer with Content-aware
  Convolution
PiSLTRc: Position-informed Sign Language Transformer with Content-aware Convolution
Pan Xie
Mengyi Zhao
Xiaohui Hu
ViT
SLR
38
35
0
27 Jul 2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for
  Video-and-Language Inference
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
Juncheng Li
Siliang Tang
Linchao Zhu
Haochen Shi
Xuanwen Huang
Fei Wu
Yi Yang
Yueting Zhuang
27
28
0
26 Jul 2021
Boosting Video Captioning with Dynamic Loss Network
Boosting Video Captioning with Dynamic Loss Network
Nasib Ullah
Partha Pratim Mohanta
30
1
0
25 Jul 2021
A comparison of LSTM and GRU networks for learning symbolic sequences
A comparison of LSTM and GRU networks for learning symbolic sequences
Roberto Cahuantzi
Xinye Chen
S. Güttel
28
136
0
05 Jul 2021
Towards Diverse Paragraph Captioning for Untrimmed Videos
Towards Diverse Paragraph Captioning for Untrimmed Videos
Yuqing Song
Shizhe Chen
Qin Jin
21
37
0
30 May 2021
Unsupervised Video Summarization with a Convolutional Attentive
  Adversarial Network
Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network
Guoqiang Liang
Yanbing Lv
Shucheng Li
Shizhou Zhang
Yanning Zhang
GAN
13
9
0
24 May 2021
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
NExT-QA:Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
45
444
0
18 May 2021
Video Corpus Moment Retrieval with Contrastive Learning
Video Corpus Moment Retrieval with Contrastive Learning
Hao Zhang
Aixin Sun
Wei Jing
Guoshun Nan
Liangli Zhen
Qiufeng Wang
Rick Siow Mong Goh
44
81
0
13 May 2021
Reconstructive Sequence-Graph Network for Video Summarization
Reconstructive Sequence-Graph Network for Video Summarization
Bin Zhao
Haopeng Li
Xiaoqiang Lu
Xuelong Li
18
101
0
10 May 2021
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
TEACHTEXT: CrossModal Generalized Distillation for Text-Video Retrieval
Ioana Croitoru
Simion-Vlad Bogolin
Marius Leordeanu
Hailin Jin
Andrew Zisserman
Samuel Albanie
Yang Liu
VGen
21
124
0
16 Apr 2021
Grounding Open-Domain Instructions to Automate Web Support Tasks
Grounding Open-Domain Instructions to Automate Web Support Tasks
N. Xu
Sam Masling
Michael Du
Giovanni Campagna
Larry Heck
James A. Landay
M. Lam
LLMAG
AI4TS
8
41
0
30 Mar 2021
A Comprehensive Review of the Video-to-Text Problem
A Comprehensive Review of the Video-to-Text Problem
Jesus Perez-Martin
B. Bustos
S. Guimarães
I. Sipiran
Jorge A. Pérez
Grethel Coello Said
13
17
0
27 Mar 2021
On the hidden treasure of dialog in video question answering
On the hidden treasure of dialog in video question answering
Deniz Engin
Franccois Schnitzler
Ngoc Q. K. Duong
Yannis Avrithis
29
10
0
26 Mar 2021
On Semantic Similarity in Video Retrieval
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
33
66
0
18 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal
  Tasks with Language and Vision
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
35
37
0
06 Mar 2021
RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by
  Camera-Radar Fused Object 3D Localization
RODNet: A Real-Time Radar Object Detection Network Cross-Supervised by Camera-Radar Fused Object 3D Localization
Yizhou Wang
Zhongyu Jiang
Yudong Li
Lei Li
Guanbin Xing
Hui Liu
39
158
0
09 Feb 2021
The Role of the Input in Natural Language Video Description
The Role of the Input in Natural Language Video Description
S. Cascianelli
G. Costante
Alessandro Devo
Thomas Alessandro Ciarfuglia
P. Valigi
M. L. Fravolini
15
5
0
09 Feb 2021
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
AI Choreographer: Music Conditioned 3D Dance Generation with AIST++
Ruilong Li
Sha Yang
David A. Ross
Angjoo Kanazawa
ViT
219
482
0
21 Jan 2021
End-to-End Video Question-Answer Generation with Generator-Pretester
  Network
End-to-End Video Question-Answer Generation with Generator-Pretester Network
Hung-Ting Su
Chen-Hsi Chang
Po-Wei Shen
Yu-Siang Wang
Ya-Liang Chang
Yu-Cheng Chang
Pu-Jen Cheng
Winston H. Hsu
35
31
0
05 Jan 2021
Video Captioning in Compressed Video
Video Captioning in Compressed Video
Mingjian Zhu
Chenrui Duan
Changbin (Brad) Yu
17
4
0
02 Jan 2021
Searching a Raw Video Database using Natural Language Queries
Searching a Raw Video Database using Natural Language Queries
Sriram Krishna
Siddarth Vinay
S. SrinivasK.
18
0
0
31 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision
  and Language Research in Turkish
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish
Begum Citamak
Ozan Caglayan
Menekse Kuyu
Erkut Erdem
Aykut Erdem
Pranava Madhyastha
Lucia Specia
28
8
0
13 Dec 2020
Driving Behavior Explanation with Multi-level Fusion
Driving Behavior Explanation with Multi-level Fusion
H. Ben-younes
Éloi Zablocki
Patrick Pérez
Matthieu Cord
27
30
0
09 Dec 2020
Understanding Action Sequences based on Video Captioning for
  Learning-from-Observation
Understanding Action Sequences based on Video Captioning for Learning-from-Observation
Iori Yanokura
Naoki Wake
Kazuhiro Sasabuchi
Katsushi Ikeuchi
Masayuki Inaba
30
4
0
09 Dec 2020
StacMR: Scene-Text Aware Cross-Modal Retrieval
StacMR: Scene-Text Aware Cross-Modal Retrieval
Andrés Mafla
Rafael Sampaio de Rezende
Lluís Gómez
Diane Larlus
Dimosthenis Karatzas
3DV
45
14
0
08 Dec 2020
Rethinking movie genre classification with fine-grained semantic
  clustering
Rethinking movie genre classification with fine-grained semantic clustering
Edward Fish
Jon Weinbren
Andrew Gilbert
VLM
34
7
0
04 Dec 2020
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
BERT-hLSTMs: BERT and Hierarchical LSTMs for Visual Storytelling
Jing Su
Qingyun Dai
Frank Guerin
Mian Zhou
30
24
0
03 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video
  Description
A Comprehensive Review on Recent Methods and Challenges of Video Description
Ashutosh Kumar Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
3DV
VLM
19
5
0
30 Nov 2020
Neuro-Symbolic Representations for Video Captioning: A Case for
  Leveraging Inductive Biases for Vision and Language
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Hassan Akbari
Hamid Palangi
Jianwei Yang
Sudha Rao
Asli Celikyilmaz
Roland Fernandez
P. Smolensky
Jianfeng Gao
Shih-Fu Chang
32
3
0
18 Nov 2020
A Hierarchical Multi-Modal Encoder for Moment Localization in Video
  Corpus
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
Bowen Zhang
Hexiang Hu
Joonseok Lee
Mingde Zhao
Sheide Chammas
Vihan Jain
Eugene Ie
Fei Sha
25
30
0
18 Nov 2020
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video
  Captioning and Video Question Answering
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha
Gurneet Arora
Navpreet Kaloty
27
35
0
16 Nov 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru
Han Guo
Joey Tianyi Zhou
OffRL
32
6
0
15 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation
  Learning
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
31
168
0
01 Nov 2020
Curious Case of Language Generation Evaluation Metrics: A Cautionary
  Tale
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
ELM
39
35
0
26 Oct 2020
Previous
123456...8910
Next