Sequence to Sequence -- Video to Text

3 May 2015
Subhashini Venugopalan
Marcus Rohrbach
Jeff Donahue
Raymond J. Mooney
Trevor Darrell
Kate Saenko

Papers citing "Sequence to Sequence -- Video to Text"

50 / 459 papers shown
Improved Actor Relation Graph based Group Activity Recognition
Zijian Kuang
Xinran Tie
23
5
0
24 Oct 2020
Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition
Chun-Fu Chen
Yikang Shen
K. Ramakrishnan
Rogerio Feris
J. M. Cohn
A. Oliva
Quanfu Fan
23
95
0
22 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
23
6
0
19 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
47
11
0
12 Oct 2020
Boosting Continuous Sign Language Recognition via Cross Modality Augmentation
Junfu Pu
Wen-gang Zhou
Hezhen Hu
Houqiang Li
43
108
0
11 Oct 2020
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
22
244
0
06 Oct 2020
Video captioning with stacked attention and semantic hard pull
Md. Mushfiqur Rahman
Thasinul Abedin
Khondokar S. S. Prottoy
Ayana Moshruba
Fazlul Hasan Siddiqui
27
2
0
15 Sep 2020
Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability
Anelise Newman
Camilo Luciano Fosco
Vincent Casser
Allen Lee
McNamara
A. Oliva
13
49
0
05 Sep 2020
Video Captioning Using Weak Annotation
Jingyi Hou
Yunde Jia
Xinxiao Wu
Yayun Qi
37
2
0
02 Sep 2020
Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning
Yinghua Zhang
Yangqiu Song
Jian Liang
Kun Bai
Qiang Yang
AAML
34
28
0
25 Aug 2020
In-Home Daily-Life Captioning Using Radio Signals
Lijie Fan
Tianhong Li
Yuan Yuan
Dina Katabi
40
47
0
25 Aug 2020
Identity-Aware Multi-Sentence Video Description
J. S. Park
Trevor Darrell
Anna Rohrbach
23
17
0
22 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Jie Liu
Jingren Zhou
Hongxia Yang
Fei Wu
14
34
0
16 Aug 2020
Domain-specific Communication Optimization for Distributed DNN Training
Hao Wang
Jingrong Chen
Xinchen Wan
Han Tian
Jiacheng Xia
Gaoxiong Zeng
Weiyan Wang
Kai Chen
Wei Bai
Junchen Jiang
AI4CE
6
15
0
16 Aug 2020
Enriching Video Captions With Contextual Text
Philipp Rimle
Pelin Dogan
Markus Gross
30
3
0
29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
25
101
0
28 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
VLM
14
6
0
27 Jul 2020
Fully Convolutional Networks for Continuous Sign Language Recognition
Ka Leong Cheng
Zhaoyang Yang
Qifeng Chen
Yu-Wing Tai
SLR
44
143
0
24 Jul 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
32
52
0
23 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
25
73
0
17 Jul 2020
Visual Relation Grounding in Videos
Junbin Xiao
Xindi Shang
Xun Yang
Sheng Tang
Tat-Seng Chua
20
40
0
17 Jul 2020
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan
Amar Phanishayee
Ashish Raniwala
Vijay Chidambaram
28
104
0
14 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
27
11
0
08 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
38
57
0
05 Jul 2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang D. Yoo
26
72
0
04 Jul 2020
Efficient Algorithms for Device Placement of DNN Graph Operators
Jakub Tarnawski
Amar Phanishayee
Nikhil R. Devanur
Divya Mahajan
Fanny Nina Paravecino
27
66
0
29 Jun 2020
SACT: Self-Aware Multi-Space Feature Composition Transformer for Multinomial Attention for Video Captioning
C. Sur
4
7
0
25 Jun 2020
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
Jing Shi
Xuankai Chang
Pengcheng Guo
Shinji Watanabe
Yusuke Fujita
Jiaming Xu
Bo Xu
Lei Xie
34
21
0
25 Jun 2020
Comprehensive Information Integration Modeling Framework for Video Titling
Shengyu Zhang
Ziqi Tan
Jin Yu
Zhou Zhao
Kun Kuang
Tan Jiang
Jingren Zhou
Hongxia Yang
Fei Wu
31
40
0
24 Jun 2020
Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances
Sijie He
Xinyan Li
T. DelSole
Pradeep Ravikumar
A. Banerjee
AI4Cl
29
43
0
14 Jun 2020
NITS-VC System for VATEX Video Captioning Challenge 2020
Alok Singh
Thoudam Doren Singh
Sivaji Bandyopadhyay
6
16
0
07 Jun 2020
Multi-modal Feature Fusion with Feature Attention for VATEX Captioning Challenge 2020
Ke Lin
Zhuoxin Gan
Liwei Wang
11
8
0
05 Jun 2020
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
22
127
0
17 May 2020
Text Synopsis Generation for Egocentric Videos
Aidean Sharghi
N. Lobo
M. Shah
DiffM
EgoV
11
1
0
08 May 2020
Learning from Noisy Labels with Noise Modeling Network
Zhuolin Jiang
J. Silovský
M. Siu
William Hartmann
H. Gish
Sancar Adali
NoLa
17
2
0
01 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li
Yen-Chun Chen
Yu Cheng
Zhe Gan
Licheng Yu
Jingjing Liu
MLLM
VLM
OffRL
AI4TS
46
494
0
01 May 2020
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation
Boxiao Pan
Haoye Cai
De-An Huang
Kuan-Hui Lee
Adrien Gaidon
Ehsan Adeli
Juan Carlos Niebles
31
235
0
31 Mar 2020
Detection and Description of Change in Visual Streams
Davis Gilton
Ruotian Luo
Rebecca Willett
Gregory Shakhnarovich
AI4TS
18
4
0
27 Mar 2020
Action Localization through Continual Predictive Learning
Sathyanarayanan N. Aakur
Sudeep Sarkar
22
12
0
26 Mar 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
J. Liu
Wenhu Chen
Yu Cheng
Zhe Gan
Licheng Yu
Yiming Yang
Jingjing Liu
MLLM
VGen
43
68
0
25 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
135
189
0
19 Mar 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
22
164
0
17 Mar 2020
Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning
Zhiyuan Fang
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
23
60
0
11 Mar 2020
Video Caption Dataset for Describing Human Actions in Japanese
Yutaro Shigeto
Yuya Yoshikawa
Jiaqing Lin
A. Takeuchi
17
3
0
10 Mar 2020
OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement
Fangyi Zhu
Lei Li
Zhanyu Ma
Guang Chen
Jun Guo
19
1
0
08 Mar 2020
RODNet: Radar Object Detection Using Cross-Modal Supervision
Yizhou Wang
Zhongyu Jiang
Xiangyu Gao
Lei Li
Guanbin Xing
Hui Liu
24
9
0
03 Mar 2020
Understanding Contexts Inside Robot and Human Manipulation Tasks through a Vision-Language Model and Ontology System in a Video Stream
Chen Jiang
Masood Dehghan
Martin Jägersand
LM&Ro
41
8
0
02 Mar 2020
Matching Neuromorphic Events and Color Images via Adversarial Learning
Fang Xu
Shijie Lin
Wen Yang
Lei Yu
Dengxin Dai
Guisong Xia
25
1
0
02 Mar 2020
Hierarchical Memory Decoding for Video Captioning
Aming Wu
Yahong Han
22
2
0
27 Feb 2020
CLARA: Clinical Report Auto-completion
Siddharth Biswal
Cao Xiao
Lucas Glass
M. P. M. Brandon Westover
Jimeng Sun
24
27
0
26 Feb 2020