Sequence to Sequence -- Video to Text

3 May 2015

Subhashini Venugopalan

Papers citing "Sequence to Sequence -- Video to Text"

50 / 459 papers shown

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning Jingwen Chen Yingwei Pan Yehao Li Ting Yao Hongyang Chao Tao Mei 21 104 0 03 May 2019
Hierarchical Recurrent Neural Network for Video Summarization Bin Zhao Xuelong Li Xiaoqiang Lu 23 174 0 28 Apr 2019
Knowing When to Stop: Evaluation and Verification of Conformity to Output-size Specifications Chenglong Wang Rudy Bunel Krishnamurthy Dvijotham Po-Sen Huang Edward Grefenstette Pushmeet Kohli 30 5 0 26 Apr 2019
FishNet: A Camera Localizer using Deep Recurrent Networks Hsin-I Chen Sebastian Agethen Chia-Min Wu Winston H. Hsu Bing-Yu Chen 16 0 0 22 Apr 2019
Neural-Attention-Based Deep Learning Architectures for Modeling Traffic Dynamics on Lane Graphs Matthew A. Wright Simon F. G. Ehlers R. Horowitz AI4CE GNN 14 4 0 18 Apr 2019
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions Peratham Wiriyathammabhum Abhinav Shrivastava Vlad I. Morariu L. Davis 25 4 0 08 Apr 2019
Streamlined Dense Video Captioning Jonghwan Mun L. Yang Zhou Ren N. Xu Bohyung Han 28 136 0 08 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research Xin Eric Wang Jiawei Wu Junkun Chen Lei Li Yuan-fang Wang William Yang Wang 32 539 0 06 Apr 2019
The Steep Road to Happily Ever After: An Analysis of Current Visual Storytelling Models Yatri Modi Natalie Parde 21 16 0 06 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries Niluthpol Chowdhury Mithun S. Paul A. Roy-Chowdhury 30 193 0 05 Apr 2019
Scene Understanding for Autonomous Manipulation with Deep Learning A. Nguyen 22 6 0 23 Mar 2019
V2CNet: A Deep Learning Framework to Translate Videos to Commands for Robotic Manipulation A. Nguyen Thanh-Toan Do Ian Reid D. Caldwell Nikos G. Tsagarakis 29 21 0 23 Mar 2019
M-VAD Names: a Dataset for Video Captioning with Naming S. Pini Marcella Cornia Federico Bolelli Lorenzo Baraldi Rita Cucchiara 21 29 0 04 Mar 2019
Spatiotemporal Pyramid Network for Video Action Recognition Yunbo Wang Mingsheng Long Jianmin Wang Philip S. Yu 32 227 0 04 Mar 2019
Video Summarization via Actionness Ranking Mohamed Elfeki Ali Borji 19 42 0 01 Mar 2019
Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning Nayyer Aafaq Naveed Akhtar Wei Liu Syed Zulqarnain Gilani Ajmal Mian 31 204 0 27 Feb 2019
Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning Youngeun Kwon Minsoo Rhu 19 56 0 18 Feb 2019
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey Longlong Jing Yingli Tian SSL 20 1,689 0 16 Feb 2019
Hierarchical Photo-Scene Encoder for Album Storytelling Bairui Wang Lin Ma Wei Zhang Wenhao Jiang Feng-Li Zhang 11 28 0 02 Feb 2019
Not All Words are Equal: Video-specific Information Loss for Video Captioning Jiarong Dong Ke Gao Xiaokai Chen Junbo Guo Juan Cao Yongdong Zhang 21 7 0 01 Jan 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning Jingkuan Song Xiangpeng Li Lianli Gao Heng Tao Shen 23 221 0 26 Dec 2018
Context, Attention and Audio Feature Explorations for Audio Visual Scene-Aware Dialog Shachi H. Kumar Eda Okur Saurav Sahay Juan Jose Alvarado Leanos Jonathan Huang L. Nachman 8 10 0 20 Dec 2018
Adversarial Inference for Multi-Sentence Video Description J. S. Park Marcus Rohrbach Trevor Darrell Anna Rohrbach 21 79 0 13 Dec 2018
Weakly Supervised Dense Event Captioning in Videos Xuguang Duan Wen-bing Huang Chuang Gan Jingdong Wang Wenwu Zhu Junzhou Huang 33 148 0 10 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning Yapeng Tian Chenxiao Guan Justin Goodman Marc Moore Chenliang Xu 36 20 0 07 Dec 2018
Zero-Shot Anticipation for Instructional Activities Fadime Sener Angela Yao LM&Ro 25 68 0 06 Dec 2018
How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos Shaojie Wang Wentian Zhao Ziyi Kou Chenliang Xu 9 5 0 02 Dec 2018
Multi-Stream Dynamic Video Summarization Mohamed Elfeki Liqiang Wang Ali Borji EgoV 34 15 0 01 Dec 2018
A deep neural network to enhance prediction of 1-year mortality using echocardiographic videos of the heart Alvaro E. Ulloa Linyuan Jing Christopher W. Good David P. vanMaanen S. Raghunath ... Aalpen A. Patel H. Kirchner Marios S. Pattichis C. Haggerty Brandon K. Fornwalt 22 3 0 26 Nov 2018
Chat More If You Like: Dynamic Cue Words Planning to Flow Longer Conversations Lili Yao Ruijian Xu Chong Li Dongyan Zhao Rui Yan 14 9 0 19 Nov 2018
A Perceptual Prediction Framework for Self Supervised Event Segmentation Sathyanarayanan N. Aakur Sudeep Sarkar 19 69 0 12 Nov 2018
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning Xin Wang Jiawei Wu Da Zhang Yu Su William Yang Wang 32 38 0 07 Nov 2018
Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences Zhizhong Han Mingyang Shang Xiyang Wang Yu-Shen Liu Matthias Zwicker 3DV 19 53 0 07 Nov 2018
Middle-Out Decoding Shikib Mehri Leonid Sigal 24 22 0 28 Oct 2018
A Knowledge-Grounded Multimodal Search-Based Conversational Agent Shubham Agarwal Ondrej Dusek Ioannis Konstas Verena Rieser 31 22 0 20 Oct 2018
Cross-Modal and Hierarchical Modeling of Video and Text Bowen Zhang Hexiang Hu Fei Sha BDL AI4TS 23 188 0 16 Oct 2018
Trellis Networks for Sequence Modeling Shaojie Bai J. Zico Kolter V. Koltun 25 145 0 15 Oct 2018
Deep Photovoltaic Nowcasting Jinsong Zhang Rodrigo Verschae S. Nobuhara Jean-François Lalonde 20 158 0 15 Oct 2018
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings Zhongwei Xie Lin Li Xian Zhong Luo Zhong 14 2 0 04 Oct 2018
Vector Learning for Cross Domain Representations Shagan Sah Chi Zhang Thang Nguyen D. Peri Ameya Shringi R. Ptucha GAN 21 3 0 27 Sep 2018
Semantic Sentence Embeddings for Paraphrasing and Text Summarization Chi Zhang Shagan Sah Thang Nguyen D. Peri A. Loui C. Salvaggio R. Ptucha 29 31 0 26 Sep 2018
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description Oliver A. Nina Washington Garcia Scott Clouse Alper Yilmaz 20 4 0 19 Sep 2018
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts Shuming Ma Lei Cui Damai Dai Furu Wei Xu Sun VGen 23 61 0 13 Sep 2018
Game-Based Video-Context Dialogue Ramakanth Pasunuru Mohit Bansal 31 33 0 12 Sep 2018
Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report Renjie Zheng Yilin Yang Mingbo Ma Liang Huang 12 8 0 31 Aug 2018
Multi-Reference Training with Pseudo-References for Neural Translation and Text Generation Renjie Zheng Mingbo Ma Liang Huang 41 35 0 28 Aug 2018
Natural Language Generation with Neural Variational Models Hareesh Bahuleyan DRL 16 6 0 27 Aug 2018
Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions Ke Ning Linchao Zhu Ming Cai Yi Yang Di Xie Fei Wu 21 2 0 27 Aug 2018
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions Fenglin Liu Xuancheng Ren Yuanxin Liu Houfeng Wang Xu Sun 98 65 0 27 Aug 2018
Deep Adaptive Temporal Pooling for Activity Recognition Sibo Song Ngai-man Cheung V. Chandrasekhar Bappaditya Mandal 16 16 0 22 Aug 2018