ResearchTrend.AI
arXiv:1701.03126
Attention-Based Multimodal Fusion for Video Description

11 January 2017
Chiori Hori
Takaaki Hori
Teng-Yok Lee
Kazuhiro Sumi
J. Hershey
Tim K. Marks

Papers citing "Attention-Based Multimodal Fusion for Video Description"

31 / 31 papers shown
Video ReCap: Recursive Captioning of Hour-Long Videos
Md. Mohaiminul Islam
Ngan Ho
Xitong Yang
Tushar Nagarajan
Lorenzo Torresani
Gedas Bertasius
VGen
VLM
35
44
0
20 Feb 2024
Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition
Sumin Lee
Sangmin Woo
Muhammad Adi Nugroho
Changick Kim
25
0
0
21 Nov 2023
SemAttNet: Towards Attention-based Semantic Aware Guided Depth Completion
Danish Nazir
Marcus Liwicki
D. Stricker
Muhammad Zeshan Afzal
VLM
MDE
29
45
0
28 Apr 2022
Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation
Damien Robert
Bruno Vallet
Loic Landrieu
3DPC
31
69
0
15 Apr 2022
Guiding Attention using Partial-Order Relationships for Image Captioning
Murad Popattia
Muhammad Rafi
Rizwan Qureshi
Shah Nawaz
19
4
0
15 Apr 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
8
26
0
05 Apr 2022
Dynamic Multimodal Fusion
Zihui Xue
R. Marculescu
37
47
0
31 Mar 2022
Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels
Zipeng Ye
Mengfei Xia
Ran Yi
Juyong Zhang
Yu-Kun Lai
Xuanteng Huang
Guoxin Zhang
Yong-jin Liu
CVBM
22
39
0
16 Jan 2022
Attention-based Multi-hypothesis Fusion for Speech Summarization
Takatomo Kano
A. Ogawa
Marc Delcroix
Shinji Watanabe
22
13
0
16 Nov 2021
The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
Darius Petermann
G. Wichern
Zhong-Qiu Wang
Jonathan Le Roux
21
37
0
19 Oct 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention
Katsuyuki Nakamura
Hiroki Ohashi
Mitsuhiro Okada
EgoV
31
12
0
07 Sep 2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
17
4
0
04 Aug 2021
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
23
6
0
21 Oct 2020
Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition
Junghyun Koo
Jie Hwan Lee
Jaewoo Pyo
Yujin Jo
Kyogu Lee
16
58
0
09 Sep 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
Shaoxiang Chen
Wenhao Jiang
Wei Liu
Yu-Gang Jiang
23
101
0
28 Jul 2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer
Tao Jin
Siyu Huang
Ming Chen
Yingming Li
Zhongfei Zhang
30
52
0
23 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
19
11
0
08 Jul 2020
Multi-modal Automated Speech Scoring using Attention Fusion
Manraj Singh Grover
Yaman Kumar Singla
Sumit Sarin
Payman Vafaee
Mika Hama
R. Shah
16
11
0
17 May 2020
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
George Sterpu
Christian Saam
N. Harte
34
28
0
17 Apr 2020
Spatio-Temporal Ranked-Attention Networks for Video Captioning
A. Cherian
Jue Wang
Chiori Hori
Tim K. Marks
AI4TS
20
19
0
17 Jan 2020
Delving Deeper into the Decoder for Video Captioning
Haoran Chen
Jianmin Li
Xiaolin Hu
32
34
0
16 Jan 2020
Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Tanzila Rahman
Bicheng Xu
Leonid Sigal
25
77
0
22 Sep 2019
Selective Sensor Fusion for Neural Visual-Inertial Odometry
Changhao Chen
Stefano Rosa
Yishu Miao
Chris Xiaoxuan Lu
Wei Yu Wu
Andrew Markham
A. Trigoni
14
132
0
04 Mar 2019
Weakly Supervised Dense Event Captioning in Videos
Xuguang Duan
Wen-bing Huang
Chuang Gan
Jingdong Wang
Wenwu Zhu
Junzhou Huang
25
148
0
10 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
25
20
0
07 Dec 2018
Stream attention-based multi-array end-to-end speech recognition
Xiaofei Wang
Ruizhi Li
Sri Harish Reddy Mallidi
Takaaki Hori
Shinji Watanabe
H. Hermansky
17
21
0
12 Nov 2018
PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition
Haoxuan You
Yifan Feng
R. Ji
Yue Gao
3DPC
34
169
0
23 Aug 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
G. Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
16
125
0
21 Jun 2018
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7
Huda AlAmri
Vincent Cartillier
Raphael Gontijo-Lopes
Abhishek Das
Jue Wang
...
Dhruv Batra
Devi Parikh
A. Cherian
Tim K. Marks
Chiori Hori
17
32
0
01 Jun 2018
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
130
496
0
24 Apr 2018
SoccerNet: A Scalable Dataset for Action Spotting in Soccer Videos
Silvio Giancola
Mohieddine Amine
Tarek Dghaily
Bernard Ghanem
AI4TS
19
193
0
12 Apr 2018