Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1907.01166
Cited By
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
2 July 2019
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems"
34 / 34 papers shown
Title
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han
Kaixiong Gong
Yiyuan Zhang
Jiaqi Wang
Kaipeng Zhang
Dahua Lin
Yu Qiao
Peng Gao
Xiangyu Yue
MLLM
106
111
0
10 Jan 2025
Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
Abhijnan Nath
Huma Jamil
Shafiuddin Rehan Ahmed
George Baker
Rahul Ghosh
James H. Martin
Nathaniel Blanchard
Nikhil Krishnaswamy
40
2
0
13 Apr 2024
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog
Adnen Abdessaied
Manuel von Hochmeister
Andreas Bulling
40
2
0
20 Feb 2024
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Sunjae Yoon
Dahyun Kim
Eunseop Yoon
Hee Suk Yoon
Junyeong Kim
C. Yoo
39
6
0
15 Dec 2023
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Haoyu Zhang
Meng Liu
Yaowei Wang
Da Cao
Weili Guan
Liqiang Nie
36
0
0
11 Oct 2023
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
47
7
0
27 May 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
38
7
0
05 Jan 2023
Multimodal Transformer for Parallel Concatenated Variational Autoencoders
Stephen D. Liang
J. Mendel
ViT
32
5
0
28 Oct 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
32
2
0
02 Jul 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
32
10
0
25 May 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
32
16
0
11 May 2022
Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
Yongji Wu
Matthew Lentz
Danyang Zhuo
Yao Lu
34
22
0
10 May 2022
Speech-to-SQL: Towards Speech-driven SQL Query Generation From Natural Language Question
Yuanfeng Song
Raymond Chi-Wing Wong
Xuefang Zhao
Di Jiang
39
13
0
04 Jan 2022
Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0
Joosung Lee
Kijong Han
45
6
0
10 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
40
690
0
08 Dec 2021
Interventional Video Grounding with Dual Contrastive Learning
Guoshun Nan
Rui Qiao
Yao Xiao
Jun Liu
Sicong Leng
H. Zhang
Wei Lu
26
144
0
21 Jun 2021
Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey
Jinjie Ni
Tom Young
Vlad Pandelea
Fuzhao Xue
Min Zhang
54
268
0
10 May 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
Satwik Kottur
Seungwhan Moon
A. Geramifard
Babak Damavandi
20
88
0
18 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
28
19
0
16 Apr 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
26
26
0
24 Mar 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
36
14
0
01 Mar 2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer
Rafal Powalski
Łukasz Borchmann
Dawid Jurkiewicz
Tomasz Dwojak
Michal Pietruszka
Gabriela Pałka
ViT
36
157
0
18 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,989
0
09 Feb 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
Hung Le
Chinnadhurai Sankar
Seungwhan Moon
Ahmad Beirami
A. Geramifard
Satwik Kottur
VGen
36
18
0
01 Jan 2021
Cross Copy Network for Dialogue Generation
Changzhen Ji
Xiaoxia Zhou
Yating Zhang
Xiaozhong Liu
Changlong Sun
Conghui Zhu
Tiejun Zhao
27
11
0
22 Oct 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
23
6
0
21 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
50
30
0
20 Oct 2020
Multi-modal Attention for Speech Emotion Recognition
Zexu Pan
Zhaojie Luo
Jichen Yang
Haizhou Li
CVBM
30
71
0
09 Sep 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
27
11
0
08 Jul 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le
Guosheng Lin
34
28
0
27 Jun 2020
KPQA: A Metric for Generative Question Answering Using Keyphrase Weights
Hwanhee Lee
Seunghyun Yoon
Franck Dernoncourt
Doo Soon Kim
Trung Bui
Joongbo Shin
Kyomin Jung
24
0
0
01 May 2020
Multiresolution and Multimodal Speech Recognition with Transformers
Georgios Paraskevopoulos
Srinivas Parthasarathy
Aparna Khare
Shiva Sundaram
25
29
0
29 Apr 2020
Span-based Localizing Network for Natural Language Video Localization
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
32
312
0
29 Apr 2020
1