2002.00163
Cited By
Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog
1 February 2020
Zekang Li
Zongjia Li
Jinchao Zhang
Yang Feng
Cheng Niu
Jie Zhou
Papers citing "Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog"
10 / 10 papers shown
Multimodal Transformer for Parallel Concatenated Variational Autoencoders
Stephen D. Liang
J. Mendel
ViT
27
5
0
28 Oct 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
32
16
0
11 May 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
35
27
0
08 Mar 2022
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering
Donggeon Lee
Seongho Choi
Youwon Jang
Byoung-Tak Zhang
16
2
0
11 Aug 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
26
19
0
16 Apr 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
36
14
0
01 Mar 2021
WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track
Zekang Li
Zongjia Li
Jinchao Zhang
Yang Feng
Jie Zhou
24
1
0
20 Jan 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
Hung Le
Chinnadhurai Sankar
Seungwhan Moon
Ahmad Beirami
A. Geramifard
Satwik Kottur
VGen
31
18
0
01 Jan 2021
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016