ResearchTrend.AI

Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog

1 February 2020
Zekang Li
Zongjia Li
Jinchao Zhang
Yang Feng
Cheng Niu
Jie Zhou

Papers citing "Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog"

10 / 10 papers shown
Multimodal Transformer for Parallel Concatenated Variational Autoencoders
Stephen D. Liang
J. Mendel
ViT
27
5
0
28 Oct 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
32
16
0
11 May 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
35
27
0
08 Mar 2022
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering
Donggeon Lee
Seongho Choi
Youwon Jang
Byoung-Tak Zhang
16
2
0
11 Aug 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
26
19
0
16 Apr 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
36
14
0
01 Mar 2021
WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track
Zekang Li
Zongjia Li
Jinchao Zhang
Yang Feng
Jie Zhou
24
1
0
20 Jan 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue
Hung Le
Chinnadhurai Sankar
Seungwhan Moon
Ahmad Beirami
A. Geramifard
Satwik Kottur
VGen
31
18
0
01 Jan 2021
Look Before you Speak: Visually Contextualized Utterances
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
21
66
0
10 Dec 2020
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016