2407.02218
Cited By
v1
v2 (latest)
Multi-Modal Video Dialog State Tracking in the Wild
2 July 2024
Adnen Abdessaied
Lei Shi
Andreas Bulling
Papers citing "Multi-Modal Video Dialog State Tracking in the Wild"
30 / 30 papers shown
Title
Emu: Generative Pretraining in Multimodality
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
92
138
0
11 Jul 2023
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
Yunshui Li
Binyuan Hui
Zhichao Yin
Min Yang
Fei Huang
Yongbin Li
MoE
59
20
0
24 May 2023
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
74
7
0
26 Oct 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,602
0
29 Apr 2022
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
Junbin Xiao
Xindi Shang
Angela Yao
Tat-Seng Chua
97
506
0
18 May 2021
GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph
Junhan Yang
Zheng Liu
Shitao Xiao
Chaozhuo Li
Defu Lian
Sanjay Agrawal
Amit Singh
Guangzhong Sun
Xing Xie
AI4CE
52
159
0
06 May 2021
SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
Satwik Kottur
Seungwhan Moon
A. Geramifard
Babak Damavandi
75
92
0
18 Apr 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
55
26
0
24 Mar 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
48
14
0
01 Mar 2021
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
100
31
0
20 Oct 2020
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le
Guosheng Lin
72
28
0
27 Jun 2020
Hierarchical Conditional Relation Networks for Video Question Answering
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
79
260
0
25 Feb 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
541
42,591
0
03 Dec 2019
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
Rohit Girdhar
Deva Ramanan
72
178
0
10 Oct 2019
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
63
111
0
02 Jul 2019
DAG-GNN: DAG Structure Learning with Graph Neural Networks
Yue Yu
Jie Chen
Tian Gao
Mo Yu
BDL
CML
GNN
82
489
0
22 Apr 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
54
194
0
25 Jan 2019
Predict then Propagate: Graph Neural Networks meet Personalized PageRank
Johannes Klicpera
Aleksandar Bojchevski
Stephan Günnemann
GNN
225
1,694
0
14 Oct 2018
An End-to-end Approach for Handling Unknown Slot Values in Dialogue State Tracking
Puyang Xu
Qi Hu
69
125
0
03 May 2018
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
151
2,151
0
14 Nov 2017
Graph Attention Networks
Petar Velickovic
Guillem Cucurull
Arantxa Casanova
Adriana Romero
Pietro Lio
Yoshua Bengio
GNN
481
20,233
0
30 Oct 2017
Inductive Representation Learning on Large Graphs
William L. Hamilton
Z. Ying
J. Leskovec
514
15,319
0
07 Jun 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
240
8,038
0
22 May 2017
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
149
1,002
0
26 Nov 2016
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf
Max Welling
GNN
SSL
662
29,156
0
09 Sep 2016
Neural Belief Tracker: Data-Driven Dialogue State Tracking
N. Mrksic
Diarmuid Ó Séaghdha
Tsung-Hsien Wen
Blaise Thomson
S. Young
100
484
0
12 Jun 2016
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
Gunnar Sigurdsson
Gül Varol
Xinyu Wang
Ali Farhadi
Ivan Laptev
Abhinav Gupta
VGen
111
1,246
0
06 Apr 2016
VQA: Visual Question Answering
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
CoGe
226
5,503
0
03 May 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
297
4,508
0
20 Nov 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.7K
100,508
0
04 Sep 2014