arXiv:2104.07921
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le, Nancy F. Chen, S. Hoi [MLLM]
16 April 2021
Papers citing "VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks" (15 papers)
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yunhong Wang, Alan Yuille, Zhuowan Li, Zilong Zheng [LRM]
05 Aug 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui [3DV]
29 Feb 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
Yueqian Wang, Yuxuan Wang, Kai Chen, Dongyan Zhao
08 Jan 2024
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Sunjae Yoon, Dahyun Kim, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, C. Yoo
15 Dec 2023
Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog
Haoyu Zhang, Meng Liu, Yaowei Wang, Da Cao, Weili Guan, Liqiang Nie
11 Oct 2023
Dynamic MOdularized Reasoning for Compositional Structured Explanation Generation
Xiyan Fu, Anette Frank [LRM]
14 Sep 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang, Yuxuan Wang, Dongyan Zhao, Zilong Zheng
04 Jun 2023
Retrieving Multimodal Information for Augmented Generation: A Survey
Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Do Xuan Long, ..., Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, Shafiq R. Joty
20 Mar 2023
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Changdong Yoo
12 Dec 2022
Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Swarnadeep Saha, Shiyue Zhang, Peter Hase, Joey Tianyi Zhou
21 Sep 2022
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, S. Hoi [VLM]
15 Sep 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang, Aixin Sun, Wei Jing, Qiufeng Wang [3DGS]
20 Jan 2022
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le, Doyen Sahoo, Nancy F. Chen, S. Hoi
20 Oct 2020
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach
06 Jun 2016
Convolutional Neural Networks for Sentence Classification
Yoon Kim [AILaw, VLM]
25 Aug 2014