Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2205.05019
Cited By
v1
v2 (latest)
Learning to Answer Visual Questions from Web Videos
10 May 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Learning to Answer Visual Questions from Web Videos"
24 / 24 papers shown
Title
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation
Tony Montes
Fernando Lozano
84
0
0
21 May 2025
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning
Peng Jin
Haoyang Li
Li Yuan
Shuicheng Yan
Jie Chen
155
2
0
31 Dec 2024
Foundation Models and Adaptive Feature Selection: A Synergistic Approach to Video Question Answering
Sai Bhargav Rongali
M. Cui
Ankit Jha
Neha Bhargava
Saurabh Prasad
Biplab Banerjee
129
0
0
12 Dec 2024
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder
Tushar Nagarajan
Ziad Al-Halah
Reina Pradhan
Kristen Grauman
80
0
0
13 Nov 2024
OMCAT: Omni Context Aware Transformer
Arushi Goel
Karan Sapra
Matthieu Le
Rafael Valle
Andrew Tao
Bryan Catanzaro
MLLM
VLM
84
1
0
15 Oct 2024
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
Xiao Wang
Jianlong Wu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
74
3
0
29 Sep 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
149
7
0
30 Jul 2024
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Xingrui Wang
Wufei Ma
Angtian Wang
Shuo Chen
Adam Kortylewski
Alan Yuille
112
6
0
02 Jun 2024
Step Differences in Instructional Video
Tushar Nagarajan
Lorenzo Torresani
VGen
110
5
0
24 Apr 2024
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
Haopeng Li
Andong Deng
Qiuhong Ke
Jun Liu
Hossein Rahmani
Yulan Guo
Mohammed Bennamoun
Chen Chen
188
17
0
03 Jan 2024
Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Bipin Rajendran
Bashir M. Al-Hashimi
MLLM
VLM
95
3
0
27 Sep 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
102
28
0
25 Sep 2023
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models
Wei Han
Hui Chen
MingSung Kan
Soujanya Poria
101
1
0
09 Jul 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
82
9
0
23 May 2023
TG-VQA: Ternary Game of Video Question Answering
Hao Li
Peng Jin
Ze-Long Cheng
Songyang Zhang
Kai-xiang Chen
Zhennan Wang
Chang-rui Liu
Jie Chen
90
10
0
17 May 2023
Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
Hung-Ting Su
Yulei Niu
Xudong Lin
Winston H. Hsu
Shih-Fu Chang
VGen
ELM
112
6
0
07 Apr 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
183
242
0
27 Feb 2023
Contrastive Video Question Answering via Video Graph Transformer
Junbin Xiao
Pan Zhou
Angela Yao
Yicong Li
Richang Hong
Shuicheng Yan
Tat-Seng Chua
ViT
115
37
0
27 Feb 2023
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
143
23
0
22 Feb 2023
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
99
2
0
08 Oct 2022
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval
Xudong Lin
Simran Tiwari
Shiyuan Huang
Manling Li
Mike Zheng Shou
Heng Ji
Shih-Fu Chang
138
21
0
05 Jun 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
165
33
0
13 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
128
93
0
02 Mar 2022
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Mathilde Caron
Ishan Misra
Julien Mairal
Priya Goyal
Piotr Bojanowski
Armand Joulin
OCL
SSL
358
4,115
0
17 Jun 2020
1