Encoding and Controlling Global Semantics for Long-form Video Question Answering

30 May 2024

Zhiyuan Hu

Cong-Duy Nguyen

See-Kiong Ng

Papers citing "Encoding and Controlling Global Semantics for Long-form Video Question Answering"

TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
Khang H. N. Vo, D. Q. Nguyen, T. Nguyen, Tho Quan
45; 0; 0; 09 Mar 2025

Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
EgoV; 224; 1,018; 0; 13 Oct 2021

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu H. Pham, Quoc V. Le, Yun-hsuan Sung, Zhen Li, Tom Duerig
VLM, CLIP; 298; 3,693; 0; 11 Feb 2021