Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model

12 June 2024

Papers citing "Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model"

Title
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi | MLLM, BDL, VLM, CLIP | 392 4,137 0 | 28 Jan 2022
From None to Severe: Predicting Severity in Movie Scripts | Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez, Thamar Solorio | | 56 5 0 | 20 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Huayu Chen, Boqing Gong | ViT | 248 577 0 | 22 Apr 2021
Zero-Shot Text-to-Image Generation | Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever | VLM | 255 4,781 0 | 24 Feb 2021
Self-supervised Co-training for Video Representation Learning | Tengda Han, Weidi Xie, Andrew Zisserman | SSL | 215 309 0 | 19 Oct 2020