arXiv:2406.07841
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
12 June 2024
Elaheh Baharlouei, Mahsa Shafaei, Yigeng Zhang, Hugo Jair Escalante, Thamar Solorio
Papers citing "Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model"
5 / 5 papers shown
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi
MLLM, BDL, VLM, CLIP
392
4,137
0
28 Jan 2022
From None to Severe: Predicting Severity in Movie Scripts
Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez, Thamar Solorio
56
5
0
20 Sep 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Huayu Chen, Boqing Gong
ViT
248
577
0
22 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Self-supervised Co-training for Video Representation Learning
Tengda Han, Weidi Xie, Andrew Zisserman
SSL
215
309
0
19 Oct 2020