Enhance Multimodal Transformer With External Label And In-Domain Pretrain: Hateful Meme Challenge Winning Solution

Abstract
Hateful meme detection is a recently introduced research area that requires both visual and linguistic understanding of the meme, as well as some background knowledge, to perform well on the task. This technical report summarises the first-place solution of the Hateful Meme Detection Challenge 2020, which extends state-of-the-art visual-linguistic transformers to tackle this problem. At the end of the report, we also point out the shortcomings of the current methodology and possible directions for improving it.