ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

22 January 2020

Papers citing "ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data"

4 / 54 papers shown

Title | Authors | Tags | Metrics | Date
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | Fei Yu, Jiji Tang, Weichong Yin, Yu Sun, Hao Tian, Hua-Hong Wu, Haifeng Wang | - | 16, 375, 0 | 30 Jun 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers | Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu | ViT | 30, 436, 0 | 02 Apr 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA | Luowei Zhou, Hamid Palangi, Lei Zhang, Houdong Hu, Jason J. Corso, Jianfeng Gao | MLLM, VLM | 252, 927, 0 | 24 Sep 2019
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat | 716, 6,746, 0 | 26 Sep 2016