Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.07236
Cited By
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss
18 January 2023
Xiaofeng Yang
Fayao Liu
Guosheng Lin
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Effective End-to-End Vision Language Pretraining with Semantic Visual Loss"
21 / 21 papers shown
Title
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
95
4
0
15 Dec 2023
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
112
1,741
0
05 Feb 2021
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
Beliz Gunel
Jingfei Du
Alexis Conneau
Ves Stoyanov
60
505
0
03 Nov 2020
Rethinking Pre-training and Self-training
Barret Zoph
Golnaz Ghiasi
Nayeon Lee
Huayu Chen
Hanxiao Liu
E. D. Cubuk
Quoc V. Le
SSeg
85
651
0
11 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
195
1,700
0
08 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
372
13,025
0
26 May 2020
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data
Di Qi
Lin Su
Jianwei Song
Edward Cui
Taroon Bharti
Arun Sacheti
VLM
76
261
0
22 Jan 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
218
7,498
0
02 Oct 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
217
3,485
0
30 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
101
447
0
25 Sep 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
94
1,860
0
23 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
231
2,477
0
20 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
219
3,674
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
582
24,422
0
26 Jul 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
153
879
0
27 Nov 2018
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
100
203
0
12 Mar 2018
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
117
4,214
0
25 Jul 2017
COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar
J. Uijlings
V. Ferrari
121
1,385
0
12 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
324
3,235
0
02 Dec 2016
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
193
2,053
0
19 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
205
2,475
0
01 Apr 2015
1