Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2209.06794
Cited By
PaLI: A Jointly-Scaled Multilingual Language-Image Model
14 September 2022
Xi Chen
Xiao Wang
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
Daniel M. Salz
Sebastian Goodman
Adam Grycner
Basil Mustafa
Lucas Beyer
Alexander Kolesnikov
J. Puigcerver
Nan Ding
Keran Rong
Hassan Akbari
Gaurav Mishra
Linting Xue
Ashish V. Thapliyal
James Bradbury
Weicheng Kuo
Mojtaba Seyedhosseini
Chao Jia
Burcu Karagol Ayan
C. Riquelme
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PaLI: A Jointly-Scaled Multilingual Language-Image Model"
21 / 121 papers shown
Title
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
26
17
0
02 Feb 2023
Advancing Radiograph Representation Learning with Masked Record Modeling
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
36
55
0
30 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
14
17
0
12 Jan 2023
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM
VLM
38
88
0
10 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
26
46
0
09 Dec 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
40
101
0
15 Nov 2022
Underspecification in Scene Description-to-Depiction Tasks
Ben Hutchinson
Jason Baldridge
Vinodkumar Prabhakaran
DiffM
66
32
0
11 Oct 2022
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Aishwarya Kamath
Peter Anderson
Su Wang
Jing Yu Koh
Alexander Ku
Austin Waters
Yinfei Yang
Jason Baldridge
Zarana Parekh
LM&Ro
20
45
0
06 Oct 2022
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
135
29
0
12 Sep 2022
Pre-training image-language transformers for open-vocabulary tasks
A. Piergiovanni
Weicheng Kuo
A. Angelova
VLM
ViT
34
8
0
09 Sep 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
51
392
0
17 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
50
525
0
13 Jun 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
78
72
0
25 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
26
157
0
30 Apr 2022
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
235
344
0
22 Sep 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
268
2,603
0
04 May 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,081
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
253
525
0
04 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
228
4,460
0
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,817
0
17 Sep 2019
Previous
1
2
3