Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.13782
Cited By
Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
23 May 2023
Sherzod Hakimov
David Schlangen
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks"
24 / 24 papers shown
Title
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
372
2,377
0
09 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
167
3,116
0
20 Oct 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
344
3,532
0
29 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
137
582
0
01 Apr 2022
How Pre-trained Language Models Capture Factual Knowledge? A Causal-Inspired Analysis
Shaobo Li
Xiaoguang Li
Lifeng Shang
Zhenhua Dong
Chengjie Sun
Bingquan Liu
Zhenzhou Ji
Xin Jiang
Qun Liu
KELM
79
54
0
31 Mar 2022
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLM
CLIP
VPVLM
98
1,348
0
10 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
137
873
0
07 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
501
4,340
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
747
9,330
0
28 Jan 2022
KAT: A Knowledge Augmented Transformer for Vision-and-Language
Liangke Gui
Borui Wang
Qiuyuan Huang
Alexander G. Hauptmann
Yonatan Bisk
Jianfeng Gao
60
157
0
16 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
337
1,700
0
15 Oct 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
239
419
0
10 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
120
3,742
0
03 Sep 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
149
777
0
25 Jun 2021
Multi-Modal Answer Validation for Knowledge-Based VQA
Jialin Wu
Jiasen Lu
Ashish Sabharwal
Roozbeh Mottaghi
122
144
0
23 Mar 2021
Visual Transformers: Token-based Image Representation and Processing for Computer Vision
Bichen Wu
Chenfeng Xu
Xiaoliang Dai
Alvin Wan
Peizhao Zhang
Zhicheng Yan
Masayoshi Tomizuka
Joseph E. Gonzalez
Kurt Keutzer
Peter Vajda
ViT
98
558
0
05 Jun 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
365
13,025
0
26 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
87
604
0
10 May 2020
oLMpics -- On what Language Model Pre-training Captures
Alon Talmor
Yanai Elazar
Yoav Goldberg
Jonathan Berant
LRM
96
304
0
31 Dec 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.2K
12,181
0
27 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
130
1,950
0
09 Aug 2019
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
96
603
0
01 Nov 2018
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
Kaipeng Zhang
Zhanpeng Zhang
Zhifeng Li
Yu Qiao
CVBM
170
4,960
0
11 Apr 2016
Challenges in Representation Learning: A report on three machine learning contests
Ian Goodfellow
D. Erhan
P. Carrier
Aaron Courville
M. Berk Mirza
...
Jingjing Xie
Lukasz Romaszko
Bing Xu
Chuang Zhang
Yoshua Bengio
CVBM
126
1,613
0
01 Jul 2013
1