Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.13884
Cited By
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision
27 December 2021
Ajinkya Tejankar
Maziar Sanjabi
Bichen Wu
Saining Xie
Madian Khabsa
Hamed Pirsiavash
Hamed Firooz
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision"
18 / 18 papers shown
Title
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat
Yassine Ouali
Georgios Tzimiropoulos
VLM
40
4
0
16 May 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang
Zhuokai Zhao
Zhaorun Chen
Zhili Feng
Zenghui Ding
Yining Sun
SSL
VLM
48
7
0
15 Apr 2024
Differentially Private Representation Learning via Image Captioning
Tom Sander
Yaodong Yu
Maziar Sanjabi
Alain Durmus
Yi-An Ma
Kamalika Chaudhuri
Chuan Guo
58
3
0
04 Mar 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
33
72
0
12 Feb 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Wenwen Cai Xiaocong Zhou
Delong Chen
VLM
OffRL
31
22
0
03 Jan 2024
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin
Xinyue Chen
Deepak Pathak
Pengchuan Zhang
Deva Ramanan
VLM
23
22
0
02 Jun 2023
Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
SSL
CLIP
VLM
45
22
0
15 May 2023
Semantic Composition in Visually Grounded Language Models
Rohan Pandey
CoGe
16
1
0
15 May 2023
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
Hritik Bansal
Nishad Singhi
Yu Yang
Fan Yin
Aditya Grover
Kai-Wei Chang
AAML
34
42
0
06 Mar 2023
Aligning Bag of Regions for Open-Vocabulary Object Detection
Size Wu
Wenwei Zhang
Sheng Jin
Wentao Liu
Chen Change Loy
VLM
ObjD
44
108
0
27 Feb 2023
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM
CLIP
11
4
0
14 Nov 2022
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Y. Zou
VLM
CoGe
28
362
0
04 Oct 2022
IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training
Xinyu Huang
Youcai Zhang
Ying Cheng
Weiwei Tian
Ruiwei Zhao
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
X. Zhang
VLM
21
14
0
12 Jul 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
173
132
0
28 May 2022
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Jue Wang
Haofan Wang
Jincan Deng
Weijia Wu
Debing Zhang
VLM
CLIP
64
18
0
10 Sep 2021
With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations
Debidatta Dwibedi
Y. Aytar
Jonathan Tompson
P. Sermanet
Andrew Zisserman
SSL
188
452
0
29 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
UnNatural Language Inference
Koustuv Sinha
Prasanna Parthasarathi
Joelle Pineau
Adina Williams
216
94
0
30 Dec 2020
1