CoCa: Contrastive Captioners are Image-Text Foundation Models

4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM, CLIP, OffRL

Papers citing "CoCa: Contrastive Captioners are Image-Text Foundation Models"

50 / 935 papers shown
Title
A simple, efficient and scalable contrastive masked autoencoder for learning visual representations
Shlok Kumar Mishra
Joshua Robinson
Huiwen Chang
David Jacobs
Aaron Sarna
Aaron Maschinot
Dilip Krishnan
DiffM
114
31
0
30 Oct 2022
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
Suvir Mirchandani
Licheng Yu
Mengjiao MJ Wang
Animesh Sinha
Wen-Jun Jiang
Tao Xiang
Ning Zhang
81
16
0
26 Oct 2022
A Case for Business Process-Specific Foundation Models
Sadhana Kumaravel
Praveen Venkateswaran
Vatche Isahagian
Vinod Muthusamy
AI4CE
68
9
0
26 Oct 2022
The Curious Case of Benign Memorization
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
132
10
0
25 Oct 2022
Global Contrastive Batch Sampling via Optimization on Sample Permutations
Vin Sachidananda
Ziyi Yang
Chenguang Zhu
64
6
0
23 Oct 2022
CPL: Counterfactual Prompt Learning for Vision and Language Models
Xuehai He
Diji Yang
Weixi Feng
Tsu-Jui Fu
Arjun Reddy Akula
Varun Jampani
P. Narayana
Sugato Basu
William Yang Wang
Xinze Wang
VPVLM, VLM
94
15
0
19 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP, VLM, MedIm
129
434
0
18 Oct 2022
Perceptual Grouping in Contrastive Vision-Language Models
Kanchana Ranasinghe
Brandon McKinzie
S. S. Ravi
Yinfei Yang
Alexander Toshev
Jonathon Shlens
VLM
131
55
0
18 Oct 2022
Non-Contrastive Learning Meets Language-Image Pre-Training
Jinghao Zhou
Li Dong
Zhe Gan
Lijuan Wang
Furu Wei
VLM, CLIP
75
26
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
231
3,520
0
16 Oct 2022
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Denis Kuznedelev
Eldar Kurtic
Elias Frantar
Dan Alistarh
VLM, ViT
74
13
0
14 Oct 2022
Caption supervision enables robust learners
Ben Feuer
Ameya Joshi
Chinmay Hegde
SSL, CLIP, VLM
67
2
0
13 Oct 2022
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
Chaerin Kong
D. Jeon
Oh-Hun Kwon
Nojun Kwak
DiffM
77
17
0
12 Oct 2022
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Omiros Pantazis
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
VLM
82
42
0
07 Oct 2022
SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data
Ching-Yun Ko
Pin-Yu Chen
Jeet Mohapatra
Payel Das
Lucani E. Daniel
111
3
0
06 Oct 2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
Wenhu Chen
Hexiang Hu
Xi Chen
Pat Verga
William W. Cohen
RALM
100
160
0
06 Oct 2022
A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition
Luke Rowe
Benjamin Thérien
Krzysztof Czarnecki
Hongyang R. Zhang
OOD
56
0
0
05 Oct 2022
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
153
4
0
05 Oct 2022
ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
Antonio Norelli
Marco Fumero
Valentino Maiorca
Luca Moschella
Emanuele Rodolà
Francesco Locatello
VLM
162
36
0
04 Oct 2022
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
Adrian Bulat
Georgios Tzimiropoulos
VLM, VPVLM
62
51
0
03 Oct 2022
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu
Jianlong Chang
Lin Liu
Qi Tian
Changan Chen
VPVLM, VLM
115
36
0
03 Oct 2022
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Zhenhailong Wang
Xiaoman Pan
Dian Yu
Dong Yu
Jianshu Chen
Heng Ji
VLM
109
10
0
01 Oct 2022
Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
Ziyuan Qin
Huahui Yi
Qicheng Lao
Kang Li
VLM
103
71
0
30 Sep 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
VLM
75
19
0
30 Sep 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey
Hui Wei
Hao Tang
Xuemei Jia
Zhixiang Wang
Han-Bing Yu
Zhubo Li
Shin'ichi Satoh
Luc Van Gool
Zheng Wang
AAML
144
56
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
54
4
0
29 Sep 2022
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
Gang Li
Yang Li
99
70
0
29 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM, VLM
136
153
0
15 Sep 2022
Neural Networks Reduction via Lumping
Dalila Ressi
Riccardo Romanello
S. Rossi
Carla Piazza
80
4
0
15 Sep 2022
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Nanning Zheng
87
8
0
14 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM, VLM
202
741
0
14 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
82
45
0
13 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
90
20
0
08 Sep 2022
What does a platypus look like? Generating customized prompts for zero-shot image classification
Sarah M Pratt
Ian Covert
Rosanne Liu
Ali Farhadi
VLM
189
224
0
07 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM, SSL
65
2
0
06 Sep 2022
Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification
Yuxiang Zhang
Mengmeng Zhang
Wei Li
Shuai Wang
Ran Tao
VLM
115
118
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
93
1
0
05 Sep 2022
Generalization in Neural Networks: A Broad Survey
Chris Rohlfs
OOD, AI4CE
67
7
0
04 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Tengjiao Wang
Ming-Hsuan Yang
DiffM, MedIm
485
1,420
0
02 Sep 2022
Topic Detection in Continuous Sign Language Videos
Álvaro Budria
Laia Tarrés
Gerard I. Gállego
Francesc Moreno-Noguer
Jordi Torres
Xavier Giró-i-Nieto
SLR, VLM
91
1
0
01 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM, CLIP
100
27
0
29 Aug 2022
Overparameterization from Computational Constraints
Sanjam Garg
S. Jha
Saeed Mahloujifar
Mohammad Mahmoody
Mingyuan Wang
47
2
0
27 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM, VLM, ViT
157
645
0
22 Aug 2022
Improved Image Classification with Token Fusion
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
42
0
0
19 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
121
70
0
11 Aug 2022
Patching open-vocabulary models by interpolating weights
Gabriel Ilharco
Mitchell Wortsman
S. Gadre
Shuran Song
Hannaneh Hajishirzi
Simon Kornblith
Ali Farhadi
Ludwig Schmidt
VLM, KELM
130
176
0
10 Aug 2022
Self-supervised Multi-modal Training from Uncurated Image and Reports Enables Zero-shot Oversight Artificial Intelligence in Radiology
Sangjoon Park
Eunha Lee
Kyung Sook Shin
Jeonghyeon Lee
Jong Chul Ye
53
2
0
10 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
84
257
0
08 Aug 2022
Prompt Tuning for Generative Multimodal Pretrained Models
Han Yang
Junyang Lin
An Yang
Peng Wang
Chang Zhou
Hongxia Yang
VLM, LRM, VPVLM
86
31
0
04 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
88
68
0
03 Aug 2022