ResearchTrend.AI

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

17 February 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
    VLM

Papers citing "Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts"

50 / 871 papers shown
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
106
79
0
05 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM MLLM
103
238
0
29 Jun 2023
CLIPAG: Towards Generator-Free Text-to-Image Generation
Roy Ganz
Michael Elad
VLM
82
8
0
29 Jun 2023
Federated Generative Learning with Foundation Models
Jie Zhang
Xiaohua Qi
Bo Zhao
FedML
116
22
0
28 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP VLM
129
20
0
27 Jun 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM MLLM
177
287
0
26 Jun 2023
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM LRM
138
612
0
23 Jun 2023
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
Binjie Zhang
Yixiao Ge
Xuyuan Xu
Ying Shan
Mike Zheng Shou
94
8
0
22 Jun 2023
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang
Jianan Wang
Yukai Shi
Zhengjun Zha
Xianbiao Qi
Lei Zhang
102
64
0
21 Jun 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
161
246
0
21 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Yuxiang Cai
DiffM VLM
146
66
0
20 Jun 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Hongsheng Li
MLLM
97
27
0
15 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM MLLM
108
174
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM CLIP
83
9
0
15 Jun 2023
Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training
Alyssa Huang
Peihan Liu
Ryumei Nakada
Linjun Zhang
Wanrong Zhang
VLM
141
6
0
13 Jun 2023
Image Captioners Are Scalable Vision Learners Too
Michael Tschannen
Manoj Kumar
Andreas Steiner
Xiaohua Zhai
N. Houlsby
Lucas Beyer
VLM CLIP
112
60
0
13 Jun 2023
Scalable 3D Captioning with Pretrained Models
Tiange Luo
C. Rockwell
Honglak Lee
Justin Johnson
116
160
0
12 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP VLM
111
28
0
12 Jun 2023
Sticker820K: Empowering Interactive Retrieval with Stickers
Sijie Zhao
Yixiao Ge
Zhongang Qi
Lin Song
Xiaohan Ding
Zehua Xie
Ying Shan
62
8
0
12 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Cuiping Li
Ziwei Liu
MLLM VLM
105
240
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
51
22
0
08 Jun 2023
On the Generalization of Multi-modal Contrastive Learning
Qi Zhang
Yifei Wang
Yisen Wang
79
26
0
07 Jun 2023
Recognize Anything: A Strong Image Tagging Model
Youcai Zhang
Xinyu Huang
Jinyu Ma
Zhaoyang Li
Zhaochuan Luo
...
Tong Luo
Yaqian Li
Siyi Liu
Yandong Guo
Lei Zhang
VLM
144
242
0
06 Jun 2023
Diversifying Joint Vision-Language Tokenization Learning
Vardaan Pahuja
A. Piergiovanni
A. Angelova
71
0
0
06 Jun 2023
Composition and Deformance: Measuring Imageability with a Text-to-Image Model
Si Wu
David A. Smith
EGVM CoGe
29
3
0
05 Jun 2023
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian
Lijie Fan
Phillip Isola
Huiwen Chang
Dilip Krishnan
VLM DiffM
145
153
0
01 Jun 2023
Vocabulary-free Image Classification
Alessandro Conti
Enrico Fini
Massimiliano Mancini
Paolo Rota
Yiming Wang
Elisa Ricci
VLM
129
27
0
01 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
87
11
0
01 Jun 2023
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL VLM CLIP
119
177
0
31 May 2023
Too Large; Data Reduction for Vision-Language Pre-Training
Alex Jinpeng Wang
Kevin Qinghong Lin
David Junhao Zhang
Stan Weixian Lei
Mike Zheng Shou
VLM
78
24
0
31 May 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
76
0
0
31 May 2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
R. Ramos
Bruno Martins
Desmond Elliott
VLM
62
16
0
31 May 2023
Improved Probabilistic Image-Text Representations
Sanghyuk Chun
VLM
116
31
0
29 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
195
112
0
29 May 2023
Image Captioning with Multi-Context Synthetic Data
Feipeng Ma
Y. Zhou
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
DiffM
69
8
0
29 May 2023
Conditional Score Guidance for Text-Driven Image-to-Image Translation
Hyunsoo Lee
Minsoo Kang
Bohyung Han
DiffM
49
15
0
29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions
Noam Rotstein
David Bensaid
Shaked Brody
Roy Ganz
Ron Kimmel
VLM
81
31
0
28 May 2023
ConaCLIP: Exploring Distillation of Fully-Connected Knowledge Interaction Graph for Lightweight Text-Image Retrieval
Jiapeng Wang
Chengyu Wang
Xiaodan Wang
Jun Huang
Lianwen Jin
VLM
113
5
0
28 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
67
13
0
26 May 2023
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
Zijia Zhao
Longteng Guo
Tongtian Yue
Si-Qing Chen
Shuai Shao
Xinxin Zhu
Zehuan Yuan
Jing Liu
MLLM
111
61
0
25 May 2023
PathAsst: A Generative Foundation AI Assistant Towards Artificial General Intelligence of Pathology
Yuxuan Sun
Chenglu Zhu
S. Zheng
Kai Zhang
Xiaoxuan Yu
Zhongyi Shui
Yunlong Zhang
Honglin Li
Lin Yang
LM&MA MedIm
135
49
0
24 May 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Dongxu Li
Junnan Li
Steven C. H. Hoi
105
331
0
24 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
103
5
0
23 May 2023
Can Language Models Understand Physical Concepts?
Lei Li
Jingjing Xu
Qingxiu Dong
Ce Zheng
Qi Liu
Lingpeng Kong
Xu Sun
ALM
61
22
0
23 May 2023
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models
Y. Qu
Xinyue Shen
Xinlei He
Michael Backes
Savvas Zannettou
Yang Zhang
71
124
0
23 May 2023
Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality
Harman Singh
Pengchuan Zhang
Qifan Wang
Mengjiao MJ Wang
Wenhan Xiong
Jingfei Du
Yu Chen
CoGe VLM
91
26
0
23 May 2023
Preconditioned Visual Language Inference with Weak Supervision
Ehsan Qasemi
Amani Maina-Kilaas
Devadutta Dash
Khalid Alsaggaf
Muhao Chen
85
0
0
22 May 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
Qingbin Liu
Jiashi Feng
VLM CLIP
102
18
0
22 May 2023
DreamWaltz: Make a Scene with Complex 3D Animatable Avatars
Yukun Huang
Jianan Wang
Ailing Zeng
He Cao
Xianbiao Qi
Yukai Shi
Zhengjun Zha
Lei Zhang
101
73
0
21 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
60
1
0
19 May 2023