ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.08981
  4. Cited By
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
v1v2 (latest)

Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts

17 February 2021
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
    VLM
ArXiv (abs)PDFHTML

Papers citing "Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts"

50 / 871 papers shown
Title
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation
  Models
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models
Dohwan Ko
Joon-Young Choi
Hyeong Kyu Choi
Kyoung-Woon On
Byungseok Roh
Hyunwoo J. Kim
121
22
0
23 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
MAGVLT: Masked Generative Vision-and-Language Transformer
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
58
12
0
21 Mar 2023
Large AI Models in Health Informatics: Applications, Challenges, and the
  Future
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MHLM&MA
114
140
0
21 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLMViTCLIP
125
289
0
20 Mar 2023
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for
  Weakly-Supervised Object Detection
VEIL: Vetting Extracted Image Labels from In-the-Wild Captions for Weakly-Supervised Object Detection
Arushi Rai
Adriana Kovashka
69
0
0
16 Mar 2023
Enabling Calibration In The Zero-Shot Inference of Large Vision-Language
  Models
Enabling Calibration In The Zero-Shot Inference of Large Vision-Language Models
Will LeVine
Benjamin Pikus
P. Raj
Fernando Amat Gil
VLMUQCV
71
12
0
11 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIPMLLMVLM3DV
143
77
0
10 Mar 2023
Weakly-Supervised HOI Detection from Interaction Labels Only and
  Language/Vision-Language Priors
Weakly-Supervised HOI Detection from Interaction Labels Only and Language/Vision-Language Priors
Mesut Erhan Unal
Adriana Kovashka
VLM
73
5
0
09 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal
  Pre-training
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
59
1
0
09 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only
  Training
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
107
89
0
06 Mar 2023
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware
  Attention
HiCLIP: Contrastive Language-Image Pretraining with Hierarchy-aware Attention
Shijie Geng
Jianbo Yuan
Yu Tian
Yuxiao Chen
Yongfeng Zhang
CLIPVLM
72
46
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLMMLLM
139
25
0
04 Mar 2023
The Trade-off between Universality and Label Efficiency of
  Representations from Contrastive Learning
The Trade-off between Universality and Label Efficiency of Representations from Contrastive Learning
Zhenmei Shi
Jiefeng Chen
Kunyang Li
Jayaram Raghuram
Xi Wu
Yingyu Liang
S. Jha
SSL
79
20
0
28 Feb 2023
Language Is Not All You Need: Aligning Perception with Language Models
Language Is Not All You Need: Aligning Perception with Language Models
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLMLRMMLLM
137
566
0
27 Feb 2023
The Role of Pre-training Data in Transfer Learning
The Role of Pre-training Data in Transfer Learning
R. Entezari
Mitchell Wortsman
O. Saukh
M. Shariatnia
Hanie Sedghi
Ludwig Schmidt
98
23
0
27 Feb 2023
Learning Visual Representations via Language-Guided Sampling
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSLVLM
124
28
0
23 Feb 2023
Entity-Level Text-Guided Image Manipulation
Entity-Level Text-Guided Image Manipulation
Yikai Wang
Jianan Wang
Guansong Lu
Hang Xu
Zhenguo Li
Wei Zhang
Yanwei Fu
VGen
68
3
0
22 Feb 2023
Poisoning Web-Scale Training Datasets is Practical
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
131
204
0
20 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CEVLM
151
214
0
20 Feb 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot
  Image Captioning
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
Mohammad Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLMRALM
108
40
0
09 Feb 2023
Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
Shawn Shan
Jenna Cryan
Emily Wenger
Haitao Zheng
Rana Hanocka
Ben Y. Zhao
WIGM
80
189
0
08 Feb 2023
SimCon Loss with Multiple Views for Text Supervised Semantic
  Segmentation
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
75
4
0
07 Feb 2023
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale
  Image-Text Retrieval
LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval
Ziyang Luo
Pu Zhao
Can Xu
Xiubo Geng
Tao Shen
Chongyang Tao
Jing Ma
Qingwen Lin
Daxin Jiang
VLMCLIP
63
3
0
06 Feb 2023
Semantic-Guided Generative Image Augmentation Method with Diffusion
  Models for Image Classification
Semantic-Guided Generative Image Augmentation Method with Diffusion Models for Image Classification
Bohan Li
Xiao Xu
Xinghao Wang
Yutai Hou
Yunlong Feng
Feng Wang
Xuanliang Zhang
Qingfu Zhu
Wanxiang Che
DiffMVLM
70
12
0
04 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image
  and Video
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLMVLMMoE
116
171
0
01 Feb 2023
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View
  Semantic Consistency
ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency
Pengzhen Ren
Changlin Li
Hang Xu
Yi Zhu
Guangrun Wang
Jian-zhuo Liu
Xiaojun Chang
Xiaodan Liang
106
45
0
31 Jan 2023
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens
Chen Chen
Bowen Zhang
Liangliang Cao
Jiguang Shen
Tom Gunter
Albin Madappally Jose
Alexander Toshev
Jonathon Shlens
Ruoming Pang
Yinfei Yang
VLM3DV
63
16
0
30 Jan 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
Ming Tao
Bingkun Bao
Hao Tang
Changsheng Xu
DiffMVLM
117
109
0
30 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
469
4,668
0
30 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale
  Text-to-Image Synthesis
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
96
218
0
23 Jan 2023
Learning Open-vocabulary Semantic Segmentation Models From Natural
  Language Supervision
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
Jilan Xu
Junlin Hou
Yuejie Zhang
Rui Feng
Yi Wang
Yu Qiao
Weidi Xie
VLM
84
87
0
22 Jan 2023
Visual Semantic Relatedness Dataset for Image Captioning
Visual Semantic Relatedness Dataset for Image Captioning
Ahmed Sabir
Francesc Moreno-Noguer
Lluís Padró
CoGeVLM
67
3
0
20 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
54
5
0
19 Jan 2023
Embodied Agents for Efficient Exploration and Smart Scene Description
Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
LM&Ro
66
7
0
17 Jan 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
129
56
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
95
11
0
17 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
107
57
0
16 Jan 2023
Toward Building General Foundation Models for Language, Vision, and
  Vision-Language Understanding Tasks
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLMAI4CELRM
117
17
0
12 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLMDiffM
64
26
0
05 Jan 2023
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions
Aashish Anantha Ramakrishnan
Sharon X. Huang
Dongwon Lee
108
6
0
05 Jan 2023
Attribute-Centric Compositional Text-to-Image Generation
Attribute-Centric Compositional Text-to-Image Generation
Yuren Cong
Martin Renqiang Min
Erran L. Li
Bodo Rosenhahn
M. Yang
114
13
0
04 Jan 2023
Foreground-Background Separation through Concept Distillation from
  Generative Image Foundation Models
Foreground-Background Separation through Concept Distillation from Generative Image Foundation Models
Mischa Dombrowski
Hadrien Reynaud
Matthew Baugh
Bernhard Kainz
DiffM
89
3
0
29 Dec 2022
Exploring Vision Transformers as Diffusion Learners
Exploring Vision Transformers as Diffusion Learners
He Cao
Jianan Wang
Tianhe Ren
Xianbiao Qi
Yihao Chen
Yuan Yao
Lefei Zhang
83
10
0
28 Dec 2022
Noise-aware Learning from Web-crawled Image-Text Data for Image
  Captioning
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
97
20
0
27 Dec 2022
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction
  Tuning
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
139
120
0
21 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
70
38
0
19 Dec 2022
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Samuel Cahyawijaya
Holy Lovenia
Alham Fikri Aji
Genta Indra Winata
Bryan Wilie
...
Timothy Baldwin
Sebastian Ruder
Herry Sujaini
S. Sakti
Ayu Purwarianti
127
50
0
19 Dec 2022
NLIP: Noise-robust Language-Image Pre-training
NLIP: Noise-robust Language-Image Pre-training
Runhu Huang
Yanxin Long
Jianhua Han
Hang Xu
Xiwen Liang
Chunjing Xu
Xiaodan Liang
VLM
109
30
0
14 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
94
142
0
13 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with
  Multi-Source Multimodal Knowledge Memory
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALMVLM
98
96
0
10 Dec 2022
Previous
123...131415161718
Next