ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.05918
  4. Cited By
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

11 February 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
    VLM
    CLIP
ArXivPDFHTML

Papers citing "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"

50 / 899 papers shown
Title
Black-box Targeted Adversarial Attack on Segment Anything (SAM)
Black-box Targeted Adversarial Attack on Segment Anything (SAM)
Sheng Zheng
Chaoning Zhang
Xinhong Hao
AAML
42
7
0
16 Oct 2023
Prompting Scientific Names for Zero-Shot Species Recognition
Prompting Scientific Names for Zero-Shot Species Recognition
Shubham Parashar
Zhiqiu Lin
Yanan Li
Shu Kong
VLM
23
12
0
15 Oct 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiao-Qi Zhai
Radu Soricut
MLLM
VLM
41
94
0
13 Oct 2023
Extending Multi-modal Contrastive Representations
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
31
5
0
13 Oct 2023
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
Wen-Hsuan Chu
Adam W. Harley
P. Tokmakov
Achal Dave
Leonidas J. Guibas
Katerina Fragkiadaki
VLM
40
7
0
10 Oct 2023
Improving Compositional Text-to-image Generation with Large
  Vision-Language Models
Improving Compositional Text-to-image Generation with Large Vision-Language Models
Song Wen
Guian Fang
Renrui Zhang
Peng Gao
Hao Dong
Dimitris N. Metaxas
25
17
0
10 Oct 2023
Sentence-level Prompts Benefit Composed Image Retrieval
Sentence-level Prompts Benefit Composed Image Retrieval
Yang Bai
Xinxing Xu
Yong-Jin Liu
Salman Khan
Fahad Khan
Wangmeng Zuo
Rick Siow Mong Goh
Chun-Mei Feng
46
26
0
09 Oct 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for
  Open-vocabulary 3D Object Detection
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
24
33
0
04 Oct 2023
Delving into CLIP latent space for Video Anomaly Recognition
Delving into CLIP latent space for Video Anomaly Recognition
Luca Zanella
Benedetta Liberatori
Willi Menapace
Fabio Poiesi
Yiming Wang
Elisa Ricci
31
23
0
04 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation
  by decoupling object-attribute association
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
33
4
0
02 Oct 2023
GeRA: Label-Efficient Geometrically Regularized Alignment
GeRA: Label-Efficient Geometrically Regularized Alignment
Dustin Klebe
Tal Shnitzer
Mikhail Yurochkin
Leonid Karlinsky
Justin Solomon
18
2
0
01 Oct 2023
Data Filtering Networks
Data Filtering Networks
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
CLIP
46
127
0
29 Sep 2023
Understanding and Mitigating the Label Noise in Pre-training on
  Downstream Tasks
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
Hao Chen
Jindong Wang
Ankit Shah
Ran Tao
Hongxin Wei
Berfin cSimcsek
Masashi Sugiyama
Bhiksha Raj
44
26
0
29 Sep 2023
FLIP: Cross-domain Face Anti-spoofing with Language Guidance
FLIP: Cross-domain Face Anti-spoofing with Language Guidance
K. Srivatsan
Muzammal Naseer
Karthik Nandakumar
CVBM
52
44
0
28 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced
  Text-image Comprehension and Composition
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
80
226
0
26 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL
VLM
29
3
0
26 Sep 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
R. S. Srinivasa
Jaejin Cho
Chouchang Yang
Yashas Malur Saidutta
Ching Hua Lee
Yilin Shen
Hongxia Jin
VLM
38
8
0
26 Sep 2023
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary
  Instance Segmentation
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
Jiahao Xie
Wei Li
Xiangtai Li
Ziwei Liu
Yew-Soon Ong
Chen Change Loy
DiffM
VLM
72
35
0
22 Sep 2023
Gradient constrained sharpness-aware prompt learning for vision-language
  models
Gradient constrained sharpness-aware prompt learning for vision-language models
Liangchen Liu
Nannan Wang
Dawei Zhou
Xinbo Gao
Decheng Liu
Xi Yang
Tongliang Liu
VLM
33
2
0
14 Sep 2023
Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge
  Distillation at Multiple Levels
Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels
Bo Wan
Tinne Tuytelaars
VLM
32
3
0
10 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language
  Models
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
25
0
08 Sep 2023
Zero-Shot Robustification of Zero-Shot Models
Zero-Shot Robustification of Zero-Shot Models
Dyah Adila
Changho Shin
Lin Cai
Frederic Sala
48
19
0
08 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
51
117
0
07 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic
  Pretraining
Learning Speech Representation From Contrastive Token-Acoustic Pretraining
Chunyu Qiang
Hao Li
Yixin Tian
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
34
5
0
01 Sep 2023
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute
  Decomposition-Aggregation
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation
Chaofan Ma
Yu-Hao Yang
Chen Ju
Fei Zhang
Ya Zhang
Yanfeng Wang
VLM
48
17
0
31 Aug 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal
  Instruction-Following Models
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes
  in Product Images for e-commerce Vision-Language Applications
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
Wenyi Wu
Karim Bouyarmane
Ismail B. Tutar
31
2
0
30 Aug 2023
SAM-Med2D
SAM-Med2D
Junlong Cheng
Jin Ye
Zhongying Deng
Jianpin Chen
Tian-Xin Li
...
Hui Sun
Junjun He
Shaoting Zhang
Min Zhu
Yu Qiao
MedIm
VLM
47
123
0
30 Aug 2023
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised
  Fine-Grained Alignment
Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment
Jiamin Zhuang
Jing Yu
Yang Ding
Xiangyang Qu
Yue Hu
32
9
0
27 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual
  Captioning
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
46
13
0
25 Aug 2023
Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields
Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields
H. Song
Seokhun Choi
Hoseok Do
Chul Lee
Taehyeong Kim
DiffM
33
24
0
23 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and
  Modality-Aware MoE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised
  Learning
GOPro: Generate and Optimize Prompts in CLIP using Self-Supervised Learning
Mainak Singha
Ankit Jha
Biplab Banerjee
VLM
39
4
0
22 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive
  Language-Image Pre-training
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
29
3
0
22 Aug 2023
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models
Baoshuo Kan
Teng Wang
Wenpeng Lu
Xiantong Zhen
Weili Guan
Feng Zheng
VPVLM
VLM
33
25
0
22 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
36
53
0
21 Aug 2023
An Examination of the Compositionality of Large Generative
  Vision-Language Models
An Examination of the Compositionality of Large Generative Vision-Language Models
Teli Ma
Rong Li
Junwei Liang
CoGe
36
2
0
21 Aug 2023
A Foundation Language-Image Model of the Retina (FLAIR): Encoding Expert Knowledge in Text Supervision
A Foundation Language-Image Model of the Retina (FLAIR): Encoding Expert Knowledge in Text Supervision
Julio Silva-Rodríguez
H. Chakor
Riadh Kobbi
Jose Dolz
Ismail Ben Ayed
VLM
MedIm
74
35
0
15 Aug 2023
Exploring Transfer Learning in Medical Image Segmentation using
  Vision-Language Models
Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models
K. Poudel
Manish Dhakal
Prasiddha Bhandari
Rabin Adhikari
Safal Thapaliya
Bishesh Khanal
VLM
30
17
0
15 Aug 2023
Distributionally Robust Classification on a Data Budget
Distributionally Robust Classification on a Data Budget
Ben Feuer
Ameya Joshi
Minh Pham
C. Hegde
OOD
39
2
0
07 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen
  Convolutional CLIP
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
45
136
0
04 Aug 2023
Exploring Part-Informed Visual-Language Learning for Person Re-Identification
Exploring Part-Informed Visual-Language Learning for Person Re-Identification
Y. Lin
Cong Liu
Yehansen Chen
Jinshui Hu
Bing Yin
Baocai Yin
Zengfu Wang
64
7
0
04 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image
  Captioning
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
36
34
0
31 Jul 2023
UniBriVL: Robust Universal Representation and Generation of Audio Driven
  Diffusion Models
UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models
Sen Fang
Bowen Gao
Yangjian Wu
T. Teoh
DiffM
34
1
0
29 Jul 2023
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
Bokui (William) Shen
Ge Yang
Alan Yu
J. Wong
L. Kaelbling
Phillip Isola
VLM
34
104
0
27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Pietro Mascagni
N. Padoy
Nicolas Padoy
37
20
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
40
119
0
25 Jul 2023
Towards a Visual-Language Foundation Model for Computational Pathology
Towards a Visual-Language Foundation Model for Computational Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Ivy Liang
...
Andrew Zhang
L. Le
Georg Gerber
Anil V. Parwani
Faisal Mahmood
VLM
MedIm
42
46
0
24 Jul 2023
Geometry-Aware Adaptation for Pretrained Models
Geometry-Aware Adaptation for Pretrained Models
Nicholas Roberts
Xintong Li
Dyah Adila
Sonia Cromp
Tzu-Heng Huang
Jitian Zhao
Frederic Sala
VLM
31
1
0
23 Jul 2023
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for
  Referring Image Segmentation
Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation
Zunnan Xu
Zhihong Chen
Yong Zhang
Yibing Song
Xiang Wan
Guanbin Li
VLM
35
48
0
21 Jul 2023
Previous
123...789...161718
Next