ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.05918
  4. Cited By
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

11 February 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
    VLM
    CLIP
ArXivPDFHTML

Papers citing "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"

50 / 697 papers shown
Title
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
31
0
0
13 May 2025
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models
Beyond CLIP Generalization: Against Forward&Backward Forgetting Adapter for Continual Learning of Vision-Language Models
Songlin Dong
Chenhao Ding
Jiangyang Li
Jizhou Han
Qiang Wang
Yuhang He
Yihong Gong
CLL
VLM
35
0
0
12 May 2025
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via D\mathbf{\texttt{D}}Dual-H\mathbf{\texttt{H}}Head O\mathbf{\texttt{O}}Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
54
0
0
12 May 2025
A Vision-Language Foundation Model for Leaf Disease Identification
A Vision-Language Foundation Model for Leaf Disease Identification
Khang Nguyen Quoc
Lan Le Thi Thu
Luyl-Da Quach
VLM
26
0
0
11 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
31
0
0
10 May 2025
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP
Hanxun Huang
Sarah Monazam Erfani
Yige Li
Xingjun Ma
James Bailey
AAML
44
0
0
08 May 2025
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
VR-RAG: Open-vocabulary Species Recognition with RAG-Assisted Large Multi-Modal Models
F. Khan
Jun Chen
Youssef Mohamed
Chun-Mei Feng
Mohamed Elhoseiny
VLM
33
0
0
08 May 2025
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
48
0
0
07 May 2025
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu
Yuxin Peng
21
0
0
06 May 2025
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
L. Wang
Senmao Li
Fei Yang
Jianye Wang
Ziheng Zhang
Y. Liu
Y. Wang
Jian Yang
DiffM
58
0
0
06 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
Utsav Nareti
S. Chattopadhyay
Prolay Mallick
Suraj Kumar
Ayush Vikas Daga
Chandranath Adak
Adarsh Wase
Arjab Roy
20
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP
CoGe
VLM
36
0
0
04 May 2025
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Mitigating Group-Level Fairness Disparities in Federated Visual Language Models
Chaomeng Chen
Zitong Yu
J. Dong
Sen Su
L. Shen
Shutao Xia
Xiaochun Cao
FedML
VLM
137
0
0
03 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
65
1
0
01 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Z. Wang
Tao Jin
DiffM
118
2
0
30 Apr 2025
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models
Mainak Singha
Subhankar Roy
Sarthak Mehrotra
Ankit Jha
Moloud Abdar
Biplab Banerjee
Elisa Ricci
VLM
VPVLM
119
0
0
29 Apr 2025
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
EcoWikiRS: Learning Ecological Representation of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten
J. Castillo-Navarro
Pallavi Jain
D. Tuia
Diego Marcos
62
0
0
28 Apr 2025
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification
Shuanglin Yan
Neng Dong
Shuang Li
Rui Yan
Hao Tang
Jing Qin
122
0
0
25 Apr 2025
Revisiting Data Auditing in Large Vision-Language Models
Revisiting Data Auditing in Large Vision-Language Models
Hongyu Zhu
Sichu Liang
W. Wang
Boheng Li
Tongxin Yuan
Fangqi Li
Shilin Wang
Zhuosheng Zhang
VLM
167
0
0
25 Apr 2025
EmoSEM: Segment and Explain Emotion Stimuli in Visual Art
EmoSEM: Segment and Explain Emotion Stimuli in Visual Art
Jing Zhang
Dan Guo
Zhangbin Li
Meng Wang
31
0
0
20 Apr 2025
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
LGD: Leveraging Generative Descriptions for Zero-Shot Referring Image Segmentation
Jiachen Li
Qing Xie
Xiaohan Yu
Hongyun Wang
Jinyu Xu
Yongjian Liu
ObjD
76
0
0
20 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
PATFinger: Prompt-Adapted Transferable Fingerprinting against Unauthorized Multimodal Dataset Usage
W. Zhang
Ju Jia
Xiaojun Jia
Yihao Huang
X. Li
Cong Wu
Lina Wang
AAML
33
0
0
15 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
62
0
0
14 Apr 2025
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng
Songyou Peng
Kyle Genova
Gordon Wetzstein
Noah Snavely
Leonidas J. Guibas
Thomas Funkhouser
HAI
134
0
0
11 Apr 2025
Impact of Language Guidance: A Reproducibility Study
Impact of Language Guidance: A Reproducibility Study
Cherish Puniani
Advika Sinha
Shree Singhi
Aayan Yadav
VLM
47
0
0
10 Apr 2025
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
EffOWT: Transfer Visual Language Models to Open-World Tracking Efficiently and Effectively
Bingyang Wang
Kaer Huang
Bin Li
Yiqiang Yan
L. Zhang
Huchuan Lu
You He
VLM
37
0
0
07 Apr 2025
A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?
A Reality Check of Vision-Language Pre-training in Radiology: Have We Progressed Using Text?
Julio Silva-Rodríguez
Jose Dolz
Ismail ben Ayed
VLM
MedIm
31
0
0
07 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
28
0
0
07 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
40
0
0
03 Apr 2025
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning
Jie Ma
Zhitao Gao
Qi Chai
J. Liu
P. Wang
Jing Tao
Zhou Su
48
0
0
01 Apr 2025
Efficient Adaptation For Remote Sensing Visual Grounding
Efficient Adaptation For Remote Sensing Visual Grounding
Hasan Moughnieh
Mohamad Chalhoub
Hasan Nasrallah
Cristiano Nattero
Paolo Campanella
Giovanni Nico
A. Ghandour
46
0
0
29 Mar 2025
VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection
VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection
Bin Zhang
Xiaoyang Qu
Guokuan Li
Jiguang Wan
Jianzong Wang
VLM
51
0
0
28 Mar 2025
Feature Calibration enhanced Parameter Synthesis for CLIP-based Class-incremental Learning
Feature Calibration enhanced Parameter Synthesis for CLIP-based Class-incremental Learning
J. Guo
Xiaoguang Zhu
Xiaoguang Zhu
Lianlong Sun
Liangyu Teng
Y. Liu
Di Li
Wei Zhou
Liang Song
CLL
VLM
59
1
0
24 Mar 2025
GOAL: Global-local Object Alignment Learning
GOAL: Global-local Object Alignment Learning
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
124
0
0
22 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
156
0
0
20 Mar 2025
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation
Thomas Pickard
Aline Villavicencio
Maggie Mi
Wei He
Dylan Phelps
Carolina Scarton
78
1
0
19 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
103
3
0
19 Mar 2025
Advancing Medical Representation Learning Through High-Quality Data
Advancing Medical Representation Learning Through High-Quality Data
Negin Baghbanzadeh
Adibvafa Fallahpour
Yasaman Parhizkar
Franklin Ogidi
Shuvendu Roy
...
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Arash Afkanpour
Elham Dolatabadi
LM&MA
83
0
0
18 Mar 2025
ChatBEV: A Visual Language Model that Understands BEV Maps
ChatBEV: A Visual Language Model that Understands BEV Maps
Qingyao Xu
S. Chen
Guang Chen
Yanfeng Wang
Y. Zhang
46
0
0
18 Mar 2025
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models
X. Jia
Sensen Gao
Simeng Qin
Ke Ma
X. Li
Yihao Huang
Wei Dong
Yang Liu
Xiaochun Cao
AAML
VLM
58
0
0
17 Mar 2025
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Ming Cheng
Y. Wang
Deying Li
Chenhui Gou
Jianfei Cai
3DPC
87
0
0
15 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
70
0
0
15 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
68
0
0
13 Mar 2025
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu
Gaopeng Gou
Jiamin Zhuang
Jing Yu
Kun Song
Qihao Wang
Yili Li
Gang Xiong
VLM
80
0
0
13 Mar 2025
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning
Quanxing Zha
Xin Liu
Shu-Juan Peng
Y. Cheung
X. Xu
Nannan Wang
45
0
0
13 Mar 2025
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
B. Zhu
Jiequan Cui
H. Zhang
Chi Zhang
83
0
0
12 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM
OffRL
129
1
0
11 Mar 2025
Aligning Text to Image in Diffusion Models is Easier Than You Think
Aligning Text to Image in Diffusion Models is Easier Than You Think
J. Lee
Byunghee Cha
Jeongsol Kim
Jong Chul Ye
52
0
0
11 Mar 2025
1234...121314
Next