ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2102.05918
  4. Cited By
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

11 February 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
    VLM
    CLIP
ArXivPDFHTML

Papers citing "Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"

50 / 790 papers shown
Title
ProtChatGPT: Towards Understanding Proteins with Large Language Models
ProtChatGPT: Towards Understanding Proteins with Large Language Models
Chao Wang
Hehe Fan
Ruijie Quan
Yi Yang
26
13
0
15 Feb 2024
The Landscape and Challenges of HPC Research and LLMs
The Landscape and Challenges of HPC Research and LLMs
Le Chen
Nesreen K. Ahmed
Akashnil Dutta
Arijit Bhattacharjee
Sixing Yu
...
Vy A. Vo
J. P. Muñoz
Ted Willke
Tim Mattson
Ali Jannesari
AI4CE
48
20
0
03 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Junlong Du
Yue Fan
Qing Li
Qing Li
Yuntao Du
VLM
75
75
0
03 Feb 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
44
0
0
29 Jan 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD
  Generalization
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
37
12
0
29 Jan 2024
LanDA: Language-Guided Multi-Source Domain Adaptation
LanDA: Language-Guided Multi-Source Domain Adaptation
Zhenbin Wang
Lei Zhang
Lituan Wang
Minjuan Zhu
35
10
0
25 Jan 2024
SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by
  Visual-Textual Contrastive Learning
SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning
Hao Chen
Jiaze Wang
Ziyu Guo
Jinpeng Li
Donghao Zhou
Bian Wu
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
30
5
0
22 Jan 2024
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images
Antonín Vobecký
Oriane Siméoni
David Hurych
Spyros Gidaris
Andrei Bursuc
Patrick Pérez
Josef Sivic
40
33
0
17 Jan 2024
Domain Adaptation for Large-Vocabulary Object Detectors
Domain Adaptation for Large-Vocabulary Object Detectors
Kai Jiang
Jiaxing Huang
Weiying Xie
Jie Lei
Yunsong Li
Ling Shao
Shijian Lu
ObjD
VLM
40
2
0
13 Jan 2024
Do Vision and Language Encoders Represent the World Similarly?
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil
Raiymbek Akshulakov
Y. A. D. Djilali
Sanath Narayan
M. Seddik
K. Mangalam
Noel E. O'Connor
VLM
26
11
0
10 Jan 2024
Low-Resource Vision Challenges for Foundation Models
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
30
5
0
09 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept
  clusters
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
Qiuhui Chen
Yi Hong
MedIm
20
1
0
02 Jan 2024
3VL: Using Trees to Improve Vision-Language Models' Interpretability
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
49
4
0
28 Dec 2023
Prompt Expansion for Adaptive Text-to-Image Generation
Prompt Expansion for Adaptive Text-to-Image Generation
Siddhartha Datta
Alexander Ku
Deepak Ramachandran
Peter Anderson
DiffM
37
9
0
27 Dec 2023
Black-Box Tuning of Vision-Language Models with Effective Gradient
  Approximation
Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
Zixian Guo
Yuxiang Wei
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Yiwen Guo
Wangmeng Zuo
VLM
36
8
0
26 Dec 2023
Parrot Captions Teach CLIP to Spot Text
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin
Conghui He
Alex Jinpeng Wang
Bin Wang
Weijia Li
Mike Zheng Shou
36
7
0
21 Dec 2023
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for
  Remote Sensing
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang
R. Prabha
Tianyuan Huang
Jiajun Wu
Ram Rajagopal
34
53
0
20 Dec 2023
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez
Sina Hajimiri
Ismail Ben Ayed
Jose Dolz
VLM
26
27
0
20 Dec 2023
Misalign, Contrast then Distill: Rethinking Misalignments in
  Language-Image Pretraining
Misalign, Contrast then Distill: Rethinking Misalignments in Language-Image Pretraining
Bumsoo Kim
Yeonsik Jo
Jinhyung Kim
S. Kim
VLM
24
7
0
19 Dec 2023
Expediting Contrastive Language-Image Pretraining via Self-distilled
  Encoders
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
26
3
0
19 Dec 2023
Zero-shot Building Attribute Extraction from Large-Scale Vision and
  Language Models
Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models
Fei Pan
Sangryul Jeon
Brian Wang
Frank Mckenna
Stella X. Yu
44
2
0
19 Dec 2023
Mask Grounding for Referring Image Segmentation
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
32
15
0
19 Dec 2023
UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray
  Classification
UniChest: Conquer-and-Divide Pre-training for Multi-Source Chest X-Ray Classification
Tianjie Dai
Ruipeng Zhang
Feng Hong
Jiangchao Yao
Ya-Qin Zhang
Yanfeng Wang
38
8
0
18 Dec 2023
LaViP:Language-Grounded Visual Prompts
LaViP:Language-Grounded Visual Prompts
Nilakshan Kunananthaseelan
Jing Zhang
Mehrtash Harandi
VLM
22
0
0
18 Dec 2023
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
56
4
0
15 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language
  Pre-training
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
24
4
0
14 Dec 2023
MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning
MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning
Yi Xin
Junlong Du
Qiang Wang
Ke Yan
Shouhong Ding
VLM
44
45
0
14 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
35
3
0
05 Dec 2023
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video
  Grounding with Multimodal Large Language Model
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De-Chun Cheng
Jie Li
Nannan Wang
Xinbo Gao
34
1
0
05 Dec 2023
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language
  Understanding
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
30
18
0
30 Nov 2023
UniIR: Training and Benchmarking Universal Multimodal Information
  Retrievers
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen
45
51
0
28 Nov 2023
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li
Chengyao Wang
Jiaya Jia
VLM
MLLM
40
259
0
28 Nov 2023
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf
  Vision-Language Models
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Jiayun Luo
Siddhesh Khandelwal
Leonid Sigal
Boyang Albert Li
MLLM
VLM
35
7
0
28 Nov 2023
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai
Yuhang Liu
Zhen Zhang
Javen Qinfeng Shi
CLIP
VLM
34
5
0
28 Nov 2023
C-SAW: Self-Supervised Prompt Learning for Image Generalization in
  Remote Sensing
C-SAW: Self-Supervised Prompt Learning for Image Generalization in Remote Sensing
Avigyan Bhattacharya
Mainak Singha
Ankit Jha
Biplab Banerjee
SSL
VLM
26
6
0
27 Nov 2023
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Song Tang
Wenxin Su
Mao Ye
Xiatian Zhu
VLM
33
21
0
27 Nov 2023
Choosing Wisely and Learning Deeply: Selective Cross-Modality
  Distillation via CLIP for Domain Generalization
Choosing Wisely and Learning Deeply: Selective Cross-Modality Distillation via CLIP for Domain Generalization
Jixuan Leng
Yijiang Li
Haohan Wang
VLM
34
0
0
26 Nov 2023
Invisible Relevance Bias: Text-Image Retrieval Models Prefer
  AI-Generated Images
Invisible Relevance Bias: Text-Image Retrieval Models Prefer AI-Generated Images
Shicheng Xu
Danyang Hou
Liang Pang
Jingcheng Deng
Jun Xu
Huawei Shen
Xueqi Cheng
18
8
0
23 Nov 2023
HGCLIP: Exploring Vision-Language Models with Graph Representations for
  Hierarchical Understanding
HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding
Peng Xia
Xingtong Yu
Ming Hu
Lie Ju
Zhiyong Wang
Peibo Duan
Zongyuan Ge
VLM
57
9
0
23 Nov 2023
DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation
DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation
Zhiqin Chen
Qimin Chen
Hang Zhou
Hao Zhang
3DPC
3DV
37
2
0
22 Nov 2023
Adversarial Prompt Tuning for Vision-Language Models
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang
Xingjun Ma
Xin Wang
Lingyu Qiu
Jiaqi Wang
Yu-Gang Jiang
Jitao Sang
AAML
VPVLM
VLM
30
18
0
19 Nov 2023
Enhancing the Reliability of Segment Anything Model for Auto-Prompting
  Medical Image Segmentation with Uncertainty Rectification
Enhancing the Reliability of Segment Anything Model for Auto-Prompting Medical Image Segmentation with Uncertainty Rectification
Yichi Zhang
Shiyao Hu
Sijie Ren
Chen Jiang
Yuan Cheng
Yuan Qi
MedIm
20
3
0
17 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
45
143
0
10 Nov 2023
Multi-Modal Gaze Following in Conversational Scenarios
Multi-Modal Gaze Following in Conversational Scenarios
Yuqi Hou
Zhongqun Zhang
Nora Horanyi
Jaewon Moon
Yihua Cheng
Hyung Jin Chang
21
5
0
09 Nov 2023
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
Xuzhe Dang
Stefan Edelkamp
37
4
0
06 Nov 2023
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics
P. Sermanet
Tianli Ding
Jeffrey Zhao
Fei Xia
Debidatta Dwibedi
...
Pannag R. Sanketi
Karol Hausman
Izhak Shafran
Brian Ichter
Yuan Cao
LM&Ro
33
50
0
01 Nov 2023
Class Incremental Learning with Pre-trained Vision-Language Models
Class Incremental Learning with Pre-trained Vision-Language Models
Xialei Liu
Xusheng Cao
Haori Lu
Jia-Wen Xiao
Andrew D. Bagdanov
Ming-Ming Cheng
VLM
17
12
0
31 Oct 2023
Partial Tensorized Transformers for Natural Language Processing
Partial Tensorized Transformers for Natural Language Processing
Subhadra Vadlamannati
Ryan Solgi
30
0
0
30 Oct 2023
Image Clustering Conditioned on Text Criteria
Image Clustering Conditioned on Text Criteria
Sehyun Kwon
Jaeseung Park
Minkyu Kim
Jaewoong Cho
Ernest K. Ryu
Kangwook Lee
VLM
39
11
0
27 Oct 2023
Previous
123...567...141516
Next