Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.05918
Cited By
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
11 February 2021
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision"
50 / 790 papers shown
Title
RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models
Junyao Ge
Yang Zheng
Kaitai Guo
Jimin Liang
Jimin Liang
38
1
0
27 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
33
3
0
21 Aug 2024
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection
Tri Cao
Chengyu Huang
Yuexin Li
Huilin Wang
Amy He
Nay Oo
Bryan Hooi
LLMAG
OffRL
83
4
0
20 Aug 2024
An Efficient Sign Language Translation Using Spatial Configuration and Motion Dynamics with LLMs
Eui Jun Hwang
Sukmin Cho
Junmyeong Lee
Jong C. Park
SLR
76
4
0
20 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
45
0
0
14 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan Ö. Arik
Tejas Nama
Tomas Pfister
44
1
0
13 Aug 2024
Robust Domain Generalization for Multi-modal Object Recognition
Yuxin Qiao
Keqin Li
Junhong Lin
Rong Wei
Chufeng Jiang
Yang Luo
Haoyu Yang
VLM
39
26
0
11 Aug 2024
Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning
Lu Yu
Hesong Li
Ying Fu
J. Weijer
Changsheng Xu
CLL
55
1
0
02 Aug 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
27
1
0
30 Jul 2024
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
Hunmin Yang
Jongoh Jeong
Kuk-Jin Yoon
AAML
VLM
60
4
0
30 Jul 2024
Advancing Prompt Learning through an External Layer
Fangming Cui
Xun Yang
Chao Wu
Liang Xiao
Xinmei Tian
VLM
38
1
0
29 Jul 2024
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
50
2
0
28 Jul 2024
Adversarial Robustification via Text-to-Image Diffusion Models
Daewon Choi
Jongheon Jeong
Huiwon Jang
Jinwoo Shin
DiffM
44
1
0
26 Jul 2024
Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective
Jingren Liu
Zhong Ji
Yunlong Yu
Jiale Cao
Yanwei Pang
Jungong Han
X. Li
CLL
42
3
0
24 Jul 2024
CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning
Emanuele Frascaroli
Aniello Panariello
Pietro Buzzega
Lorenzo Bonicelli
Angelo Porrello
Simone Calderara
VLM
CLL
35
3
0
22 Jul 2024
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan
Weijie Chen
Shicai Yang
Di Xie
Luojun Lin
Yueting Zhuang
VLM
34
4
0
21 Jul 2024
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang
Dahun Kim
A. Taalimi
Chen Sun
Weicheng Kuo
ObjD
36
5
0
18 Jul 2024
WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
Xin-Jian Wu
Rui-Song Zhang
Jie Qin
Shijie Ma
Cheng-Lin Liu
VLM
32
1
0
14 Jul 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
43
8
0
13 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
52
1
0
09 Jul 2024
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi
M. Rohban
M. Baghshah
CoGe
38
5
0
08 Jul 2024
Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning
Bin Ren
Guofeng Mei
D. Paudel
Weijie Wang
Yawei Li
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
N. Sebe
3DPC
50
9
0
08 Jul 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
41
2
0
08 Jul 2024
CLIPVQA:Video Quality Assessment via CLIP
Fengchuang Xing
Mingjie Li
Yuan-Gen Wang
Guopu Zhu
Xiaochun Cao
CLIP
ViT
40
4
0
06 Jul 2024
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Yuhan Zhu
Yuyang Ji
Zhiyu Zhao
Gangshan Wu
Limin Wang
VLM
41
7
0
05 Jul 2024
Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning
Mainak Singha
Ankit Jha
Divyam Gupta
Pranav Singla
Biplab Banerjee
VLM
32
0
0
05 Jul 2024
SOWA: Adapting Hierarchical Frozen Window Self-Attention to Visual-Language Models for Better Anomaly Detection
Zongxiang Hu
Zhaosheng Zhang
VLM
27
1
0
04 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
23
0
0
04 Jul 2024
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen
Stefan Uhlich
Fabien Cardinaux
Lukas Mauch
Marzieh Edraki
Aaron Courville
OODD
CLL
VLM
57
3
0
03 Jul 2024
Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective
Zhaotian Weng
Zijun Gao
Jerone Andrews
Jieyu Zhao
33
0
0
03 Jul 2024
Camera-LiDAR Cross-modality Gait Recognition
Wenxuan Guo
Yingping Liang
Zhiyu Pan
Ziheng Xi
Jianjiang Feng
Jie Zhou
CVBM
38
3
0
02 Jul 2024
GalLoP: Learning Global and Local Prompts for Vision-Language Models
Marc Lafon
Elias Ramzi
Clément Rambour
Nicolas Audebert
Nicolas Thome
VLM
43
8
0
01 Jul 2024
MATE: Meet At The Embedding -- Connecting Images with Long Texts
Young Kyun Jang
Junmo Kang
Yong Jae Lee
Donghyun Kim
VLM
44
5
0
26 Jun 2024
High-resolution open-vocabulary object 6D pose estimation
Jaime Corsetti
Davide Boscaini
Francesco Giuliari
Changjae Oh
Andrea Cavallaro
Fabio Poiesi
32
1
0
24 Jun 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
75
31
0
24 Jun 2024
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
Delin Qu
Qizhi Chen
Pingrui Zhang
Xianqiang Gao
Bin Zhao
Bin Zhao
Dong Wang
Xuelong Li
AI4CE
42
7
0
23 Jun 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
82
1
0
21 Jun 2024
Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
Jiho Choi
Seonho Lee
Seungho Lee
Minhyun Lee
Hyunjung Shim
OCL
45
0
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
45
3
0
17 Jun 2024
Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
Shuyang Lin
Tong Jia
Hao Wang
Bowen Ma
Mingyuan Li
Dongyue Chen
VLM
ObjD
41
0
0
16 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
52
1
0
11 Jun 2024
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar
Muhammad Awais
Sanath Narayan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
45
0
0
06 Jun 2024
DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
Lingchen Meng
Jianwei Yang
Rui Tian
Xiyang Dai
Zuxuan Wu
Jianfeng Gao
Yu-Gang Jiang
VLM
27
9
0
06 Jun 2024
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
MILM
59
16
0
06 Jun 2024
Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models
A. Bavaresco
A. Testoni
Raquel Fernández
31
2
0
31 May 2024
Vision-Language Meets the Skeleton: Progressively Distillation with Cross-Modal Knowledge for 3D Action Representation Learning
Yang Chen
Tian He
Junfeng Fu
Ling Wang
Jingcai Guo
Hong Cheng
VLM
34
2
0
31 May 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal
Nakul Agarwal
Shao-Yuan Lo
Kwonjoon Lee
44
14
0
30 May 2024
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems
Jianliang He
Siyu Chen
Fengzhuo Zhang
Zhuoran Yang
LM&Ro
LLMAG
44
2
0
30 May 2024
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
43
3
0
30 May 2024
Previous
1
2
3
4
5
...
14
15
16
Next