2206.07835
Cited By
Disentangling visual and written concepts in CLIP
15 June 2022
Joanna Materzyńska
Antonio Torralba
David Bau
CoGe
Papers citing
"Disentangling visual and written concepts in CLIP"
38 / 38 papers shown
Title
Transformation of audio embeddings into interpretable, concept-based representations
Alice Zhang
Edison Thomaz
Lie Lu
29
0
0
18 Apr 2025
Steering CLIP's vision transformer with sparse autoencoders
Sonia Joseph
Praneet Suresh
Ethan Goldfarb
Lorenz Hufe
Yossi Gandelsman
Robert Graham
Danilo Bzdok
Wojciech Samek
Blake A. Richards
56
2
0
11 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
33
0
0
07 Apr 2025
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang
Yogesh S Rawat
90
0
0
28 Mar 2025
Zero-Shot Visual Concept Blending Without Text Guidance
Hiroya Makino
Takahiro Yamaguchi
Hiroyuki Sakai
DiffM
50
0
0
27 Mar 2025
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni
Meet Soni
Sirisha Rambhatla
DiffM
66
0
0
27 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
80
0
0
17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
66
0
0
15 Mar 2025
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Meng Yang
Tianqing Zhu
Chi Liu
Wanlei Zhou
Shui Yu
Philip S. Yu
AAML
ELM
PILM
69
1
0
12 Nov 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
45
3
0
09 Oct 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
51
2
0
07 Aug 2024
DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning
Dino Ienco
C. Dantas
44
3
0
05 Aug 2024
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
Gangyan Zeng
Yuan Zhang
Jin Wei
Dongbao Yang
Peng Zhang
Yiwen Gao
Xugong Qin
Yu Zhou
VLM
CLIP
32
0
0
01 Aug 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
45
2
0
20 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi
David T. Hoffmann
Max Argus
Volker Fischer
Thomas Brox
VLM
58
1
0
11 Apr 2024
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale
Jinbin Huang
Chong Chen
Aditi Mishra
Bum Chul Kwon
Zhicheng Liu
Chris Bryan
50
4
0
03 Apr 2024
Scene Depth Estimation from Traditional Oriental Landscape Paintings
Sungho Kang
Yeonghyeon Park
H. Park
Juneho Yi
52
0
0
06 Mar 2024
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback
Xin Jin
Bo Li
Baao Xie
Wenyao Zhang
Jinming Liu
Ziqiang Li
Tao Yang
Wenjun Zeng
DRL
DiffM
CoGe
42
7
0
04 Feb 2024
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin
Conghui He
Alex Jinpeng Wang
Bin Wang
Weijia Li
Mike Zheng Shou
38
7
0
21 Dec 2023
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi
Tobia Poppi
Federico Cocchi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
27
9
0
27 Nov 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
48
10
0
25 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
VLM
21
83
0
09 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
29
26
0
19 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
33
3
0
16 Sep 2023
Parts of Speech-Grounded Subspaces in Vision-Language Models
James Oldfield
Christos Tzelepis
Yannis Panagakis
M. Nicolaou
Ioannis Patras
26
9
0
23 May 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
32
142
0
13 Apr 2023
Defense-Prefix for Preventing Typographic Attacks on CLIP
Hiroki Azuma
Yusuke Matsui
VLM
AAML
20
17
0
10 Apr 2023
Zero-shot Model Diagnosis
Jinqi Luo
Zhaoning Wang
Chen Henry Wu
Dong Huang
Fernando de la Torre
VLM
24
21
0
27 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
21
55
0
21 Mar 2023
SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective
Zipeng Xu
Songlong Xing
E. Sangineto
N. Sebe
CLIP
30
2
0
16 Mar 2023
Teaching CLIP to Count to Ten
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM
CLIP
39
93
0
23 Feb 2023
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP
VLM
32
48
0
15 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
37
125
0
13 Dec 2022
Task Bias in Vision-Language Models
Sachit Menon
I. Chandratreya
Carl Vondrick
VLM
SSL
27
6
0
08 Dec 2022
Disentangled Representation Learning
Xin Eric Wang
Hong Chen
Siao Tang
Zihao Wu
Wenwu Zhu
DRL
39
78
0
21 Nov 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Raphael Tang
Linqing Liu
Akshat Pandey
Zhiying Jiang
Gefei Yang
K. Kumar
Pontus Stenetorp
Jimmy J. Lin
Ferhan Ture
34
167
0
10 Oct 2022
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,805
0
24 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
307
39,238
0
01 Sep 2014