arXiv: 2312.17345
3VL: Using Trees to Improve Vision-Language Models' Interpretability

28 December 2023
Nir Yellinek
Leonid Karlinsky
Raja Giryes
    CoGe
    VLM

Papers citing "3VL: Using Trees to Improve Vision-Language Models' Interpretability"

50 / 92 papers shown
A Review of 3D Object Detection with Vision-Language Models
Ranjan Sapkota
Konstantinos I. Roumeliotis
Rahul Harsha Cheppally
Marco Flores Calero
Manoj Karkee
VLM
114
2
0
25 Apr 2025
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
Ido Cohen
Daniela Gottesman
Mor Geva
Raja Giryes
VLM
148
1
1
18 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
Shijie Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
122
19
0
03 Dec 2024
Emergent Visual-Semantic Hierarchies in Image-Text Representations
Morris Alper
Hadar Averbuch-Elor
VLM
87
8
0
11 Jul 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
75
10
0
15 Apr 2024
L+M-24: Building a Dataset for Language + Molecules @ ACL 2024
Carl Edwards
Qingyun Wang
Lawrence Zhao
Heng Ji
74
21
0
22 Feb 2024
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jiang
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
122
376
0
29 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
60
37
0
05 May 2023
Going Beyond Nouns With Vision & Language Models Using Synthetic Data
Paola Cascante-Bonilla
Khaled Shehada
James Smith
Sivan Doveh
Donghyun Kim
...
Gül Varol
A. Oliva
Vicente Ordonez
Rogerio Feris
Leonid Karlinsky
VLM
SyDa
67
41
0
30 Mar 2023
Variational Information Pursuit for Interpretable Predictions
Aditya Chattopadhyay
Kwan Ho Ryan Chan
B. Haeffele
D. Geman
René Vidal
DRL
67
13
0
06 Feb 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner
János Kramár
Sebastian Farquhar
Matthew Rahtz
Tom McGrath
Vladimir Mikulik
71
75
0
12 Jan 2023
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Chengzhi Mao
Revant Teotia
Amrutha Sundar
Sachit Menon
Junfeng Yang
Xin Eric Wang
Carl Vondrick
37
30
0
12 Dec 2022
Explaining Image Classifiers with Multiscale Directional Image Representation
Stefan Kolek
Robert Windesheim
Héctor Andrade-Loarca
Gitta Kutyniok
Ron Levie
39
5
0
22 Nov 2022
Teaching Structured Vision&Language Concepts to Vision&Language Models
Sivan Doveh
Assaf Arbelle
Sivan Harary
Yikang Shen
Roei Herzig
...
Donghyun Kim
Raja Giryes
Rogerio Feris
S. Ullman
Leonid Karlinsky
VLM
CoGe
79
72
0
21 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
181
3,117
0
20 Oct 2022
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM
CoGe
62
392
0
04 Oct 2022
A Survey of Neural Trees
Haoling Li
Jie Song
Mengqi Xue
Haofei Zhang
Jingwen Ye
Lechao Cheng
Mingli Song
AI4CE
73
6
0
07 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
58
27
0
29 Aug 2022
Interpretable by Design: Learning Predictors by Composing Interpretable Queries
Aditya Chattopadhyay
Stewart Slocum
B. Haeffele
René Vidal
D. Geman
61
23
0
03 Jul 2022
VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations
Tiancheng Zhao
Tianqi Zhang
Mingwei Zhu
Haozhan Shen
Kyusong Lee
Xiaopeng Lu
Jianwei Yin
VLM
CoGe
MLLM
91
97
0
01 Jul 2022
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
74
89
0
16 Jun 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
245
139
0
28 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
90
175
0
30 Apr 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
89
104
0
29 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
67
149
0
19 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
95
425
0
07 Apr 2022
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
VLM
103
295
0
21 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM
BDL
VLM
CLIP
527
4,343
0
28 Jan 2022
Image Segmentation Using Text and Image Prompts
Timo Lüddecke
Alexander S. Ecker
CLIP
VLM
125
468
0
18 Dec 2021
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Letitia Parcalabescu
Michele Cafagna
Lilitta Muradjan
Anette Frank
Iacer Calixto
Albert Gatt
CoGe
64
116
0
14 Dec 2021
Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes
Jonathan Donnelly
A. Barnett
Chaofan Chen
3DH
73
128
0
29 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
94
634
0
09 Nov 2021
Object-Region Video Transformers
Roei Herzig
Elad Ben-Avraham
K. Mangalam
Amir Bar
Gal Chechik
Anna Rohrbach
Trevor Darrell
Amir Globerson
ViT
69
84
0
13 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
145
453
0
11 Oct 2021
Cartoon Explanations of Image Classifiers
Stefan Kolek
Duc Anh Nguyen
Ron Levie
Joan Bruna
Gitta Kutyniok
FAtt
83
17
0
07 Oct 2021
Simple Post-Training Robustness Using Test Time Augmentations and Random Forest
Gilad Cohen
Raja Giryes
AAML
59
4
0
16 Sep 2021
A Framework for Learning Ante-hoc Explainable Models via Concepts
Anirban Sarkar
Deepak Vijaykeerthy
Anindya Sarkar
V. Balasubramanian
LRM
BDL
73
50
0
25 Aug 2021
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
413
10,328
0
17 Jun 2021
A Comprehensive Taxonomy for Explainable Artificial Intelligence: A Systematic Survey of Surveys on Methods and Concepts
Gesina Schwalbe
Bettina Finzel
XAI
76
195
0
15 May 2021
NetAdaptV2: Efficient Neural Architecture Search with Fast Super-Network Training and Architecture Optimization
Tien-Ju Yang
Yi-Lun Liao
Vivienne Sze
57
55
0
31 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Hila Chefer
Shir Gur
Lior Wolf
ViT
60
320
0
29 Mar 2021
Unified Graph Structured Models for Video Understanding
Anurag Arnab
Chen Sun
Cordelia Schmid
86
46
0
29 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
909
29,372
0
26 Feb 2021
Do Input Gradients Highlight Discriminative Features?
Harshay Shah
Prateek Jain
Praneeth Netrapalli
AAML
FAtt
61
59
0
25 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
443
3,842
0
11 Feb 2021
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim
Bokyung Son
Ildoo Kim
VLM
CLIP
116
1,745
0
05 Feb 2021
A Survey on Neural Network Interpretability
Yu Zhang
Peter Tiño
A. Leonardis
K. Tang
FaML
XAI
194
678
0
28 Dec 2020
Transformer Interpretability Beyond Attention Visualization
Hila Chefer
Shir Gur
Lior Wolf
137
664
0
17 Dec 2020
Neural Prototype Trees for Interpretable Fine-grained Image Recognition
Meike Nauta
Ron van Bree
C. Seifert
142
266
0
03 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
634
41,003
0
22 Oct 2020