ResearchTrend.AI
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

1 December 2023
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
    ObjD
    VLM

Papers citing "Grounding Everything: Emerging Localization Properties in Vision-Language Transformers"

14 / 14 papers shown
DocVXQA: Context-Aware Visual Explanations for Document Question Answering
Mohamed Ali Souibgui
Changkyu Choi
Andrey Barsky
Kangsoo Jung
Ernest Valveny
Dimosthenis Karatzas
28
0
0
12 May 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
35
0
0
03 Apr 2025
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu
Siyuan Li
44
0
0
01 Apr 2025
Disentangling CLIP for Multi-Object Perception
Samyak Rawlekar
Yujun Cai
Yiwei Wang
Ming-Hsuan Yang
Narendra Ahuja
VLM
CoGe
80
0
0
05 Feb 2025
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable
Shreyash Arya
Sukrut Rao
Moritz Bohle
Bernt Schiele
68
2
0
28 Jan 2025
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM
VOS
79
0
0
26 Nov 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai
Yong-Jin Liu
Yifei Han
Haoji Zhang
Yansong Tang
VLM
79
3
0
24 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
71
0
0
18 Nov 2024
Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation
Sina Hajimiri
Ismail Ben Ayed
Jose Dolz
VLM
41
22
0
12 Apr 2024
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
192
499
0
22 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
338
5,785
0
29 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
313
3,708
0
11 Feb 2021
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,828
0
18 Aug 2016