ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2210.09996
  4. Cited By
Perceptual Grouping in Contrastive Vision-Language Models

Perceptual Grouping in Contrastive Vision-Language Models

18 October 2022
Kanchana Ranasinghe
Brandon McKinzie
S. S. Ravi
Yinfei Yang
Alexander Toshev
Jonathon Shlens
    VLM
ArXivPDFHTML

Papers citing "Perceptual Grouping in Contrastive Vision-Language Models"

50 / 55 papers shown
Title
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
Donghoon Kim
Minji Bae
Kyuhong Shim
B. Shim
36
0
0
13 May 2025
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim
Mohammad Fahes
Tuan-Hung Vu
Andrei Bursuc
Raoul de Charette
VLM
37
0
0
14 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
34
0
0
08 Apr 2025
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić
Yannis Kalantidis
Jirí Matas
Giorgos Tolias
VLM
46
0
0
25 Mar 2025
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level
  Vision-Language Alignment
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
94
6
0
20 Dec 2024
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language
  for Open-Vocabulary Segmentation
Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation
Luca Barsellotti
Lorenzo Bianchi
Nicola Messina
F. Carrara
Marcella Cornia
Lorenzo Baraldi
Fabrizio Falchi
Rita Cucchiara
VLM
72
2
0
28 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
71
0
0
18 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
31
3
0
08 Nov 2024
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for
  Remote Sensing Images
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Kaiyu Li
Ruixun Liu
Xiangyong Cao
Deyu Meng
Zhi Wang
Deyu Meng
Zhi Wang
30
3
0
02 Oct 2024
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Heeseong Shin
Chaehyun Kim
Sunghwan Hong
Seokju Cho
Anurag Arnab
Paul Hongsuck Seo
Seungryong Kim
VLM
34
1
0
30 Sep 2024
Generalization Boosted Adapter for Open-Vocabulary Segmentation
Generalization Boosted Adapter for Open-Vocabulary Segmentation
Wenhao Xu
Changwei Wang
Xuxiang Feng
Rongtao Xu
Longzhao Huang
Zherui Zhang
Li Guo
Shibiao Xu
VLM
34
2
0
13 Sep 2024
iSeg: An Iterative Refinement-based Framework for Training-free
  Segmentation
iSeg: An Iterative Refinement-based Framework for Training-free Segmentation
Lin Sun
Jiale Cao
J. Xie
F. Khan
Yanwei Pang
DiffM
43
1
0
05 Sep 2024
Image Segmentation in Foundation Model Era: A Survey
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
42
4
0
23 Aug 2024
ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang
Guoliang Kang
VLM
SSL
42
7
0
13 Aug 2024
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic
  Segmentation
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang
Minsu Cho
ObjD
VLM
35
9
0
09 Aug 2024
Large-vocabulary forensic pathological analyses via prototypical
  cross-modal contrastive learning
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Chen Shen
Chunfeng Lian
Wanqing Zhang
Fan Wang
Jianhua Zhang
...
Hongshu Mu
Hao Wu
Xinggong Liang
Jianhua Ma
Zhenyuan Wang
36
0
0
20 Jul 2024
PDiscoFormer: Relaxing Part Discovery Constraints with Vision
  Transformers
PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
Ananthu Aniraj
C. Dantas
Dino Ienco
Diego Marcos
34
1
0
05 Jul 2024
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Thomas Stegmüller
Tim Lebailly
Nikola Dukic
Behzad Bozorgtabar
Tinne Tuytelaars
Jean-Philippe Thiran
VLM
36
1
0
23 Jun 2024
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park
Kanchana Ranasinghe
Kumara Kahatapitiya
Wonjeong Ryoo
Donghyun Kim
Michael S. Ryoo
65
20
0
13 Jun 2024
Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion
  Models
Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models
Qian Wang
Abdelrahman Eldesokey
Mohit Mendiratta
Fangneng Zhan
Adam Kortylewski
Christian Theobalt
Peter Wonka
DiffM
42
4
0
27 May 2024
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and
  Texts
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim
Sanghyuk Chun
Taekyung Kim
Dongyoon Han
Sangdoo Yun
39
7
0
26 Apr 2024
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe
Satya Narayan Shukla
Omid Poursaeed
Michael S. Ryoo
Tsung-Yu Lin
LRM
40
23
0
11 Apr 2024
Is CLIP the main roadblock for fine-grained open-world perception?
Is CLIP the main roadblock for fine-grained open-world perception?
Lorenzo Bianchi
F. Carrara
Nicola Messina
Fabrizio Falchi
VLM
35
4
0
04 Apr 2024
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP
  to Alleviate Single Tag Bias
TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Sang-Kee Jo
Soohyun Ryu
Sungyub Kim
Eunho Yang
Kyungsu Kim
35
1
0
30 Mar 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie
Zhe Gan
J. Fauconnier
Sam Dodge
Bowen Zhang
...
Zirui Wang
Ruoming Pang
Peter Grasch
Alexander Toshev
Yinfei Yang
MLLM
32
186
0
14 Mar 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
32
3
0
19 Feb 2024
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision
Zhaoqing Wang
Xiaobo Xia
Ziye Chen
Xiao He
Yandong Guo
Mingming Gong
Tongliang Liu
VLM
16
11
0
14 Feb 2024
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
Koichi Namekata
Amirmojtaba Sabour
Sanja Fidler
Seung Wook Kim
47
18
0
22 Jan 2024
Improving fine-grained understanding in image-text pre-training
Improving fine-grained understanding in image-text pre-training
Ioana Bica
Anastasija Ilić
Matthias Bauer
Goker Erdogan
Matko Bovsnjak
...
A. Gritsenko
Matthias Minderer
Charles Blundell
Razvan Pascanu
Jovana Mitrović
VLM
23
22
0
18 Jan 2024
TagAlign: Improving Vision-Language Alignment with Multi-Tag
  Classification
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Qinying Liu
Wei Wu
Kecheng Zheng
Zhan Tong
Jiawei Liu
Yu Liu
Wei Chen
Zilei Wang
Yujun Shen
VLM
20
6
0
21 Dec 2023
CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary
  semantic segmentation
CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
Monika Wysoczañska
Oriane Siméoni
Michael Ramamonjisoa
Andrei Bursuc
Tomasz Trzciñski
Patrick Pérez
VLM
CLIP
26
29
0
19 Dec 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang
Jieru Mei
Alan L. Yuille
VLM
24
54
0
04 Dec 2023
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf
  Vision-Language Models
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Jiayun Luo
Siddhesh Khandelwal
Leonid Sigal
Boyang Albert Li
MLLM
VLM
29
7
0
28 Nov 2023
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image
  Generation
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
Yuhui Zhang
Brandon McKinzie
Zhe Gan
Vaishaal Shankar
Alexander Toshev
23
3
0
27 Nov 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial
  Understanding
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
27
67
0
23 Oct 2023
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement
Mohammadreza Salehi
Mehrdad Farajtabar
Maxwell Horton
Fartash Faghri
Hadi Pouransari
Raviteja Vemulapalli
Oncel Tuzel
Ali Farhadi
Mohammad Rastegari
Sachin Mehta
CLIP
VLM
37
1
0
21 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
19
33
0
20 Oct 2023
Data Filtering Networks
Data Filtering Networks
Alex Fang
Albin Madappally Jose
Amit Jain
Ludwig Schmidt
Alexander Toshev
Vaishaal Shankar
CLIP
32
124
0
29 Sep 2023
CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic
  Segmentation For-Free
CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
Monika Wysoczañska
Michael Ramamonjisoa
Tomasz Trzciñski
Oriane Siméoni
3DV
VLM
24
20
0
25 Sep 2023
Diffusion Model is Secretly a Training-free Open Vocabulary Semantic
  Segmenter
Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter
Jinglong Wang
Xiawei Li
Jing Zhang
Qingyuan Xu
Qin Zhou
Qian Yu
Lu Sheng
Dong Xu
VLM
DiffM
21
45
0
06 Sep 2023
Language-based Action Concept Spaces Improve Video Self-Supervised
  Learning
Language-based Action Concept Spaces Improve Video Self-Supervised Learning
Kanchana Ranasinghe
Michael S. Ryoo
SSL
VLM
34
12
0
20 Jul 2023
Learning Open-vocabulary Semantic Segmentation Models From Natural
  Language Supervision
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
Jilan Xu
Junlin Hou
Yuejie Zhang
Rui Feng
Yi Wang
Yu Qiao
Weidi Xie
VLM
16
81
0
22 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
Self-Supervised Visual Representation Learning with Semantic Grouping
Self-Supervised Visual Representation Learning with Semantic Grouping
Xin Wen
Bingchen Zhao
Anlin Zheng
X. Zhang
Xiaojuan Qi
SSL
117
71
0
30 May 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
189
499
0
22 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Intriguing Properties of Vision Transformers
Intriguing Properties of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Munawar Hayat
F. Khan
Ming-Hsuan Yang
ViT
256
620
0
21 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
314
5,775
0
29 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
275
1,081
0
17 Feb 2021
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals
Wouter Van Gansbeke
Simon Vandenhende
Stamatios Georgoulis
Luc Van Gool
SSL
188
250
0
11 Feb 2021
12
Next