arXiv: 2305.11173
Going Denser with Open-Vocabulary Part Segmentation


18 May 2023
Pei Sun
Shoufa Chen
Chenchen Zhu
Fanyi Xiao
Ping Luo
Saining Xie
Zhicheng Yan
ObjD, VLM

Papers citing "Going Denser with Open-Vocabulary Part Segmentation"

50 / 50 papers shown
UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
Xueyang Guo
Hongwei Hu
Chengye Song
Jingshu Chen
Zilin Zhao
Yu Fu
Bowen Guan
Zhenze Liu
75
0
0
11 May 2025
GAT-Grasp: Gesture-Driven Affordance Transfer for Task-Aware Robotic Grasping
Ruixiang Wang
Huayi Zhou
Xinyue Yao
Guiliang Liu
Kui Jia
102
0
0
08 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei Zhang
Bo Yang
Hua Chen
148
1
0
05 Mar 2025
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
253
1
0
18 Nov 2024
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Ayca Takmaz
Alexandros Delitzas
R. Sumner
Francis Engelmann
Johanna Wald
Federico Tombari
128
12
0
27 Sep 2024
Functional Eigen-Grasping Using Approach Heatmaps
Malek Aburub
Kazuki Higashi
Weiwei Wan
Kensuke Harada
153
1
0
22 Jan 2024
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
112
2
0
07 Dec 2023
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
Yaobo Liang
Chenfei Wu
Ting Song
Wenshan Wu
Yan Xia
...
Shaoguang Mao
Yuntao Wang
Linjun Shou
Ming Gong
Nan Duan
LLMAG, CLL
73
201
0
29 Mar 2023
ConceptFusion: Open-set Multimodal 3D Mapping
Krishna Murthy Jatavallabhula
Ali Kuwajerwala
Qiao Gu
Mohd. Omama
Tao Chen
...
Celso Miguel de Melo
Madhava Krishna
Liam Paull
Florian Shkurti
Antonio Torralba
68
241
0
14 Feb 2023
Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding
Yaoxian Song
Penglei Sun
Piaopiao Jin
Yi Ren
Yu Zheng
Zhixu Li
Xiaowen Chu
Yueying Zhang
Tiefeng Li
Jason Gu
129
17
0
27 Jan 2023
PACO: Parts and Attributes of Common Objects
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
79
103
0
04 Jan 2023
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models
Weicheng Kuo
Yin Cui
Xiuye Gu
A. Piergiovanni
A. Angelova
MLLM, VLM, ObjD
131
137
0
30 Sep 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD, CLIP, VLM, ViT, OCL
92
313
0
12 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
382
3,542
0
29 Apr 2022
Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation
Xiangtai Li
Shilin Xu
Yibo Yang
Guangliang Cheng
Yunhai Tong
Dacheng Tao
ViT
48
46
0
10 Apr 2022
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
Yu Du
Fangyun Wei
Zihe Zhang
Miaojing Shi
Yue Gao
Guoqi Li
VPVLM, VLM
79
333
0
28 Mar 2022
R3M: A Universal Visual Representation for Robot Manipulation
Suraj Nair
Aravind Rajeswaran
Vikash Kumar
Chelsea Finn
Abhi Gupta
LM&Ro
98
573
0
23 Mar 2022
Detecting Twenty-thousand Classes using Image-level Supervision
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIP, VLM
103
614
0
07 Jan 2022
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM, CLIP
140
577
0
16 Dec 2021
Deep ViT Features as Dense Visual Descriptors
Shirzad Amir
Yossi Gandelsman
Shai Bagon
Tali Dekel
MDE, ViT
106
287
0
10 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD, VLM
126
1,062
0
07 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
248
2,364
0
02 Dec 2021
PartImageNet: A Large, High-Quality Dataset of Parts
Ju He
Shuo Yang
Shaokang Yang
Adam Kortylewski
Xiaoding Yuan
Jieneng Chen
Shuai Liu
Cheng Yang
Qihang Yu
Alan Yuille
3DV, MLLM, 3DH, VLM
94
97
0
02 Dec 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
399
1,092
0
13 Oct 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM, MLLM
128
799
0
24 Aug 2021
Multi-scale Matching Networks for Semantic Correspondence
Dongyang Zhao
Ziyang Song
Zhenghao Ji
Gangming Zhao
Weifeng Ge
Yizhou Yu
72
49
0
31 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven Hoi
FaML
196
1,960
0
16 Jul 2021
Part-aware Panoptic Segmentation
Daan de Geus
Panagiotis Meletis
Chenyang Lu
Xiaoxiao Wen
Gijs Dubbelman
93
62
0
11 Jun 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM, ObjD
280
917
0
28 Apr 2021
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
Tianfei Zhou
Wenguan Wang
Si Liu
Yi Yang
Luc Van Gool
3DH
160
64
0
08 Mar 2021
Open-Vocabulary Object Detection Using Captions
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
VLM, ObjD
122
432
0
20 Nov 2020
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia
Mengyun Shi
Mikhail Sirotenko
Yin Cui
Claire Cardie
B. Hariharan
Hartwig Adam
Serge J. Belongie
68
96
0
26 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
108
1,941
0
13 Apr 2020
Cross-domain Correspondence Learning for Exemplar-based Image Translation
Peiying Zhang
Bo Zhang
Dong Chen
Lu Yuan
Fang Wen
77
239
0
12 Apr 2020
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
Rohit Girdhar
Deva Ramanan
61
178
0
10 Oct 2019
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg, VLM
103
1,371
0
08 Aug 2019
SFNet: Learning Object-aware Semantic Correspondence
Junghyup Lee
Dohyung Kim
Jean Ponce
Bumsub Ham
3DPC
69
141
0
03 Apr 2019
Parsing R-CNN for Instance-Level Human Analysis
Lu Yang
Q. Song
Zhihui Wang
Ming Jiang
SSeg
99
123
0
30 Nov 2018
ApolloCar3D: A Large 3D Car Instance Understanding Benchmark for Autonomous Driving
Xibin Song
Peng Wang
Dingfu Zhou
Rui Zhu
Chenye Guan
Yuchao Dai
Hao Su
Hongdong Li
Ruigang Yang
3DPC
78
158
0
29 Nov 2018
ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations
Shuai Zheng
Fan Yang
M. Kiapour
Robinson Piramuthu
62
139
0
03 Jul 2018
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
95
849
0
22 Feb 2018
Cascade R-CNN: Delving into High Quality Object Detection
Zhaowei Cai
Nuno Vasconcelos
ObjD
141
4,930
0
03 Dec 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
352
27,195
0
20 Mar 2017
Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing
Ke Gong
Xiaodan Liang
Dongyu Zhang
Xiaohui Shen
Liang Lin
SSL
48
476
0
16 Mar 2017
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
474
22,108
0
09 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
342
3,246
0
02 Dec 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
402
1,881
0
18 Aug 2016
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Bryan A. Plummer
Liwei Wang
Christopher M. Cervantes
Juan C. Caicedo
Julia Hockenmaier
Svetlana Lazebnik
199
2,060
0
19 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. L. Zitnick
215
2,478
0
01 Apr 2015
Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts
Xianjie Chen
Roozbeh Mottaghi
Xiaobai Liu
Sanja Fidler
R. Urtasun
Alan Yuille
98
640
0
08 Jun 2014