ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.11270
  4. Cited By
Generalized Decoding for Pixel, Image, and Language

Generalized Decoding for Pixel, Image, and Language

21 December 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
Chunyuan Li
Xiyang Dai
Harkirat Singh Behl
Jianfeng Wang
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
    VLM
    MLLM
    ObjD
ArXivPDFHTML

Papers citing "Generalized Decoding for Pixel, Image, and Language"

45 / 45 papers shown
Title
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
42
0
0
06 May 2025
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
LRM
MLLM
56
0
0
10 Mar 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao
Sid Kiblawi
Naoto Usuyama
Ho Hin Lee
Sam Preston
Hoifung Poon
Mu-Hsin Wei
MedIm
73
0
0
04 Mar 2025
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond
Guanyao Wu
Haoyu Liu
Hongming Fu
Yichuan Peng
Jinyuan Liu
Xin-Yue Fan
Risheng Liu
66
0
0
03 Mar 2025
Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance
Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance
Duc-Hai Pham
Duc Dung Nguyen
Anh Pham
Ho Lai Tuan
P. Nguyen
Khoi Duc Minh Nguyen
Rang Nguyen
3DPC
47
1
0
10 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
99
48
0
03 Jan 2025
Interpreting Object-level Foundation Models via Visual Precision Search
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen
Siyuan Liang
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Hua Zhang
Xiaochun Cao
FAtt
82
4
0
25 Nov 2024
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
44
7
0
30 Sep 2024
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
Ayca Takmaz
Alexandros Delitzas
R. Sumner
Francis Engelmann
Johanna Wald
Federico Tombari
75
11
0
27 Sep 2024
Unleashing the Power of Generic Segmentation Models: A Simple Baseline
  for Infrared Small Target Detection
Unleashing the Power of Generic Segmentation Models: A Simple Baseline for Infrared Small Target Detection
Mingjin Zhang
Chi Zhang
Qiming Zhang
Yunsong Li
Xinbo Gao
Jing Zhang
VLM
30
3
0
07 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring
  Expression Segmentation
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
39
11
0
01 Sep 2024
Visual Agents as Fast and Slow Thinkers
Visual Agents as Fast and Slow Thinkers
Guangyan Sun
Mingyu Jin
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Ying Nian Wu
Dongfang Liu
Dongfang Liu
LLMAG
LRM
77
13
0
16 Aug 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
68
1
0
23 Jul 2024
SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
Weiyi Xie
Nathalie Willems
Shubham Patil
Yang Li
Mayank Kumar
54
13
0
05 Jul 2024
F-LMM: Grounding Frozen Large Multimodal Models
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu
Sheng Jin
Wenwei Zhang
Lumin Xu
Wentao Liu
Wei Li
Chen Change Loy
MLLM
75
12
0
09 Jun 2024
PerSense: Personalized Instance Segmentation in Dense Images
PerSense: Personalized Instance Segmentation in Dense Images
Muhammad Ibraheem Siddiqui
Muhammad Umer Sheikh
Hassan Abid
Muhammad Haris Khan
VLM
57
0
0
22 May 2024
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition
Kun Yuan
V. Srivastav
Nassir Navab
N. Padoy
41
11
0
16 May 2024
Segment Any 3D Object with Language
Segment Any 3D Object with Language
Seungjun Lee
Yuyang Zhao
Gim Hee Lee
41
1
0
02 Apr 2024
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary
  Semantic Segmentation from Text Supervision
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision
Yajie Liu
Pu Ge
Qingjie Liu
Di Huang
75
2
0
06 Mar 2024
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Ming-hui Li
Shuai Li
Xindong Zhang
Lei Zhang
VOS
41
16
0
28 Feb 2024
ODIN: A Single Model for 2D and 3D Segmentation
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain
Pushkal Katara
N. Gkanatsios
Adam W. Harley
Gabriel H. Sarch
Kriti Aggarwal
Vishrav Chaudhary
Katerina Fragkiadaki
3DPC
40
7
0
04 Jan 2024
LISA++: An Improved Baseline for Reasoning Segmentation with Large
  Language Model
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
Senqiao Yang
Tianyuan Qu
Xin Lai
Zhuotao Tian
Bohao Peng
Shu-Lin Liu
Jiaya Jia
VLM
21
28
0
28 Dec 2023
See, Say, and Segment: Teaching LMMs to Overcome False Premises
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu
Giscard Biamby
David M. Chan
Lisa Dunlap
Ritwik Gupta
Xudong Wang
Joseph E. Gonzalez
Trevor Darrell
VLM
MLLM
34
18
0
13 Dec 2023
PEEKABOO: Interactive Video Generation via Masked-Diffusion
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Yash Jain
Anshul Nasery
Vibhav Vineet
Harkirat Singh Behl
VGen
28
30
0
12 Dec 2023
Auto-Vocabulary Semantic Segmentation
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
45
2
0
07 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
30
3
0
05 Dec 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
56
104
0
09 Nov 2023
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model
Jinjin Xu
Liwu Xu
Yuzhe Yang
Xiang Li
Fanyi Wang
Yanchun Xie
Yi-Jie Huang
Yaqian Li
MoE
MLLM
VLM
27
12
0
09 Nov 2023
Tracking Anything with Decoupled Video Segmentation
Tracking Anything with Decoupled Video Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOS
VLM
32
121
0
07 Sep 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
29
118
0
25 Jul 2023
Hierarchical Open-vocabulary Universal Image Segmentation
Hierarchical Open-vocabulary Universal Image Segmentation
Xudong Wang
Shufang Li
Konstantinos Kallidromitis
Yu Kato
Kazuki Kozuka
Trevor Darrell
VLM
OCL
37
36
0
03 Jul 2023
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
Xiuye Gu
Yin Cui
Jonathan Huang
Abdullah M. Rashwan
X. Yang
...
Golnaz Ghiasi
Weicheng Kuo
Huizhong Chen
Liang-Chieh Chen
David A. Ross
ISeg
28
26
0
02 Jun 2023
Interactive Segment Anything NeRF with Feature Imitation
Interactive Segment Anything NeRF with Feature Imitation
Xiaokang Chen
Jiaxiang Tang
Diwen Wan
Jingbo Wang
Gang Zeng
48
22
0
25 May 2023
Advancing Referring Expression Segmentation Beyond Single Image
Advancing Referring Expression Segmentation Beyond Single Image
YiXuan Wu
Zhao Zhang
Xie Chi
Feng Zhu
Rui Zhao
VLM
34
18
0
21 May 2023
SAM.MD: Zero-shot medical image segmentation capabilities of the Segment
  Anything Model
SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model
Saikat Roy
Tassilo Wald
Gregor Koehler
Maximilian R. Rokuss
Nico Disch
Julius Holzschuh
David Zimmerer
Klaus H. Maier-Hein
MedIm
VLM
24
94
0
10 Apr 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
GLIGEN: Open-Set Grounded Text-to-Image Generation
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
53
569
1
17 Jan 2023
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for
  Open-world Detection
DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection
Lewei Yao
Jianhua Han
Youpeng Wen
Xiaodan Liang
Dan Xu
Wei Zhang
Zhenguo Li
Chunjing Xu
Hang Xu
CLIP
VLM
115
152
0
20 Sep 2022
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
103
67
0
20 May 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
X. Wang
ViT
VLM
189
499
0
22 Feb 2022
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip H. S. Torr
141
306
0
04 Dec 2021
Open-vocabulary Object Detection via Vision and Language Knowledge
  Distillation
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
225
898
0
28 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
256
525
0
04 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal
  Transformers
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
Conditional Convolutions for Instance Segmentation
Conditional Convolutions for Instance Segmentation
Zhi Tian
Chunhua Shen
Hao Chen
ISeg
179
597
0
12 Mar 2020
1