ResearchTrend.AI
PolyMaX: General Dense Prediction with Mask Transformer


9 November 2023
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
Siyuan Qiao
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
arXiv: 2311.05770

Papers citing "PolyMaX: General Dense Prediction with Mask Transformer"

22 / 22 papers shown
Title
PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes
Xinhua Xu
Hong Liu
Jianbing Wu
Jinfu Liu
DiffM
59
0
0
24 Mar 2025
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
Alessio Quercia
Erenus Yildiz
Zhuo Cao
Kai Krajsek
Abigail Morrison
Ira Assent
Hanno Scharr
56
0
0
22 Jan 2025
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia
Jianyuan Guo
Kai Han
Han Wu
Chao Zhang
Chang Xu
Xinghao Chen
ViT
48
16
0
03 Jun 2024
COCONut: Modernizing COCO Segmentation
XueQing Deng
Qihang Yu
Peng Wang
Xiaohui Shen
Liang-Chieh Chen
48
16
0
12 Apr 2024
HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion
Jiahang Li
Peng Yun
Qijun Chen
Rui Fan
41
8
0
04 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
41
24
0
02 Apr 2024
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Mu Hu
Wei Yin
C. Zhang
Zhipeng Cai
Xiaoxiao Long
Hao Chen
Kaixuan Wang
Gang Yu
Chunhua Shen
Shaojie Shen
3DGS
54
117
0
22 Mar 2024
3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models
Yongtao Ge
Wenjia Wang
Yongfan Chen
Hao Chen
Chunhua Shen
3DH
40
8
0
17 Mar 2024
EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment
M. Lavrenyuk
Shariq Farooq Bhat
Matthias Müller
Peter Wonka
ObjD
MDE
31
9
0
13 Dec 2023
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding
Hanrong Ye
Dan Xu
ViT
29
10
0
08 Jun 2023
CLUSTSEG: Clustering for Universal Segmentation
James Liang
Tianfei Zhou
Dongfang Liu
Wenguan Wang
VLM
69
48
0
03 May 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
163
217
0
03 Mar 2023
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
103
67
0
20 May 2022
MulT: An End-to-End Multitask Learning Transformer
Deblina Bhattacharjee
Tong Zhang
Sabine Süsstrunk
Mathieu Salzmann
ViT
42
63
0
17 May 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
226
226
0
20 Jan 2022
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
195
3,534
0
20 Aug 2019
Deep Ordinal Regression Network for Monocular Depth Estimation
Huan Fu
Biwei Huang
Chaohui Wang
Kayhan Batmanghelich
Dacheng Tao
MDE
200
1,708
0
06 Jun 2018
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,225
0
16 Nov 2016
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Vijay Badrinarayanan
Alex Kendall
R. Cipolla
SSeg
446
15,645
0
02 Nov 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
345
75,888
0
18 May 2015
Designing Deep Networks for Surface Normal Estimation
Xiaolong Wang
David Fouhey
Abhinav Gupta
3DV
SSL
167
353
0
18 Nov 2014
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,217
0
01 Sep 2014