Semantic Understanding of Scenes through the ADE20K Dataset

18 August 2016
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
    SSeg

Papers citing "Semantic Understanding of Scenes through the ADE20K Dataset"

50 / 67 papers shown
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
A. Fuller
Yousef Yassin
Junfeng Wen
Daniel G. Kyrollos
Tarek Ibrahim
James R. Green
Evan Shelhamer
ViT
182
0
0
23 May 2025
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders
Savya Khosla
Sethuraman TV
Barnett Lee
Alexander Schwing
Derek Hoiem
VGen
163
0
0
23 May 2025
Stronger ViTs With Octic Equivariance
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
ViT
216
0
0
21 May 2025
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
Timo Kaiser
Thomas Norrenbrock
Bodo Rosenhahn
151
0
0
08 May 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
179
0
0
15 Apr 2025
vGamba: Attentive State Space Bottleneck for efficient Long-range Dependencies in Visual Recognition
Yunusa Haruna
A. Lawan
Mamba
121
0
0
27 Mar 2025
DeLoRA: Decoupling Angles and Strength in Low-rank Adaptation
Massimo Bini
Leander Girrbach
Zeynep Akata
190
1
0
23 Mar 2025
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Pietro Michiardi
229
0
0
18 Mar 2025
APLA: A Simple Adaptation Method for Vision Transformers
Moein Sorkhei
Emir Konuk
Kevin Smith
Christos Matsoukas
129
0
0
14 Mar 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
R. Hu
Lianghui Zhu
Yuxuan Zhang
Tianheng Cheng
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
ObjD
149
0
0
13 Mar 2025
Implicit Contrastive Representation Learning with Guided Stop-gradient
Byeongchan Lee
Sehyun Lee
SSL
264
2
0
12 Mar 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Pöppel
Sepp Hochreiter
Johannes Brandstetter
VLM
221
49
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
251
8
0
24 Feb 2025
Exploring Mutual Cross-Modal Attention for Context-Aware Human Affordance Generation
Prasun Roy
Saumik Bhattacharya
Subhankar Ghosh
Umapada Pal
Michael Blumenstein
109
0
0
20 Feb 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
472
5
0
25 Jan 2025
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Qu He
Jinlong Peng
P. Xu
Boyuan Jiang
Xiaobin Hu
...
Yang Liu
Yun Wang
Chengjie Wang
Xuelong Li
Jing Zhang
DiffM
190
1
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
319
3
0
02 Dec 2024
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLMVOS
259
1
0
26 Nov 2024
VideoSAM: A Large Vision Foundation Model for High-Speed Video Segmentation
Chika Maduabuchi
Ericmoore Jossou
Matteo Bucci
84
1
0
22 Oct 2024
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Chaodong Xiao
Minghan Li
Zhengqiang Zhang
Deyu Meng
Lei Zhang
Mamba
132
6
0
19 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
245
7
0
14 Oct 2024
Enhanced Generative Data Augmentation for Semantic Segmentation via Stronger Guidance
Quang-Huy Che
Duc-Tri Le
Vinh-Tiep Nguyen
D. Lam
Vinh-Tiep Nguyen
DiffM
227
1
0
09 Sep 2024
Physically Feasible Semantic Segmentation
Shamik Basu
Luc Van Gool
Daniel Gehrig
196
1
0
26 Aug 2024
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
Youjun Zhao
Jiaying Lin
Shuquan Ye
Qianshi Pang
Rynson W. H. Lau
152
2
0
20 Aug 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
MQ
115
2
0
29 Jul 2024
BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
Yu-Yun Tseng
Tanusree Sharma
Lotus Zhang
Abigale Stangl
Leah Findlater
Yang Wang
Danna Gurari
159
0
0
25 Jul 2024
SegSTRONG-C: Segmenting Surgical Tools Robustly On Non-adversarial Generated Corruptions -- An EndoVis'24 Challenge
Hao Ding
Tuxun Lu
Yuqian Zhang
Ruixing Liang
Hongchao Shu
...
Bo Wang
Marcos Fernández-Rodríguez
Estevao Lima
João L. Vilaça
Mathias Unberath
228
4
0
16 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
130
5
0
09 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
185
31
0
28 Jun 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLMLRM
220
197
0
29 Apr 2024
Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
Andreea Dogaru
M. Ozer
Bernhard Egger
3DGS
130
7
0
04 Apr 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
154
47
0
29 Mar 2024
Subobject-level Image Tokenization
Delong Chen
Samuel Cahyawijaya
Jianfeng Liu
Baoyuan Wang
Pascale Fung
VLMOCL
256
9
0
22 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
146
85
0
03 Feb 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
116
16
0
25 Jan 2024
How to Efficiently Annotate Images for Best-Performing Deep Learning Based Segmentation Models: An Empirical Study with Weak and Noisy Annotations and Segment Anything Model
Yixin Zhang
Shen Zhao
Han Gu
Maciej A. Mazurowski
VLM
108
4
0
17 Dec 2023
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi
Roberto Amoroso
Marcella Cornia
Lorenzo Baraldi
Andrea Pilzer
Rita Cucchiara
135
2
0
12 Jun 2023
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
140
37
0
01 Jun 2022
Video Region Annotation with Sparse Bounding Boxes
Yuzheng Xu
Yang Wu
Nur Sabrina binti Zuraimi
S. Nobuhara
Ko Nishino
VGen
137
2
0
17 Aug 2020
EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle
Trisha Mittal
P. Guhan
Uttaran Bhattacharya
Rohan Chandra
Aniket Bera
Dinesh Manocha
124
135
0
14 Mar 2020
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Mathew Monfort
Bowen Pan
K. Ramakrishnan
A. Andonian
Barry A. McNamara
A. Lascelles
Quanfu Fan
Dan Gutfreund
Rogerio Feris
A. Oliva
VLM
96
68
0
01 Nov 2019
Acquisition of Localization Confidence for Accurate Object Detection
Borui Jiang
Ruixuan Luo
Jiayuan Mao
Tete Xiao
Yuning Jiang
ObjD
75
852
0
30 Jul 2018
Unified Perceptual Parsing for Scene Understanding
Tete Xiao
Yingcheng Liu
Bolei Zhou
Yuning Jiang
Jian Sun
OCLVOS
197
1,904
0
26 Jul 2018
MegDet: A Large Mini-Batch Object Detector
Chao Peng
Tete Xiao
Zeming Li
Yuning Jiang
Xiangyu Zhang
Kai Jia
Gang Yu
Jian Sun
ObjD
199
318
0
20 Nov 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
128
3,688
0
08 Jun 2017
Open Vocabulary Scene Parsing
Hang Zhao
Xavier Puig
Bolei Zhou
Sanja Fidler
Antonio Torralba
VLM3DV
96
120
0
26 Mar 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
381
27,275
0
20 Mar 2017
COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar
J. Uijlings
V. Ferrari
150
1,396
0
12 Dec 2016
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
491
22,158
0
09 Dec 2016
Pyramid Scene Parsing Network
Hengshuang Zhao
Jianping Shi
Xiaojuan Qi
Xiaogang Wang
Jiaya Jia
VOSSSeg
665
12,046
0
04 Dec 2016