ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.18153
  4. Cited By

REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders

23 May 2025
Savya Khosla
Sethuraman TV
Barnett Lee
Alexander Schwing
Derek Hoiem
    VGen
ArXivPDFHTML

Papers citing "REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders"

33 / 33 papers shown
Title
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
Michael Tschannen
A. Gritsenko
Xiao Wang
Muhammad Ferjad Naeem
Ibrahim Alabdulmohsin
...
Basil Mustafa
Olivier J. Hénaff
Jeremiah Harmsen
Andreas Steiner
Xiaohua Zhai
VLM
92
54
0
21 Feb 2025
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
Bing Fan
Yunhe Feng
Yapeng Tian
Yuewei Lin
Yan Huang
Heng Fan
81
1
0
11 Feb 2025
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization
  Using Region-Based Representations
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla
S. Vallecorsa
Alex Schwing
Derek Hoiem
93
1
0
02 Dec 2024
LLaVA-OneVision: Easy Visual Task Transfer
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
62
666
0
06 Aug 2024
Region-Based Representations Revisited
Region-Based Representations Revisited
Michal Shlapentokh-Rothman
Ansel Blume
Yao Xiao
Yuqun Wu
TV Sethuraman
Heyi Tao
Jae Yong Lee
Wilfredo Torres
Yu-Xiong Wang
Derek Hoiem
73
7
0
04 Feb 2024
Single-Stage Visual Query Localization in Egocentric Videos
Single-Stage Visual Query Localization in Egocentric Videos
Hanwen Jiang
Santhosh Kumar Ramakrishnan
Kristen Grauman
69
14
0
15 Jun 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
55
119
0
18 May 2023
DINOv2: Learning Robust Visual Features without Supervision
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
231
3,205
0
14 Apr 2023
Segment Everything Everywhere All at Once
Segment Everything Everywhere All at Once
Xueyan Zou
Jianwei Yang
Hao Zhang
Feng Li
Linjie Li
Jianfeng Wang
Lijuan Wang
Jianfeng Gao
Yong Jae Lee
MLLM
VLM
34
469
0
13 Apr 2023
Segment Anything
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
218
7,047
0
05 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
54
1,028
0
27 Mar 2023
Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual
  Query Localization
Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization
Mengmeng Xu
Yanghao Li
Cheng-Yang Fu
Guohao Li
Tao Xiang
Juan-Manuel Perez-Rua
53
14
0
18 Nov 2022
TokenMix: Rethinking Image Mixing for Data Augmentation in Vision
  Transformers
TokenMix: Rethinking Image Mixing for Data Augmentation in Vision Transformers
Jihao Liu
B. Liu
Hang Zhou
Hongsheng Li
Yu Liu
ViT
42
67
0
18 Jul 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
257
517
0
22 Feb 2022
Not All Patches are What You Need: Expediting Vision Transformers via
  Token Reorganizations
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations
Youwei Liang
Chongjian Ge
Zhan Tong
Yibing Song
Jue Wang
P. Xie
ViT
42
247
0
16 Feb 2022
Masked-attention Mask Transformer for Universal Image Segmentation
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
158
2,315
0
02 Dec 2021
Extract Free Dense Labels from CLIP
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
92
467
0
02 Dec 2021
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu-Gang Jiang
Ser-Nam Lim
ViT
40
227
0
30 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
345
7,600
0
11 Nov 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
315
1,056
0
13 Oct 2021
Token Pooling in Vision Transformers
Token Pooling in Vision Transformers
D. Marin
Jen-Hao Rick Chang
Anurag Ranjan
Anish K. Prabhu
Mohammad Rastegari
Oncel Tuzel
ViT
92
68
0
08 Oct 2021
Localizing Objects with Self-Supervised Transformers and no Labels
Localizing Objects with Self-Supervised Transformers and no Labels
Oriane Siméoni
Gilles Puy
Huy V. Vo
Simon Roburin
Spyros Gidaris
Andrei Bursuc
P. Pérez
Renaud Marlet
Jean Ponce
ViT
197
198
0
29 Sep 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
77
1,517
0
13 Jul 2021
BEiT: BERT Pre-Training of Image Transformers
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
94
2,785
0
15 Jun 2021
DynamicViT: Efficient Vision Transformers with Dynamic Token
  Sparsification
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
ViT
46
681
0
03 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
510
5,920
0
29 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
394
28,659
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
116
40,217
0
22 Oct 2020
End-to-End Object Detection with Transformers
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
220
12,847
0
26 May 2020
Representation Learning with Contrastive Predictive Coding
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL
SSL
153
10,152
0
10 Jul 2018
Encoder-Decoder with Atrous Separable Convolution for Semantic Image
  Segmentation
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Liang-Chieh Chen
Yukun Zhu
George Papandreou
Florian Schroff
Hartwig Adam
SSeg
59
13,005
0
07 Feb 2018
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
216
129,831
0
12 Jun 2017
Semantic Understanding of Scenes through the ADE20K Dataset
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
289
1,850
0
18 Aug 2016
1