ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.08790
  4. Cited By
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
  Visual Models

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

19 April 2022
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
Jianwei Yang
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
ArXivPDFHTML

Papers citing "ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models"

50 / 111 papers shown
Title
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via $\mathbf{\texttt{D}}$ual-$\mathbf{\texttt{H}}$ead $\mathbf{\texttt{O}}$ptimization
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via D\mathbf{\texttt{D}}Dual-H\mathbf{\texttt{H}}Head O\mathbf{\texttt{O}}Optimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
58
0
0
12 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Zhigang Wang
Tao Jin
DiffM
150
2
0
30 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
42
5
0
23 Mar 2025
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Chiara Cappellino
G. Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
ObjD
VLM
86
0
0
12 Mar 2025
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu
Zhaoxiang Liu
Huan Hu
Zezhou Chen
Kohou Wang
Kai Wang
Shiguo Lian
43
1
0
10 Mar 2025
RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models
Wenhui Zhu
Xin Li
Xiwen Chen
Peijie Qiu
Vamsi Krishna Vasa
...
Yanxi Chen
Natasha Lepore
Oana Dumitrascu
Yi Su
Yalin Wang
LM&MA
74
1
0
06 Mar 2025
Object-centric Binding in Contrastive Language-Image Pretraining
Object-centric Binding in Contrastive Language-Image Pretraining
Rim Assouel
Pietro Astolfi
Florian Bordes
M. Drozdzal
Adriana Romero Soriano
OCL
VLM
CoGe
103
0
0
19 Feb 2025
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting
Keyi Zhu
Jiajia Li
Kaixiang Zhang
Chaaran Arunachalam
Siddhartha Bhattacharya
R. Lu
Zhaojian Li
87
0
0
03 Feb 2025
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang
Yanbo Xu
Naoto Usuyama
Hanwen Xu
J. Bagga
...
Carlo Bifulco
M. Lungren
Tristan Naumann
Sheng Wang
Hoifung Poon
LM&MA
MedIm
154
205
0
10 Jan 2025
AnySynth: Harnessing the Power of Image Synthetic Data Generation for
  Generalized Vision-Language Tasks
AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks
Y. Li
Fan Ma
Yi Yang
DiffM
151
2
0
24 Nov 2024
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object
  Detection Considering Text Describability
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability
Yusuke Hosoya
Masanori Suganuma
Takayuki Okatani
ObjD
21
0
0
20 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of
  MLLMs
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
27
3
0
16 Oct 2024
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large
  Language and Vision Models
Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models
Juseong Jin
Chang Wook Jeong
33
3
0
13 Oct 2024
FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal
  Models
FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
G. O. D. Santos
Luiz Pereira
...
Nádia Da Silva
Simone Tiemi Hashiguti
Jefersson A. dos Santos
Hélio Pedrini
Sandra Avila
VLM
25
1
0
28 Sep 2024
Making Large Vision Language Models to be Good Few-shot Learners
Making Large Vision Language Models to be Good Few-shot Learners
Fan Liu
Wenwen Cai
Jian Huo
Chuanyi Zhang
Delong Chen
Jun Zhou
57
0
0
21 Aug 2024
Linking Robustness and Generalization: A k* Distribution Analysis of
  Concept Clustering in Latent Space for Vision Models
Linking Robustness and Generalization: A k* Distribution Analysis of Concept Clustering in Latent Space for Vision Models
Shashank Kotyan
Pin-Yu Chen
Danilo Vasconcellos Vargas
OOD
45
0
0
17 Aug 2024
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language
  Pre-trained Models
SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
Yang Zhou
Yongjian Wu
Jiya Saiyin
Bingzheng Wei
Maode Lai
Eric Chang
Yan Xu
VLM
51
0
0
16 Jul 2024
Exploring the Spectrum of Visio-Linguistic Compositionality and
  Recognition
Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition
Youngtaek Oh
Pyunghwan Ahn
Jinhyung Kim
Gwangmo Song
Soonyoung Lee
In So Kweon
Junmo Kim
CoGe
48
2
0
13 Jun 2024
Multi-Modal Generative Embedding Model
Multi-Modal Generative Embedding Model
Feipeng Ma
Hongwei Xue
Guangting Wang
Yizhou Zhou
Fengyun Rao
Shilin Yan
Yueyi Zhang
Siying Wu
Mike Zheng Shou
Xiaoyan Sun
VLM
39
3
0
29 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
43
1
0
27 May 2024
Understanding Retrieval-Augmented Task Adaptation for Vision-Language
  Models
Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models
Yifei Ming
Yixuan Li
VLM
39
7
0
02 May 2024
A Progressive Framework of Vision-language Knowledge Distillation and
  Alignment for Multilingual Scene
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Wenbo Zhang
Yifan Zhang
Jianfeng Lin
Binqiang Huang
Jinlu Zhang
Wenhao Yu
VLM
44
2
0
17 Apr 2024
Watch Your Step: Optimal Retrieval for Continual Learning at Scale
Watch Your Step: Optimal Retrieval for Continual Learning at Scale
Truman Hickok
Dhireesha Kudithipudi
37
1
0
16 Apr 2024
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang
Feng Li
Zhaoyang Zeng
Tianhe Ren
Shilong Liu
Lei Zhang
VLM
32
37
0
21 Mar 2024
GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting
  Editing
GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Jing Wu
Jiawang Bian
Xinghui Li
Guangrun Wang
Ian D Reid
Philip Torr
V. Prisacariu
3DGS
27
33
0
13 Mar 2024
Fine-grained Prompt Tuning: A Parameter and Memory Efficient Transfer
  Learning Method for High-resolution Medical Image Classification
Fine-grained Prompt Tuning: A Parameter and Memory Efficient Transfer Learning Method for High-resolution Medical Image Classification
Yijin Huang
Pujin Cheng
Roger Tam
Xiaoying Tang
VLM
MedIm
40
1
0
12 Mar 2024
Real-time Transformer-based Open-Vocabulary Detection with Efficient
  Fusion Head
Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head
Tiancheng Zhao
Peng Liu
Xuan He
Lu Zhang
Kyusong Lee
ObjD
43
8
0
11 Mar 2024
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
Nabil Ibtehaz
Ning Yan
Masood S. Mortazavi
Daisuke Kihara
ViT
32
3
0
07 Mar 2024
Zero-shot Generalizable Incremental Learning for Vision-Language Object
  Detection
Zero-shot Generalizable Incremental Learning for Vision-Language Object Detection
Jieren Deng
Haojian Zhang
Kun Ding
Jianhua Hu
Xingxuan Zhang
Yunkuan Wang
VLM
ObjD
82
4
0
04 Mar 2024
The Neglected Tails in Vision-Language Models
The Neglected Tails in Vision-Language Models
Shubham Parashar
Zhiqiu Lin
Tian Liu
Xiangjue Dong
Yanan Li
Deva Ramanan
James Caverlee
Shu Kong
VLM
40
33
0
23 Jan 2024
An Open and Comprehensive Pipeline for Unified Object Grounding and
  Detection
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
Xiangyu Zhao
Yicheng Chen
Shilin Xu
Xiangtai Li
Xinjiang Wang
Yining Li
Haian Huang
ObjD
AI4CE
45
29
0
04 Jan 2024
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Few-shot Adaptation of Multi-modal Foundation Models: A Survey
Fan Liu
Tianshu Zhang
Wenwen Dai
Wenwen Cai
Wenwen Cai Xiaocong Zhou
Delong Chen
VLM
OffRL
31
23
0
03 Jan 2024
Generating Enhanced Negatives for Training Language-Based Object
  Detectors
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao
Long Zhao
Vijay Kumar B.G
Yumin Suh
Dimitris N. Metaxas
Manmohan Chandraker
S. Schulter
ObjD
VLM
39
5
0
29 Dec 2023
3VL: Using Trees to Improve Vision-Language Models' Interpretability
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
49
4
0
28 Dec 2023
Unveiling Backbone Effects in CLIP: Exploring Representational Synergies
  and Variances
Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Ehsan Abbasnejad
Hamed Damirchi
Ignacio M. Jara
Felipe Bravo-Marquez
Anton Van Den Hengel
VLM
54
1
0
22 Dec 2023
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
P. Nguyen
T.D. Ngo
E. Kalogerakis
Chuang Gan
Anh Tran
Cuong Pham
Khoi Duc Minh Nguyen
ISeg
38
51
0
17 Dec 2023
Osprey: Pixel Understanding with Visual Instruction Tuning
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan
Wentong Li
Jian Liu
Dongqi Tang
Xinjie Luo
Chi Qin
Lei Zhang
Jianke Zhu
MLLM
VLM
56
78
0
15 Dec 2023
General Object Foundation Model for Images and Videos at Scale
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi-Xin Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
38
39
0
14 Dec 2023
Exploration of visual prompt in Grounded pre-trained open-set detection
Exploration of visual prompt in Grounded pre-trained open-set detection
Qibo Chen
Weizhong Jin
Shuchang Li
Mengdi Liu
Li Yu
Jian Jiang
Xiaozheng Wang
VLM
21
0
0
14 Dec 2023
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style
  Models on Dense Captions
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek
Florian Bordes
Pietro Astolfi
Mary Williamson
Vasu Sharma
Adriana Romero Soriano
CLIP
3DV
33
41
0
14 Dec 2023
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language
  Understanding
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
33
18
0
30 Nov 2023
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Zero-shot Retrieval: Augmenting Pre-trained Models with Search Engines
Hamed Damirchi
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Javen Qinfeng Shi
Stephen Gould
Anton Van Den Hengel
VLM
47
0
0
29 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating
  Video-based Large Language Models
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
27
58
0
27 Nov 2023
Language Semantic Graph Guided Data-Efficient Learning
Language Semantic Graph Guided Data-Efficient Learning
Wenxuan Ma
Shuang Li
Lincan Cai
Jingxuan Kang
45
4
0
15 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
56
105
0
09 Nov 2023
Recognize Any Regions
Recognize Any Regions
Haosen Yang
Chuofan Ma
Bin Wen
Yi-Xin Jiang
Zehuan Yuan
Xiatian Zhu
ObjD
VLM
51
3
0
02 Nov 2023
Localizing Active Objects from Egocentric Vision with Symbolic World
  Knowledge
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge
Te-Lin Wu
Yu Zhou
Nanyun Peng
29
8
0
23 Oct 2023
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP
  Performance on Low-Resource Languages
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages
G. O. D. Santos
Diego A. B. Moreira
Alef Iury Ferreira
Jhessica Silva
Luiz Pereira
...
H. Maia
Nádia Da Silva
Esther Colombini
Hélio Pedrini
Sandra Avila
VLM
CLIP
34
4
0
20 Oct 2023
MarineDet: Towards Open-Marine Object Detection
MarineDet: Towards Open-Marine Object Detection
Haixin Liang
Ziqiang Zheng
Zeyu Ma
Sai-Kit Yeung
30
4
0
03 Oct 2023
GeRA: Label-Efficient Geometrically Regularized Alignment
GeRA: Label-Efficient Geometrically Regularized Alignment
Dustin Klebe
Tal Shnitzer
Mikhail Yurochkin
Leonid Karlinsky
Justin Solomon
13
2
0
01 Oct 2023
123
Next