A Region-Prompted Adapter Tuning for Visual Abductive Reasoning

18 March 2023
Hao Zhang
Yeo Keat Ee
Basura Fernando
    VLM

Papers citing "A Region-Prompted Adapter Tuning for Visual Abductive Reasoning"

45 / 45 papers shown
Title
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
94
39
0
17 Jun 2024
FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation
Yuting Ma
Lechao Cheng
Yaxiong Wang
Zhun Zhong
Xiaohua Xu
Meng Wang
FedML
78
4
0
27 May 2024
Revisiting the Power of Prompt for Visual Tuning
Yuzhu Wang
Lechao Cheng
Chaowei Fang
Dingwen Zhang
Manni Duan
Meng Wang
VLM
122
16
0
04 Feb 2024
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Kaipeng Fang
Jingkuan Song
Lianli Gao
Pengpeng Zeng
Zhi-Qi Cheng
Xiyao Li
Hengtao Shen
VLM
60
11
0
19 Dec 2023
PEFT-Ref: A Modular Reference Architecture and Typology for Parameter-Efficient Finetuning Techniques
Mohammed Sabry
Anya Belz
95
8
0
24 Apr 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM, MLLM
106
161
0
13 Apr 2023
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM, CLIP
128
819
0
14 Dec 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
164
691
0
10 Nov 2022
Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models
Lifu Tu
Caiming Xiong
Yingbo Zhou
VLM, AAML, LRM
135
28
0
22 Oct 2022
Tiny-Attention Adapter: Contexts Are More Important Than the Number of Parameters
Hongyu Zhao
Hao Tan
Hongyuan Mei
MoE
70
18
0
18 Oct 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM, CLIP, OffRL
177
1,309
0
04 May 2022
Attention in Attention: Modeling Context Correlation for Efficient Video Classification
Y. Hao
Shuo Wang
P. Cao
Xinjian Gao
Tong Xu
Jinmeng Wu
Xiangnan He
90
41
0
20 Apr 2022
PromptDet: Towards Open-vocabulary Detection using Uncurated Images
Chengjian Feng
Yujie Zhong
Zequn Jie
Xiangxiang Chu
Haibing Ren
Xiaolin K. Wei
Weidi Xie
Lin Ma
VPVLM, VLM
60
156
0
30 Mar 2022
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLM, VPVLM
158
1,645
0
23 Mar 2022
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel
Jena D. Hwang
Jinho Park
Rowan Zellers
Chandra Bhagavatula
Anna Rohrbach
Kate Saenko
Yejin Choi
ReLM
209
51
0
10 Feb 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM, BDL, VLM, CLIP
555
4,413
0
28 Jan 2022
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM, CLIP
151
580
0
16 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Mohit Bansal
VLM, VPVLM
112
356
0
13 Dec 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
141
908
0
22 Nov 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM, VPVLM, VLM
283
224
0
24 Sep 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
502
10,526
0
17 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM, LRM
104
383
0
04 Jun 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD, VLM
187
890
0
26 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
592
4,093
0
18 Apr 2021
Factual Probing Is [MASK]: Learning vs. Learning to Recall
Zexuan Zhong
Dan Friedman
Danqi Chen
56
412
0
12 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
1.0K
29,926
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM, CLIP
463
3,901
0
11 Feb 2021
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
86
36
0
14 Dec 2020
Multi-Label Contrastive Learning for Abstract Visual Reasoning
Mikołaj Małkiński
Jacek Mańdziuk
55
40
0
03 Dec 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
892
42,463
0
28 May 2020
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM, MLLM
252
2,493
0
20 Aug 2019
Abductive Commonsense Reasoning
Chandra Bhagavatula
Ronan Le Bras
Chaitanya Malaviya
Keisuke Sakaguchi
Ari Holtzman
Hannah Rashkin
Doug Downey
Scott Yih
Yejin Choi
ReLM, LRM
85
463
0
15 Aug 2019
Attention in Natural Language Processing
Andrea Galassi
Marco Lippi
Paolo Torroni
GNN
66
480
0
04 Feb 2019
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
223
4,529
0
02 Feb 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM, BDL, OCL, ReLM
184
883
0
27 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM, SSL, SSeg
1.8K
95,229
0
11 Oct 2018
Object Detection with Deep Learning: A Review
Zhong-Qiu Zhao
Peng Zheng
Shou-tao Xu
Xindong Wu
ObjD
134
4,017
0
15 Jul 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension
Licheng Yu
Zhe Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Mohit Bansal
Tamara L. Berg
ObjD
117
831
0
24 Jan 2018
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
372
27,253
0
20 Mar 2017
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
133
1,277
0
31 Jul 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
234
5,765
0
23 Feb 2016
Visual7W: Grounded Question Answering in Images
Yuke Zhu
Oliver Groth
Michael S. Bernstein
Li Fei-Fei
106
887
0
11 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat, ObjD
533
62,409
0
04 Jun 2015
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,875
0
01 May 2014
Rich feature hierarchies for accurate object detection and semantic segmentation
Ross B. Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
ObjD
295
26,223
0
11 Nov 2013