Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.09808
Cited By
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
17 November 2022
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
Chun Yuan
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks"
48 / 48 papers shown
Title
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
99
48
0
03 Jan 2025
EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation
Milos Vukadinovic
Xiu Tang
N. Yuan
Paul Cheng
Debiao Li
Susan Cheng
B. He
David Ouyang
37
11
0
13 Oct 2024
Universal Medical Image Representation Learning with Compositional Decoders
Kaini Wang
Ling Yang
Siping Zhou
Guangquan Zhou
Wentao Zhang
Bin Cui
Shuo Li
SSL
MedIm
33
0
0
30 Sep 2024
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
Min Yang
Zichen Zhang
Limin Wang
AI4TS
39
0
0
27 Sep 2024
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
Xin Jiang
Junwei Zheng
Ruiping Liu
Jiahang Li
Jiaming Zhang
Sven Matthiesen
Rainer Stiefelhagen
VLM
28
0
0
21 Sep 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
41
1
0
23 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
54
23
0
22 Jul 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE
VLM
MLLM
41
14
0
13 Jun 2024
All in One Framework for Multimodal Re-identification in the Wild
He Li
Mang Ye
Ming Zhang
Bo Du
35
9
0
08 May 2024
Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
Donggyun Kim
Seongwoong Cho
Semin Kim
Chong Luo
Seunghoon Hong
VLM
45
2
0
29 Apr 2024
In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation
Han Xue
Qianru Sun
Li-Na Song
Wenjun Zhang
Zhiwu Huang
MLLM
44
0
0
15 Apr 2024
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Haiyang Wang
Hao Tang
Li Jiang
Shaoshuai Shi
Muhammad Ferjad Naeem
Hongsheng Li
Bernt Schiele
Liwei Wang
VLM
43
10
0
14 Mar 2024
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu
Jiaxing Huang
Peng Gao
Lewei Lu
Xiaoqin Zhang
Shijian Lu
51
4
0
12 Mar 2024
Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yueping Jiang
MLLM
42
18
0
12 Mar 2024
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang
I-Hau Yeh
Hongpeng Liao
57
1,151
0
21 Feb 2024
Supervised Fine-tuning in turn Improves Visual Foundation Models
Xiaohu Jiang
Yixiao Ge
Yuying Ge
Dachuan Shi
Chun Yuan
Ying Shan
VLM
CLIP
46
8
0
18 Jan 2024
PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
Hangyu Mao
Rui Zhao
Ziyue Li
Zhiwei Xu
Hao Chen
Yiqun Chen
Bin Zhang
Zhen Xiao
Junge Zhang
Jiangjin Yin
OffRL
16
8
0
26 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi-Xin Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
35
39
0
14 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
32
3
0
05 Dec 2023
Towards More Unified In-context Visual Understanding
Dianmo Sheng
Dongdong Chen
Zhentao Tan
Qiankun Liu
Qi Chu
Jianmin Bao
Tao Gong
Bin Liu
Shengwei Xu
Nenghai Yu
MLLM
VLM
40
10
0
05 Dec 2023
APoLLo: Unified Adapter and Prompt Learning for Vision Language Models
Sanjoy Chowdhury
Sayan Nag
Dinesh Manocha
VLM
33
17
0
04 Dec 2023
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng
Shiyi Lan
Hengduo Li
Jose M. Alvarez
Zuxuan Wu
Yu-Gang Jiang
VLM
ISeg
MLLM
28
6
0
24 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
27
64
0
07 Nov 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Wei-Ge Chen
Irina Spiridonova
Jianwei Yang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
13
34
0
01 Nov 2023
OpenForest: A data catalogue for machine learning in forest monitoring
Arthur Ouaknine
T. Kattenborn
Etienne Laliberté
David Rolnick
48
5
0
01 Nov 2023
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Joanna Hong
Se Jin Park
Y. Ro
VLM
11
6
0
23 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
24
2
0
08 Oct 2023
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
Yulu Gan
Sungwoo Park
Alexander Schubert
Anthony Philippakis
Ahmed Alaa
VLM
32
22
0
30 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng-Wei Zhang
Han Hu
Dongdong Chen
Baining Guo
DiffM
VLM
51
93
0
07 Sep 2023
Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Xiujun Shu
Wei Wen
Liangsheng Xu
Ruizhi Qiao
Taian Guo
Hanjun Li
Bei Gan
Tianlin Li
Xing Sun
37
0
0
27 Aug 2023
Towards a Visual-Language Foundation Model for Computational Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Ivy Liang
...
Andrew Zhang
L. Le
Georg Gerber
Anil V. Parwani
Faisal Mahmood
VLM
MedIm
40
46
0
24 Jul 2023
Review of Large Vision Models and Visual Prompt Engineering
Jiaqi Wang
Zheng Liu
Lin Zhao
Zihao Wu
Chong Ma
...
Bao Ge
Yixuan Yuan
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
LRM
55
146
0
03 Jul 2023
JourneyDB: A Benchmark for Generative Image Understanding
Keqiang Sun
Junting Pan
Yuying Ge
Hao Li
Haodong Duan
...
Yi Wang
Jifeng Dai
Yu Qiao
Limin Wang
Hongsheng Li
54
102
0
03 Jul 2023
3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW
Shijie Chang
Zeqi Hao
Ben Kang
Xiaoqi Zhao
Jiawen Zhu
Zhe Chen
Lihe Zhang
Lu Zhang
Huchuan Lu
21
1
0
04 Jun 2023
Unifying (Machine) Vision via Counterfactual World Modeling
Daniel M. Bear
Kevin T. Feigelis
Honglin Chen
Wanhee Lee
R. Venkatesh
Klemen Kotar
Alex Durango
Daniel L. K. Yamins
VGen
23
13
0
02 Jun 2023
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
30
30
0
30 May 2023
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
Kai Zhang
Jun Yu
Eashan Adhikarla
Rong-Er Zhou
Zhilin Yan
...
Xun Chen
Yong Chen
Quanzheng Li
Hongfang Liu
Lichao Sun
LM&MA
MedIm
34
157
0
26 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Jiaheng Liu
15
1
0
19 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
35
457
0
18 May 2023
Musketeer: Joint Training for Multi-task Vision Language Model with Task Explanation Prompts
Zhaoyang Zhang
Yantao Shen
Kunyu Shi
Zhaowei Cai
Jun Fang
Siqi Deng
Hao Yang
Davide Modolo
Z. Tu
Stefano Soatto
VLM
28
2
0
11 May 2023
Personalize Segment Anything Model with One Shot
Renrui Zhang
Zhengkai Jiang
Ziyu Guo
Shilin Yan
Junting Pan
Xianzheng Ma
Hao Dong
Peng Gao
Hongsheng Li
MLLM
VLM
33
207
0
04 May 2023
Transformer-Based Visual Segmentation: A Survey
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
42
132
0
19 Apr 2023
A Billion-scale Foundation Model for Remote Sensing Images
Keumgang Cha
Junghoon Seo
Taekyung Lee
32
64
0
11 Apr 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
32
110
0
21 Dec 2022
Geodesic Multi-Modal Mixup for Robust Fine-Tuning
Changdae Oh
Junhyuk So
Hoyoon Byun
Yongtaek Lim
Minchul Shin
Jong-June Jeon
Kyungwoo Song
33
26
0
08 Mar 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
290
1,084
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
304
3,708
0
11 Feb 2021
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
267
3,375
0
09 Mar 2020
1