Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.04219
Cited By
OtterHD: A High-Resolution Multi-modality Model
7 November 2023
Bo-wen Li
Peiyuan Zhang
Jingkang Yang
Yuanhan Zhang
Fanyi Pu
Ziwei Liu
VLM
MLLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"OtterHD: A High-Resolution Multi-modality Model"
12 / 12 papers shown
Title
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
26
0
0
11 May 2025
Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization
Huiyi Chen
Jiawei Peng
Kaihua Tang
Xin Geng
Xu Yang
27
0
0
19 Apr 2025
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Conghui He
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
48
26
0
22 Oct 2024
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
Ziyue Wang
Chi Chen
Fuwen Luo
Yurui Dong
Yuanchi Zhang
Yuzhuang Xu
Xiaolong Wang
Peng Li
Yang Liu
LRM
40
3
0
07 Oct 2024
GeNet: A Multimodal LLM-Based Co-Pilot for Network Topology and Configuration
Beni Ifland
Elad Duani
Rubin Krief
Miro Ohana
Aviram Zilberman
...
Ortal Lavi
Hikichi Kenji
A. Shabtai
Yuval Elovici
Rami Puzis
35
3
0
11 Jul 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Byung-Kwan Lee
Chae Won Kim
Beomchan Park
Yonghyun Ro
MLLM
LRM
41
18
0
24 May 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Ruyi Xu
Yuan Yao
Zonghao Guo
Junbo Cui
Zanlin Ni
Chunjiang Ge
Tat-Seng Chua
Zhiyuan Liu
Maosong Sun
Gao Huang
VLM
MLLM
37
103
0
18 Mar 2024
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
30
9
0
08 Dec 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
281
4,244
0
30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
392
4,137
0
28 Jan 2022
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
244
1,024
0
13 Oct 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mañke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
88
152
0
17 Sep 2021
1