Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.00977
Cited By
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model
3 June 2024
Kezhen Chen
Rahul Thapa
Rahul Chalamala
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model"
6 / 6 papers shown
Title
BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs
Sheng Zhang
Yanbo Xu
Naoto Usuyama
Hanwen Xu
J. Bagga
...
Carlo Bifulco
M. Lungren
Tristan Naumann
Sheng Wang
Hoifung Poon
LM&MA
MedIm
154
205
0
10 Jan 2025
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
Haotian Zhang
Haoxuan You
Philipp Dufter
Bowen Zhang
Chen Chen
...
Tsu-jui Fu
William Yang Wang
Shih-Fu Chang
Zhe Gan
Yinfei Yang
ObjD
MLLM
104
44
0
11 Apr 2024
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training
Zheng Yuan
Qiao Jin
Chuanqi Tan
Zhengyun Zhao
Hongyi Yuan
Fei Huang
Songfang Huang
52
27
0
01 Mar 2023
RepsNet: Combining Vision with Language for Automated Medical Reports
A. Tanwani
Joelle Barral
Daniel Freedman
MedIm
35
20
0
27 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
211
1,106
0
20 Sep 2022
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation
Yasuhide Miura
Yuhao Zhang
Emily Bao Tsai
C. Langlotz
Dan Jurafsky
MedIm
154
156
0
20 Oct 2020
1