Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.04746
Cited By
v1
v2
v3 (latest)
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
7 December 2023
M. S. Seyfioglu
Wisdom O. Ikezogwo
Fatemeh Ghezloo
Ranjay Krishna
Linda G. Shapiro
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos"
46 / 46 papers shown
Title
Medical Large Vision Language Models with Multi-Image Visual Ability
Xikai Yang
Juzheng Miao
Yuchen Yuan
Jiaze Wang
Qi Dou
Jinpeng Li
Pheng Ann Heng
27
0
0
25 May 2025
Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning
Zhe Xu
Cheng Jin
Yihui Wang
Ziyi Liu
Hao Chen
93
0
0
21 May 2025
Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges
Yuan Zhang
Xinfeng Zhang
Xiaoming Qi Xinyu Wu
Feng Chen
Guanyu Yang
Huazhu Fu
MedIm
LM&MA
AI4CE
145
0
0
16 May 2025
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner
Wenchuan Zhang
Penghao Zhang
Jingru Guo
Tao Cheng
Jie Chen
Shuwan Zhang
Zhang Zhang
Yuhao Yi
Hong Bu
AI4TS
LRM
94
0
0
16 May 2025
VideoPath-LLaVA: Pathology Diagnostic Reasoning Through Video Instruction Tuning
T. Vuong
J. T. Kwak
VGen
66
0
0
07 May 2025
Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design
Vasudev Sharma
Ahmed Alagha
Abdelhakim Khellaf
Vincent Quoc-Huy Trinh
Mahdi S. Hosseini
121
0
0
30 Apr 2025
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki
Anabia Sohail
I. I. Ganapathi
B. Alawode
Asim Khan
Sajid Javed
Naoufel Werghi
Mohammed Bennamoun
Arif Mahmood
136
0
0
26 Apr 2025
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
S. Kapse
Pushpak Pati
Srikar Yellapragada
Srijan Das
Rajarsi R. Gupta
Joel H. Saltz
Dimitris Samaras
Prateek Prasanna
VLM
99
1
0
01 Apr 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
111
0
0
12 Mar 2025
From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine
Lukas Buess
Matthias Keicher
Nassir Navab
Andreas Maier
Soroosh Tayebi Arasteh
LM&MA
257
2
0
13 Feb 2025
PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Fatemeh Ghezloo
M. S. Seyfioglu
Rustin Soraki
Wisdom O. Ikezogwo
Beibin Li
Tejoram Vivekanandan
J. Elmore
Ranjay Krishna
Linda G. Shapiro
158
7
0
13 Feb 2025
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis
Shengxuming Zhang
Weihan Li
Tianhong Gao
Jiacong Hu
Haoming Luo
Xiuming Zhang
Jing Zhang
Mingli Song
Zunlei Feng
LM&MA
153
0
0
12 Dec 2024
Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering
Awais Naeem
Tianhao Li
Huang-Ru Liao
Jiawei Xu
Aby M. Mathew
...
A. Jaiswal
Raffi A. Salibian
Ziniu Hu
Tianlong Chen
Ying Ding
132
0
0
26 Nov 2024
HumanVLM: Foundation for Human-Scene Vision-Language Model
Dawei Dai
Xu Long
Li Yutang
Zhang YuanHui
Shuyin Xia
VLM
MLLM
120
2
0
05 Nov 2024
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen
Guoan Wang
Yuanfeng Ji
Yanjun Li
Jin Ye
Tianbin Li
Bin Zhang
Nana Pei
Rongshan Yu
Yu Qiao
VLM
LM&MA
98
5
0
15 Oct 2024
Exploring the Feasibility of Multimodal Chatbot AI as Copilot in Pathology Diagnostics: Generalist Model's Pitfall
Mianxin Liu
Jianfeng Wu
Fang Yan
Hongjun Li
Wei Wang
Shaoting Zhang
Zhe Wang
LM&MA
79
0
0
04 Sep 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE
VLM
124
7
0
23 Aug 2024
PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding
Dawei Dai
Yuanhui Zhang
Long Xu
Qianlan Yang
Xiaojing Shen
Shuyin Xia
Guoyin Wang
LM&MA
VLM
125
11
0
18 Aug 2024
Cost-effective Instruction Learning for Pathology Vision and Language Analysis
Kaitao Chen
Mianxin Liu
Fang Yan
Lei Ma
Xiaoming Shi
...
Xiaosong Wang
Lifeng Zhu
Zhe Wang
Mu Zhou
Shaoting Zhang
76
4
0
25 Jul 2024
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Chen Shen
Chunfeng Lian
Wanqing Zhang
Fan Wang
Jianhua Zhang
...
Hongshu Mu
Hao Wu
Xinggong Liang
Jianhua Ma
Zhenyuan Wang
88
1
0
20 Jul 2024
PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration
Yuxuan Sun
Yunlong Zhang
Yixuan Si
Chenglu Zhu
Zhongyi Shui
Kai Zhang
Jingxiong Li
Xingheng Lyu
Tao Lin
Lin Yang
LM&MA
VLM
MedIm
89
12
0
28 Jun 2024
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo
Yibo Yan
Boren Hu
Yutao Yue
Xuming Hu
LRM
MLLM
79
8
0
17 Jun 2024
Octopi: Object Property Reasoning with Large Tactile-Language Models
Samson Yu
Kelvin Lin
Anxing Xiao
Jiafei Duan
Harold Soh
LRM
76
31
0
05 May 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
Pengfeng Xiao
128
62
0
04 Feb 2024
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun
Hao Wu
Chenglu Zhu
Sunyi Zheng
Qizi Chen
...
Mengyue Zheng
Jingxiong Li
Xinheng Lyu
Tao Lin
Lin Yang
LM&MA
76
18
0
29 Jan 2024
Mistral 7B
Albert Q. Jiang
Alexandre Sablayrolles
A. Mensch
Chris Bamford
Devendra Singh Chaplot
...
Teven Le Scao
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoE
LRM
99
2,237
0
10 Oct 2023
Improved Baselines with Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Yuheng Li
Yong Jae Lee
VLM
MLLM
130
2,807
0
05 Oct 2023
Med-Flamingo: a Multimodal Medical Few-shot Learner
Michael Moor
Qian Huang
Shirley Wu
Michihiro Yasunaga
C. Zakka
Yashodhara Dalmia
E. Reis
Pranav Rajpurkar
J. Leskovec
LM&MA
MedIm
78
269
0
27 Jul 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang
Yinan He
Yizhuo Li
Kunchang Li
Jiashuo Yu
...
Ping Luo
Ziwei Liu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
103
270
0
13 Jul 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
286
125
0
20 Jun 2023
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
Yue Liu
Yuanhan Zhang
Liangyu Chen
Jinghao Wang
Fanyi Pu
Jingkang Yang
Cuiping Li
Ziwei Liu
MLLM
VLM
88
240
0
08 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA
MedIm
118
792
0
01 Jun 2023
PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
Xiaoman Zhang
Chaoyi Wu
Ziheng Zhao
Weixiong Lin
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA
123
181
0
17 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
99
587
0
28 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
162
2,060
0
20 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
567
4,910
0
17 Apr 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
100
160
0
13 Apr 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,699
0
15 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.5K
13,420
0
27 Feb 2023
Connecting Vision and Language with Video Localized Narratives
P. Voigtlaender
Soravit Changpinyo
Jordi Pont-Tuset
Radu Soricut
V. Ferrari
VGen
117
22
0
22 Feb 2023
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
99
213
0
07 Jan 2022
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
104
383
0
04 Jun 2021
PathVQA: 30000+ Questions for Medical Visual Question Answering
Xuehai He
Yichen Zhang
Luntian Mou
Eric Xing
P. Xie
LM&MA
54
242
0
07 Mar 2020
Connecting Vision and Language with Localized Narratives
Jordi Pont-Tuset
J. Uijlings
Soravit Changpinyo
Radu Soricut
V. Ferrari
ObjD
91
251
0
06 Dec 2019
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff
Dmitry Kalenichenko
James Philbin
3DH
388
13,145
0
12 Mar 2015
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014
1