Retrieval-Augmented Multimodal Language Modeling


22 November 2022
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
    RALM

Papers citing "Retrieval-Augmented Multimodal Language Modeling"

50 / 80 papers shown
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
Yuyang Ji
Haohan Wang
LRM
39
0
0
14 Apr 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
52
0
0
14 Apr 2025
Leveraging LLMs for Multimodal Retrieval-Augmented Radiology Report Generation via Key Phrase Extraction
Kyoyun Choi
Byungmu Yoon
Soobum Kim
Jonggwon Park
38
0
0
10 Apr 2025
A Multimedia Analytics Model for the Foundation Model Era
M. Worring
Jan Zahálka
Stef van den Elzen
M. T. Fischer
Daniel A. Keim
VGen
HAI
37
0
0
08 Apr 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Jing Zhang
63
0
0
18 Mar 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Neil Zhenqiang Gong
43
1
0
08 Mar 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Bin Cui
Mengdi Wang
ReLM
LRM
AI4CE
98
10
0
10 Feb 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
84
2
0
03 Jan 2025
A Unified Framework for Context-Aware IoT Management and State-of-the-Art IoT Traffic Anomaly Detection
Daniel Adu Worae
Athar Sheikh
Spyridon Mastorakis
36
1
0
31 Dec 2024
Leveraging Retrieval-Augmented Tags for Large Vision-Language Understanding in Complex Scenes
Antonio Carlos Rivera
Anthony Moore
Steven Robinson
VLM
LRM
76
0
0
16 Dec 2024
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
Hao Wu
Zhihang Zhong
Xiao Sun
DiffM
72
0
0
02 Dec 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
48
9
0
08 Nov 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Joey Tianyi Zhou
VLM
32
9
0
07 Nov 2024
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha
Chaeyun Kim
Donghwa Kim
Junho Lee
Sangho Lee
Joonseok Lee
50
2
0
03 Nov 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
66
4
0
10 Oct 2024
Retrieval-Augmented Decision Transformer: External Memory for In-context RL
Thomas Schmied
Fabian Paischer
Vihang Patil
M. Hofmarcher
Razvan Pascanu
Sepp Hochreiter
OffRL
44
6
0
09 Oct 2024
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Junpeng Yue
Xinru Xu
Börje F. Karlsson
Zongqing Lu
39
0
0
04 Oct 2024
A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions
Shailja Gupta
Rajesh Ranjan
Surya Narayan Singh
3DV
VLM
AILaw
35
18
0
03 Oct 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
37
3
0
25 Sep 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis LLabres
Marco Bertini
Dimosthenis Karatzas
39
3
0
14 Sep 2024
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Tsung-Han Wu
Giscard Biamby
Jerome Quenum
Ritwik Gupta
Joseph E. Gonzalez
Trevor Darrell
David M. Chan
VLM
43
7
0
18 Jul 2024
Addressing Image Hallucination in Text-to-Image Generation through Factual Image Retrieval
Youngsun Lim
Hyunjung Shim
DiffM
HILM
MQ
43
3
0
15 Jul 2024
Evaluating Copyright Takedown Methods for Language Models
Boyi Wei
Weijia Shi
Yangsibo Huang
Noah A. Smith
Chiyuan Zhang
Luke Zettlemoyer
Kai Li
Peter Henderson
49
19
0
26 Jun 2024
Consistency-diversity-realism Pareto fronts of conditional image generative models
Pietro Astolfi
Marlene Careil
Melissa Hall
Oscar Manas
Matthew Muckley
Jakob Verbeek
Adriana Romero Soriano
M. Drozdzal
51
10
0
14 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
53
38
0
13 Jun 2024
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Ling Yang
Zhaochen Yu
Tianjun Zhang
Shiyi Cao
Minkai Xu
Wentao Zhang
Joseph E. Gonzalez
Bin Cui
LLMAG
LM&Ro
LRM
KELM
45
34
0
06 Jun 2024
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Cheng Tan
Jingxuan Wei
Linzhuang Sun
Zhangyang Gao
Siyuan Li
Bihui Yu
Ruifeng Guo
Stan Z. Li
ReLM
LRM
3DV
69
6
0
31 May 2024
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
Chaochen Gao
Xing Wu
Qingfang Fu
Songlin Hu
SyDa
34
5
0
30 May 2024
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Byung-Kwan Lee
Chae Won Kim
Beomchan Park
Yonghyun Ro
MLLM
LRM
38
18
0
24 May 2024
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge
Chuanhao Li
Zhen Li
Chenchen Jing
Shuo Liu
Wenqi Shao
Yuwei Wu
Ping Luo
Yu Qiao
Kaipeng Zhang
ELM
29
3
0
23 May 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
43
1
0
19 Apr 2024
kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies
Zhongrui Gui
Shuyang Sun
Runjia Li
Jianhao Yuan
Zhaochong An
Karsten Roth
Ameya Prabhu
Philip H. S. Torr
VLM
CLL
32
6
0
15 Apr 2024
DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning
Jonathan Lebensold
Maziar Sanjabi
Pietro Astolfi
Adriana Romero Soriano
Kamalika Chaudhuri
Mike Rabbat
Chuan Guo
DiffM
31
4
0
21 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
46
53
0
05 Mar 2024
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi
Hanlin Zhang
Eric Xing
Sham Kakade
Hima Lakkaraju
SILM
44
18
0
27 Feb 2024
Exploring ChatGPT for Next-generation Information Retrieval: Opportunities and Challenges
Yizheng Huang
Jimmy X. Huang
35
10
0
17 Feb 2024
Learning on Multimodal Graphs: A Survey
Ciyuan Peng
Jiayuan He
Feng Xia
30
6
0
07 Feb 2024
DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton
Yiyou Sun
Junjie Hu
Wei Cheng
Haifeng Chen
RALM
AI4CE
34
1
0
06 Feb 2024
Retrieval Augmented End-to-End Spoken Dialog Models
Mingqiu Wang
Izhak Shafran
H. Soltau
Wei Han
Yuan Cao
Dian Yu
Laurent El Shafey
RALM
AuLLM
24
11
0
02 Feb 2024
HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA
Xinyue Chen
Pengyu Gao
Jiangjiang Song
Xiaoyang Tan
52
2
0
01 Feb 2024
In-context Learning with Retrieved Demonstrations for Language Models: A Survey
Man Luo
Xin Xu
Yue Liu
Panupong Pasupat
Mehran Kazemi
RALM
31
55
0
21 Jan 2024
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu
Yifei Huang
Junlin Hou
Guo Chen
Yue Zhang
Rui Feng
Weidi Xie
EgoV
51
29
0
01 Jan 2024
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
VLM
DiffM
27
8
0
22 Dec 2023
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV
RALM
59
1,516
1
18 Dec 2023
Stable Rivers: A Case Study in the Application of Text-to-Image Generative Models for Earth Sciences
C. Kupferschmidt
A. Binns
K. L. Kupferschmidt
G. W. Taylor
DiffM
16
1
0
13 Dec 2023
Self-Infilling Code Generation
Lin Zheng
Jianbo Yuan
Zhi Zhang
Hongxia Yang
Lingpeng Kong
24
2
0
29 Nov 2023
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li
D. Vo
Akihiro Sugimoto
Hideki Nakayama
KELM
VLM
42
23
0
27 Nov 2023
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
149
126
0
07 Nov 2023
In-context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi
Sewon Min
Maria Lomeli
Chunting Zhou
Margaret Li
...
Victoria Lin
Noah A. Smith
Luke Zettlemoyer
Scott Yih
Mike Lewis
LRM
RALM
SyDa
32
48
0
16 Oct 2023
Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models
Junchi Yu
Ran He
Rex Ying
LRM
61
24
0
06 Oct 2023