2503.14649
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
18 March 2025
Wenqi Jiang
Suvinay Subramanian
Cat Graves
Gustavo Alonso
Amir Yazdanbakhsh
Vidushi Dadu
Papers citing
"RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving"
24 / 24 papers shown
Title
Patchwork: A Unified Framework for RAG Serving
Bodun Hu
Luis Pabon
Saurabh Agarwal
Aditya Akella
69
0
0
01 May 2025
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge
Ha Trinh
Newman Cheng
Joshua Bradley
Alex Chao
Apurva Mody
Steven Truitt
Dasha Metropolitansky
Robert Osazuwa Ness
Jonathan Larson
RALM
255
439
0
20 Feb 2025
ChunkRAG: Novel LLM-Chunk Filtering Method for RAG Systems
Ishneet Sukhvinder Singh
Ritvik Aggarwal
Ibrahim Allahverdiyev
Muhammad Taha
Aslihan Akalin
Kevin Zhu
Sean O'Brien
110
10
0
25 Oct 2024
Accelerating Inference of Networks in the Frequency Domain
Chenqiu Zhao
Guanfang Dong
Anup Basu
110
19
0
06 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
145
19
0
06 Oct 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jinhyuk Lee
Anthony Chen
Zhuyun Dai
Dheeru Dua
Devendra Singh Sachan
...
Jeremy R. Cole
Sebastian Riedel
Iftekhar Naim
Ming-Wei Chang
Kelvin Guu
RALM
LRM
95
37
0
19 Jun 2024
CRAG -- Comprehensive RAG Benchmark
Xiao Yang
Kai Sun
Hao Xin
Yushi Sun
Nikita Bhalla
...
Nirav Shah
Rakesh Wanga
Anuj Kumar
Wen-tau Yih
Xin Luna Dong
86
32
0
07 Jun 2024
Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
Maciej Besta
Aleš Kubíček
Robert Gerstenberger
Marcin Chrapek
Roman Niggli
...
Joanna Gajda
Piotr Nyczyk
Jürgen Müller
H. Niewiadomski
Torsten Hoefler
85
20
0
07 Jun 2024
vTrain: A Simulation Framework for Evaluating Cost-effective and Compute-optimal Large Language Model Training
Jehyeon Bang
Yujeong Choi
Myeongwoo Kim
Yongdeok Kim
Minsoo Rhu
57
18
0
27 Nov 2023
A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators
Dan Zhang
Safeen Huda
Ebrahim M. Songhori
Kartik Prabhu
Quoc V. Le
Anna Goldie
Azalia Mirhoseini
76
53
0
26 May 2021
Nearest Neighbor Machine Translation
Urvashi Khandelwal
Angela Fan
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
73
286
0
01 Oct 2020
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard
Edouard Grave
RALM
147
1,182
0
02 Jul 2020
Pre-training via Paraphrasing
M. Lewis
Marjan Ghazvininejad
Gargi Ghosh
Armen Aghajanyan
Sida I. Wang
Luke Zettlemoyer
AIMat
87
161
0
26 Jun 2020
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
145
2,118
0
10 Feb 2020
GGNN: Graph-based GPU Nearest Neighbor Search
F. Groh
Lukas Ruppert
P. Wieschollek
Hendrik P. A. Lensch
GNN
65
44
0
02 Dec 2019
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Samyam Rajbhandari
Jeff Rasley
Olatunji Ruwase
Yuxiong He
ALM
AI4CE
84
919
0
04 Oct 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,316
0
27 Aug 2019
ELI5: Long Form Question Answering
Angela Fan
Yacine Jernite
Ethan Perez
David Grangier
Jason Weston
Michael Auli
AI4MH
ELM
108
624
0
22 Jul 2019
SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy
Kyle Lo
Arman Cohan
168
2,986
0
26 Mar 2019
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Jinhyuk Lee
Wonjin Yoon
Sungdong Kim
Donghyeon Kim
Sunkyu Kim
Chan Ho So
Jaewoo Kang
OOD
182
5,674
0
25 Jan 2019
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
237
2,692
0
09 May 2017
Billion-scale similarity search with GPUs
Jeff Johnson
Matthijs Douze
Hervé Jégou
257
3,741
0
28 Feb 2017
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset
Payal Bajaj
Daniel Fernando Campos
Nick Craswell
Li Deng
Jianfeng Gao
...
Mir Rosenberg
Xia Song
Alina Stoica
Saurabh Tiwary
Tong Wang
RALM
160
2,745
0
28 Nov 2016
Fast k Nearest Neighbor Search using GPU
Vincent Garcia
E. Debreuve
Michel Barlaud
96
580
0
09 Apr 2008