Optimizing LLM Queries in Relational Data Analytics Workloads

9 March 2024
Shu Liu
Asim Biswal
Audrey Cheng
Xiangxi Mo
Shiyi Cao
Joseph E. Gonzalez
Ion Stoica
Matei A. Zaharia
Abstract

Batch data analytics is a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. However, LLM inference is highly costly and slow: for example, an NVIDIA L4 GPU running Llama3-8B can only process 6 KB of text per second, taking about a day to handle 15 GB of data; processing a similar amount of data costs around $10K on OpenAI's GPT-4o. In this paper, we propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads. Our key contribution is developing efficient algorithms for reordering the rows and the fields within each row of an input table to maximize key-value (KV) cache reuse when performing LLM serving. As such, our approach can be easily applied to existing analytics systems and serving platforms. Our evaluation shows that our solution can yield up to 3.4x improvement in job completion time on a benchmark of diverse LLM-based queries using Llama 3 models. Our solution also achieves a 32% cost savings under OpenAI and Anthropic pricing models.

View on arXiv
@article{liu2025_2403.05821,
  title={Optimizing LLM Queries in Relational Data Analytics Workloads},
  author={Shu Liu and Asim Biswal and Amog Kamsetty and Audrey Cheng and Luis Gaspar Schroeder and Liana Patel and Shiyi Cao and Xiangxi Mo and Ion Stoica and Joseph E. Gonzalez and Matei Zaharia},
  journal={arXiv preprint arXiv:2403.05821},
  year={2025}
}