HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

27 February 2025
Taiqiang Wu
Chenchen Ding
Wenyong Zhou
Yuxin Cheng
Xincheng Feng
Shuqi Wang
Chufan Shi
Zhengwu Liu
Ngai Wong
Abstract

Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method for adapting large language models (LLMs) to downstream tasks. In this paper, we first propose deploying LoRA-finetuned LLMs on a hybrid compute-in-memory (CIM) architecture, mapping the pretrained weights onto RRAM and the LoRA weights onto SRAM. To address the performance degradation caused by RRAM's inherent noise, we design a novel Hardware-aware Low-Rank Adaptation (HaLoRA) method that trains a LoRA branch that is both robust and accurate by aligning the training objectives under ideal and noisy conditions. Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving an average-score improvement of up to 22.7 while maintaining robustness at various noise levels.
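
The following is a minimal, hypothetical PyTorch sketch of the idea described in the abstract: the pretrained weight is frozen (as if deployed on noisy RRAM), only the low-rank branch is trained (as if deployed on noise-free SRAM), and the loss aligns the layer's behaviour under ideal and noise-injected conditions. The class and function names, the multiplicative Gaussian noise model, and the alignment loss are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn


class NoiseAwareLoRALinear(nn.Module):
    """Linear layer with a frozen base weight (RRAM) and a trainable LoRA branch (SRAM)."""

    def __init__(self, in_features, out_features, rank=8, noise_std=0.02):
        super().__init__()
        # Frozen pretrained weight, assumed to be mapped onto RRAM.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False
        )
        # Trainable low-rank factors, assumed to be mapped onto noise-free SRAM.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.noise_std = noise_std  # relative RRAM noise level (assumption)

    def forward(self, x, inject_noise=False):
        w = self.weight
        if inject_noise:
            # Multiplicative Gaussian noise as a simple model of RRAM non-ideality.
            w = w * (1.0 + self.noise_std * torch.randn_like(w))
        base = x @ w.t()
        lora = x @ self.lora_A.t() @ self.lora_B.t()
        return base + lora


def noise_aligned_loss(layer, x, target, align_weight=1.0):
    """Task loss under injected noise plus an alignment term between ideal and noisy outputs."""
    y_ideal = layer(x, inject_noise=False)
    y_noisy = layer(x, inject_noise=True)
    task_loss = nn.functional.mse_loss(y_noisy, target)
    align_loss = nn.functional.mse_loss(y_noisy, y_ideal)
    return task_loss + align_weight * align_loss

Only the LoRA parameters receive gradients here, so the optimizer adapts the low-rank branch to compensate for the simulated RRAM noise while the alignment term keeps the noisy behaviour close to the ideal one.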

View on arXiv
@article{wu2025_2502.19747,
  title={HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture},
  author={Taiqiang Wu and Chenchen Ding and Wenyong Zhou and Yuxin Cheng and Xincheng Feng and Shuqi Wang and Chufan Shi and Zhengwu Liu and Ngai Wong},
  journal={arXiv preprint arXiv:2502.19747},
  year={2025}
}