# LoRA (Low-Rank Adaptation)

Confidence: high
Last verified: 2026-05-22
Generation: human_only

## TL;DR

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method for large language models, introduced by Hu et al. at Microsoft in 2021 (arXiv:2106.09685, 19,123 citations as of May 2026). Instead of updating all model parameters during fine-tuning, LoRA freezes the pretrained weights and injects small, trainable low-rank matrices into the model's weight layers, reducing trainable parameters by up to 10,000x while keeping performance close to full fine-tuning. The method has become the dominant fine-tuning approach in the open-source LLM community, and the official Microsoft implementation has 13,547 GitHub stars.

## Core Explanation

Standard fine-tuning of a large model (e.g., Llama 70B) requires updating all 70 billion parameters per training step, which is computationally prohibitive for most individuals and small teams.

LoRA's key insight is that weight updates during fine-tuning have low "intrinsic rank": the effective change to a weight matrix W can be represented as a low-rank decomposition:

```
W' = W + ΔW
where ΔW = B · A
```

- W: Original weight matrix (frozen, d × k)
- A: Down-projection matrix (trainable, r × k, where r << min(d, k))
- B: Up-projection matrix (trainable, d × r)

For a typical LoRA configuration (r=8 or r=16 on attention projection matrices), the number of trainable parameters drops from billions to millions. The frozen base weights must still fit in memory, so LoRA on its own mainly reduces the cost of gradients and optimizer state; combined with 4-bit quantization (see QLoRA under Variants), it brings fine-tuning of models in the 65B-70B range within reach of a single high-memory GPU.

## Key Advantages

- **Parameter efficiency**: Train 0.1-1% of original parameters
- **No added inference latency**: LoRA weights can be merged into the base model (W' = W + BA)
- **Task switching**: Multiple LoRA adapters can be hot-swapped without reloading the base model
- **Storage efficiency**: A LoRA adapter is typically 5-50 MB vs. 140+ GB for full model weights

## Variants

| Variant | Innovation |
| --- | --- |
| **QLoRA** (Dettmers et al., 2023) | Adds 4-bit quantization to LoRA, enabling 65B fine-tuning on a single 48GB GPU |
| **DoRA** (Liu et al., 2024) | Decomposes weights into magnitude and direction, applying the low-rank update to the direction, to track full fine-tuning more closely |
| **LoRA+** | Uses different learning rates for the A and B matrices to improve convergence |

## Further Reading

- [LoRA Paper](https://arxiv.org/abs/2106.09685): Original paper by Hu et al.
- [LoRA GitHub](https://github.com/microsoft/LoRA): Official Microsoft implementation (13K+ stars)
- [QLoRA](https://arxiv.org/abs/2305.14314): Quantized LoRA for low-resource fine-tuning
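
## Worked Example

The decomposition W' = W + B · A from Core Explanation can be illustrated with a minimal, dependency-free Python sketch. It is not the official implementation: the helper names (`matmul`, `mat_add`, `lora_forward`, `merge`, `lora_param_counts`) and the toy sizes are assumptions made for illustration, and B is filled with random values to stand in for a trained adapter (LoRA normally starts B at zero so that ΔW is initially zero). The sketch shows two things: the adapter path and the merged weight produce the same output, and the trainable parameter count for one matrix falls from d·k to r·(d + k).

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_add(X, Y):
    """Element-wise sum of two matrices with the same shape."""
    return [[x + y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    """Matrix of standard-normal samples (illustrative values only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def lora_param_counts(d, k, r):
    """Trainable parameters for one d x k weight: (full fine-tuning, LoRA)."""
    return d * k, r * (d + k)


def lora_forward(W, A, B, x):
    """Adapter path: h = W x + B (A x), without forming the d x k product B A."""
    return mat_add(matmul(W, x), matmul(B, matmul(A, x)))


def merge(W, A, B):
    """Fold the adapter into the base weight: W' = W + B A."""
    return mat_add(W, matmul(B, A))


rng = random.Random(0)
d, k, r = 6, 4, 2  # toy sizes (assumption); real layers are far larger

W = rand_matrix(d, k, rng)  # frozen base weight, d x k
A = rand_matrix(r, k, rng)  # trainable down-projection, r x k
B = rand_matrix(d, r, rng)  # trainable up-projection, d x r (random here, as if trained)
x = rand_matrix(k, 1, rng)  # one input column vector, k x 1

h_adapter = lora_forward(W, A, B, x)
h_merged = matmul(merge(W, A, B), x)

# Largest absolute gap between the two paths; only floating-point noise remains.
max_diff = max(abs(a[0] - b[0]) for a, b in zip(h_adapter, h_merged))

full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

For a single 4096 × 4096 projection at r=8, the adapter trains 256x fewer parameters than the matrix itself. Model-wide reductions are larger because adapters are usually attached to only some of the weight matrices, while everything else stays frozen.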