Parameter Efficient Fine Tuning of LLMs Using Unsloth
Parameter efficient fine tuning adapts large language models to specific tasks without retraining all of their weights. Because only a small set of added parameters is trained, tools like Unsloth make LoRA and DoRA practical on a single GPU or in Google Colab.

Full fine tuning updates every parameter in a model, which is expensive and often unnecessary. PEFT instead trains a small number of added parameters while keeping the base model frozen. This is especially useful when you want to specialize a model for a narrow domain, iterate quickly, or deploy multiple task specific adapters without storing multiple full model copies.

Unsloth is designed to make fine tuning faster and more memory efficient. It supports quantized model loading, efficient gradient checkpointing, and optimized training paths for common PEFT methods. In practice, it lets you run experiments that would otherwise require larger GPUs, while keeping the developer experience straightforward and compatible with the Hugging Face ecosystem.
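As a minimal sketch, loading a quantized base model and attaching LoRA adapters with Unsloth can look like the following. The model name, sequence length, rank, and target module list are illustrative placeholders, and keyword arguments can vary between Unsloth releases.

```python
from unsloth import FastLanguageModel

# Load a 4-bit quantized base model; the model name and max_seq_length are
# placeholders, chosen to fit a single consumer GPU or a Colab runtime.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices receive gradients,
# the base weights stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # adapter rank
    lora_alpha=16,        # scaling factor
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    use_gradient_checkpointing="unsloth",
)
```

The resulting model trains like any Hugging Face model, but only the adapter parameters are updated, which is what keeps memory use low.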
LoRA, or Low Rank Adaptation, injects small low rank matrices into specific layers, typically the attention and feedforward projections. During training, only these new matrices are updated and the base model stays unchanged. DoRA, or Weight Decomposed Low Rank Adaptation, builds on LoRA by decomposing each adapted weight into a magnitude component and a direction component, applying the low rank update to the direction while training the magnitude separately, which can improve optimization behavior and stability in some settings.
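To make the difference concrete, here is a conceptual PyTorch sketch of the two weight parameterizations, not how Unsloth or any adapter library implements them internally. The dimensions, initialization scale, and variable names are illustrative; the column-wise normalization follows the usual DoRA formulation.

```python
import torch

d_out, d_in, r = 1024, 1024, 16       # toy sizes; r is the adapter rank
W = torch.randn(d_out, d_in)           # frozen base weight
A = torch.randn(r, d_in) * 0.01        # trainable; typical LoRA init: A small random, B zero
B = torch.zeros(d_out, r)              # trainable
alpha = 16                             # LoRA scaling factor

# LoRA: effective weight = frozen base + scaled low rank update B @ A.
W_lora = W + (alpha / r) * (B @ A)

# DoRA: keep a trainable per-column magnitude (initialized from the base
# weight) and take the direction from the LoRA-adapted weight, normalized
# column-wise, then recombine.
m = W.norm(dim=0, keepdim=True)                        # shape (1, d_in), trainable
direction = W_lora / W_lora.norm(dim=0, keepdim=True)  # unit-norm columns
W_dora = m * direction
```

In both cases only the small tensors receive gradients: A and B for LoRA, plus the magnitude m for DoRA, so the trainable parameter count stays a small fraction of the base model.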
Practical Tips and LoRA vs DoRA Example
Before training, it helps to confirm your task format, label constraints, and sequence length distribution. During training, start with conservative batch sizes, use gradient accumulation for stability, and monitor the loss for divergence. After training, save only the adapters and keep the base model unchanged so you can reuse it across multiple tasks. A minimal training sketch follows the checklist below.
- Freeze the base model and train only adapters
- Target attention and MLP projection layers first
- Use 4 bit loading when GPU memory is limited
- Use gradient accumulation to increase effective batch size
- Save adapters per task instead of saving full model copies
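The sketch below assumes the adapter-wrapped model and tokenizer from the earlier Unsloth example, a train_dataset with a "text" column as a placeholder for your formatted data, and the Hugging Face transformers and trl packages; exact argument names can shift between trl releases. A LoRA vs DoRA comparison then amounts to running the same script twice, with the adapter configured as LoRA in one run and as DoRA in the other (for example via use_dora=True in peft's LoraConfig, available in recent peft releases), and comparing the loss curves.

```python
from transformers import TrainingArguments
from trl import SFTTrainer

# model, tokenizer: loaded and adapter-wrapped as in the earlier sketch.
# train_dataset: placeholder for your formatted, length-checked dataset.

args = TrainingArguments(
    output_dir="outputs/peft-run",       # placeholder path
    per_device_train_batch_size=2,       # conservative per-device batch
    gradient_accumulation_steps=8,       # effective batch size of 16
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,                    # watch the loss for divergence
    optim="adamw_8bit",                  # memory-friendly optimizer
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",           # assumes a "text" column
    max_seq_length=2048,
    args=args,
)
trainer.train()

# Save only the adapter weights; the frozen base model stays untouched and
# can be shared across tasks, each with its own small adapter directory.
model.save_pretrained("adapters/my-task")
tokenizer.save_pretrained("adapters/my-task")
```

Because only adapter weights are written out, each task adds a directory that is typically tens to a few hundred megabytes, depending on rank and targeted modules, rather than another full model copy.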