Retrieval-Augmented QA with QLoRA + DSPy

  • Built an end-to-end RAG pipeline that applies DSPy prompt optimization to a QLoRA 4-bit Llama-2-7B, raising SQuAD token-F1 from 0.42 to 0.81 while using under 2 GiB of GPU memory.
  • Combined Sentence Transformers retrieval with BootstrapFewShot prompt compilation to automatically select few-shot demonstrations.
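
For readers unfamiliar with the metric, here is a minimal sketch of SQuAD-style token-F1, the score quoted above. It assumes the usual SQuAD normalization (lowercasing, stripping punctuation and English articles); the function names `normalize` and `token_f1` are illustrative and not taken from the project's code. A function of this shape could plausibly serve as the scoring metric handed to a few-shot optimizer such as BootstrapFewShot, though the exact metric used in the project is not stated here.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty is a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, token_f1("in Paris, France", "Paris") shares one token out of three predicted and one expected, giving precision 1/3, recall 1, and F1 of 0.5. A dataset-level score averages this value over all questions, typically taking the maximum over the available reference answers for each question.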

Read the write-up on Medium →

updated_at 01-12-2025