Retrieval-Augmented QA with QLoRA + DSPy
- Built an end-to-end RAG pipeline that applies DSPy prompt optimization to a 4-bit QLoRA Llama-2-7B, improving SQuAD token-F1 from 0.42 to 0.81 while staying under 2 GiB of GPU memory.
- Combined Sentence Transformers retrieval with DSPy's BootstrapFewShot compilation to select few-shot demonstrations automatically.
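
For reference, the token-F1 metric cited above can be sketched as follows. This is a minimal illustration that assumes SQuAD v1.1-style answer normalization (lowercasing, stripping punctuation and the articles a/an/the); the project's actual scoring code is not shown here, and the function names `normalize_answer` and `token_f1` are hypothetical.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list:
    """Lowercase, drop punctuation and articles, then split on whitespace.

    Assumption: mirrors the normalization used by the SQuAD v1.1 evaluation.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)

    # If either side is empty after normalization, score 1.0 only when both are.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score such as the 0.42 and 0.81 figures would typically be the mean of `token_f1` over all evaluation questions, taking the maximum over reference answers when a question has several.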