Audio-Language Reasoning Agent

  • Engineered a multimodal retrieve-then-reason pipeline that projects audio and text into a shared CLAP embedding space and retrieves nearest neighbors with FAISS vector search.
  • Routed retrieved context to Qwen2.5-3B for zero-shot audio classification, achieving 94% top-1 accuracy on ESC-50.
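
The retrieve-then-reason flow in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the three-dimensional vectors are toy stand-ins for CLAP embeddings, the brute-force cosine search stands in for a FAISS index, and `top_k`, `build_prompt`, and the prompt wording are hypothetical names chosen for this example. The call to the language model is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query: Vector, index: List[Tuple[str, Vector]], k: int) -> List[Tuple[str, float]]:
    """Brute-force cosine search (a stand-in for a FAISS inner-product index)."""
    q = normalize(query)
    scored = [
        (label, sum(a * b for a, b in zip(q, normalize(vec))))
        for label, vec in index
    ]
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(neighbors: List[Tuple[str, float]], candidates: List[str]) -> str:
    """Format retrieved neighbors as context for a language model (hypothetical wording)."""
    context = "\n".join(f"- {label} (similarity {score:.2f})" for label, score in neighbors)
    options = ", ".join(candidates)
    return (
        "Retrieved descriptions closest to the audio clip:\n"
        f"{context}\n"
        f"Choose the single best label from: {options}."
    )


# Toy index: (text label, embedding). Real embeddings would come from a CLAP text encoder.
index = [
    ("dog bark", [0.9, 0.1, 0.0]),
    ("rain", [0.0, 0.2, 0.9]),
    ("siren", [0.1, 0.9, 0.1]),
]

# Toy query: stands in for the CLAP audio embedding of an unlabeled clip.
audio_embedding = [0.8, 0.2, 0.1]

neighbors = top_k(audio_embedding, index, k=2)
prompt = build_prompt(neighbors, [label for label, _ in index])
print(prompt)
```

Normalizing both sides before the dot product makes the score a cosine similarity, which is a common way to compare embeddings from a shared audio-text space; whether the project used cosine or another metric is an assumption here.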

Read the write-up on Medium →

updated_at 01-10-2025