RAG is dead, long live agentic retrieval
The video touches on a question I keep bumping into lately: do I actually need to learn RAG?
Very roughly: RAG is a way to preprocess your data into embeddings — high-dimensional vectors tied to chunks of text. At inference time you retrieve the chunks nearest the query and feed them to the model, so it has the “right” facts in front of it. If you have a text database with facts that really shouldn’t be distorted, this helps reduce hallucinations.
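The retrieval step can be sketched in a few lines. This is a toy stand-in, not a real system: a bag of character trigrams plays the role of an embedding model, and brute-force cosine similarity plays the role of a vector index. All names here (`embed`, `retrieve`, the sample chunks) are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag of character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks nearest the query; these get pasted into the prompt."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The refund window is 30 days from purchase.",
    "Our office is located in Berlin.",
    "Refunds are issued to the original payment method.",
]
print(retrieve("how do refunds work?", chunks, k=2))
```

A real pipeline swaps `embed` for a trained model and the `sorted` call for an ANN index, but the shape of the step is the same: query in, nearest chunks out, chunks into the prompt.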
The idea is elegant. The reality is messy. There are a lot of moving parts and a lot of “policies”: how to chunk, how to clean, how to embed, how to index, how to re-rank, what to cache, how to evaluate… It quickly becomes a whole universe where it’s easy to get lost.
In my current project I went the other way: no RAG. I just gave the agent tools like grep, cat, and a couple of small Python helpers it wrote for itself. According to the video, for agents working with codebases this is basically the SOTA approach: don’t overbuild retrieval; give the agent strong, direct access to the source of truth.
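“Give the agent tools” can be as small as this. A minimal sketch, assuming nothing about any particular agent framework: `grep_tool`, `cat_tool`, and `TOOLS` are hypothetical names, and a real setup would wire them into the model’s function-calling interface.

```python
from pathlib import Path

def grep_tool(pattern: str, root: str) -> list[str]:
    """Like `grep -rn`: return 'path:lineno:line' for every matching line."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for i, line in enumerate(text.splitlines(), 1):
            if pattern in line:
                hits.append(f"{path}:{i}:{line.strip()}")
    return hits

def cat_tool(path: str) -> str:
    """Like `cat`: hand the model the whole file, i.e. the source of truth."""
    return Path(path).read_text()

# The agent loop dispatches on tool name with arguments from the model's output.
TOOLS = {"grep": grep_tool, "cat": cat_tool}
```

No chunking, no index, no re-ranking policy to tune: the agent searches and reads exactly what’s on disk.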
Extremely short version of the video:
📀For codebases: give the agent tools.
📀For other domains: RAG is still very relevant.
P.S. The phrase on the screenshot is from Yandex’s video summarization. It’s unintentionally hilarious.
