It really depends on the job you're trying to accomplish.
I'd venture that it's far too early for horizontal, massive-scale RAG apps.
Most solutions will want to focus on a very specific vertical application where the dataset is much more constrained. That's where this makes more sense.
It depends on your latency requirements. Not every RAG task has a user waiting for an immediate response; for my use case it doesn't matter if an answer takes tens of minutes to generate.
At indexing time:
- run an LLM over every data point multiple times ("gleanings") for entity extraction and constructing a graph index
- run an LLM over the graph multiple times to create clusters ("communities")
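The indexing steps above can be sketched roughly as follows. This is a toy illustration, not GraphRAG's actual implementation: `call_llm` is a deterministic stub standing in for a real extraction prompt, and community detection is reduced to connected components (the real system uses the Leiden algorithm). Only the control flow — repeated "gleaning" passes per chunk, then clustering the resulting graph — is the point.

```python
from collections import defaultdict

def call_llm(prompt: str) -> list[tuple[str, str]]:
    # Stub: pretend the model extracts (entity, entity) relation pairs
    # by pairing up capitalized words. A real call returns structured output.
    words = [w.strip(".,").lower() for w in prompt.split() if w[0].isupper()]
    return list(zip(words, words[1:]))

def extract_graph(chunks: list[str], gleanings: int = 2) -> dict[str, set[str]]:
    """Run the extraction prompt several times per chunk and union the edges."""
    graph: dict[str, set[str]] = defaultdict(set)
    for chunk in chunks:
        for _ in range(gleanings):  # repeated passes catch entities missed earlier
            for a, b in call_llm(chunk):
                graph[a].add(b)
                graph[b].add(a)
    return graph

def communities(graph: dict[str, set[str]]) -> list[set[str]]:
    """Toy community detection: connected components via DFS."""
    seen: set[str] = set()
    out: list[set[str]] = []
    for node in graph:
        if node in seen:
            continue
        comp: set[str] = set()
        stack = [node]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n])
        seen |= comp
        out.append(comp)
    return out

docs = ["Alice met Bob in Paris.", "Carol emailed Dave."]
comms = communities(extract_graph(docs))
```

Note that the LLM cost here scales as (chunks × gleanings) extraction calls before a single query is ever served, plus one summarization call per community — which is exactly the cost concern raised below.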
At query time:
- run the LLM across all clusters, creating an answer from each and scoring them
- run the LLM across all but the lowest-scoring answers to produce a "global answer"
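The query stage above is essentially a map-filter-reduce over community summaries. A minimal sketch, with the LLM calls stubbed (`map_answer` is hypothetical; real systems ask the model to answer and self-rate relevance, and the final reduce is another LLM call over the kept answers):

```python
def map_answer(question: str, summary: str) -> tuple[str, int]:
    # Stub scoring: count question words appearing in the summary.
    # A real system would ask the LLM for an answer plus a 0-100 relevance score.
    score = sum(1 for w in question.lower().split() if w in summary.lower())
    return (f"From [{summary[:20]}...]: partial answer", score)

def global_answer(question: str, summaries: list[str], keep_ratio: float = 0.5) -> str:
    # Map: one answer + score per community summary.
    scored = [map_answer(question, s) for s in summaries]
    # Filter: drop the lowest-scoring answers.
    scored.sort(key=lambda t: t[1], reverse=True)
    kept = [a for a, _ in scored[: max(1, int(len(scored) * keep_ratio))]]
    # Reduce: a real system would hand `kept` to the LLM one final time.
    return " | ".join(kept)
```

The map step is the expensive part: it is one LLM call per community per query, which is what makes per-query cost grow with dataset size rather than staying flat as in vanilla vector-search RAG.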
...aren't the compute requirements here untenable for any decent sized dataset?