
If I understand the paper right...

At indexing time:

- run an LLM over every data point multiple times ("gleanings") for entity extraction and constructing a graph index

- run an LLM over the graph multiple times to create clusters ("communities")

At query time:

- Run the LLM across all clusters, creating an answer from each and scoring them

- Run the LLM across all but the lowest scoring answers to produce a "global answer"

...aren't the compute requirements here untenable for any decent sized dataset?
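The query-time map-reduce described above can be sketched roughly like this. Everything here is a stand-in: `llm` and `score` are trivial stubs (the real system calls a language model and has it rate each partial answer), and the function name `query` is made up for illustration.

```python
# Rough sketch of the map-filter-reduce query step, per my reading of the paper.
# `llm` is a stand-in for a real model call; here it's a trivial stub
# so the example runs without any API.

def llm(prompt: str) -> str:
    # Stub: a real implementation would call a language model here.
    return f"answer({len(prompt)})"

def score(answer: str, question: str) -> float:
    # Stub relevance score; in the paper the LLM rates helpfulness itself.
    return float(len(answer))

def query(communities: list[str], question: str, keep: int = 2) -> str:
    # Map: produce one partial answer per community summary.
    partials = [llm(f"{question}\n\n{c}") for c in communities]
    # Filter: keep all but the lowest-scoring partial answers.
    ranked = sorted(partials, key=lambda a: score(a, question), reverse=True)
    top = ranked[:keep]
    # Reduce: combine the surviving partials into one "global answer".
    return llm(question + "\n\n" + "\n".join(top))

print(query(["community summary A", "community summary B", "community C"],
            "What themes appear?"))
```

The point being: the map step alone is one LLM call per community, which is where the compute cost blows up.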



It really depends on the job you're trying to accomplish. I'd venture to say it's way too early for horizontal / massive-scale RAG apps.

Most solutions will want to focus on a very specific vertical application where the dataset is much more constrained. That's where this makes more sense.

There's also a lot of alpha in data augmentation.


It depends on your latency requirements. Not every RAG task has a user waiting for an immediate response; for my use case it doesn't matter if an answer takes even tens of minutes to generate.


Given the cost of running an LLM for 10 minutes, yes it does matter. That's roughly 15 dollars.

The overall answer had better be very good and relevant every time for this tech to make sense.
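A back-of-envelope check of that figure. The community count, tokens per call, and per-token price below are all illustrative assumptions (not real prices or numbers from the paper), but they show how a map step over every community summary gets you to dollars per query:

```python
# Hedged back-of-envelope for per-query cost. All figures are
# illustrative assumptions, not real API prices or paper numbers.

communities = 1000              # assumed number of community summaries
tokens_per_map_call = 3000      # assumed prompt + partial-answer tokens each
price_per_1k_tokens = 0.005     # assumed blended USD price per 1K tokens

map_tokens = communities * tokens_per_map_call   # 3,000,000 tokens
cost = map_tokens / 1000 * price_per_1k_tokens   # ≈ $15 at these assumed rates
print(f"~{map_tokens:,} tokens, ~${cost:.2f} per query")
```

Halve the community count or the price and the cost halves with it, but it scales linearly with corpus size either way.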


Oh, that’s a good point. I have my own GPU rack and run locally because it’s cheaper to do so, so I hadn’t considered that…



