
>The victim does not have to be in the public channel for the attack to work

Oh boy this is gonna be good.

>Note also that the citation [1] does not refer to the attacker’s channel. Rather, it only refers to the private channel that the user put their API key in. This is in violation of the correct citation behavior, which is that every message which contributed to an answer should be cited.

I really don't understand why anyone expects LLM citations to be correct. It has always seemed to me like they're more of a human hack, designed to trick the viewer into believing the output is more likely correct, without improving the correctness at all. If anything it seems likely to worsen the response's accuracy, as it adds processing cost/context size/etc.

This all also smells to me like it's inches away from Slack helpfully adding link expansion to the AI responses (I mean, why wouldn't they?)… and then you won't even have to click the link to exfiltrate; it'll happen automatically just by seeing it.



I do find citations helpful because I can check if the LLM just hallucinated.

It's not that seeing a citation makes me trust it, it's that I can fact check it.

Kagi's FastGPT is the first LLM I've enjoyed using because I can treat it as a summary of sources and then confirm at a primary source, rather than sifting through the increasingly irrelevant sources that pollute the internet.


> I really don't understand why anyone expects LLM citations to be correct

It can be done if you do something like:

1. Take the user’s prompt, ask the LLM to convert the prompt into an Elasticsearch query (for example)

2. Use Elasticsearch (or similar) to find sources that contain the keywords

3. Ask the LLM to limit its response to information in those sources

4. Insert the citations based on step 2 which you know are real sources

Or at least that’s my naive way of how I would design it.

The key is limiting the LLM’s knowledge to information in the source. Then the only real concerns are hallucination and the value of the information surfaced by Elasticsearch.

I realize this approach also ignores the benefits (maybe?) of allowing it full rein over the entire corpus of information, though.
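To make the four steps concrete, here's a minimal runnable sketch of that flow. The names (`extract_keywords`, `search_index`, `ask_llm`) and the tiny in-memory corpus are all hypothetical stand-ins: a real system would call an LLM for steps 1 and 3 and Elasticsearch for step 2. The point is that the citations are attached in step 4 from the retrieval results, so they always refer to documents that actually exist:

```python
# Hypothetical sketch of the retrieval-grounded citation flow.
# `search_index` stands in for Elasticsearch and `ask_llm` for the
# model call; both are stubbed so the control flow is runnable.

CORPUS = {
    "doc-101": "Elasticsearch stores documents in inverted indices.",
    "doc-202": "LLMs can hallucinate facts not present in any source.",
}

def extract_keywords(prompt: str) -> str:
    # Step 1: a real system would ask an LLM to rewrite the prompt
    # into a search query; here we just drop a few stopwords.
    stopwords = {"what", "is", "the", "a", "an", "of", "how", "does"}
    return " ".join(w for w in prompt.lower().split() if w not in stopwords)

def search_index(query: str, limit: int = 3) -> list[str]:
    # Step 2: naive keyword match against the corpus; returns doc ids.
    terms = set(query.split())
    return [doc_id for doc_id, text in CORPUS.items()
            if terms & set(text.lower().split())][:limit]

def ask_llm(prompt: str, sources: list[str]) -> str:
    # Step 3: the model would be instructed to answer *only* from
    # `sources`; stubbed as concatenation for this sketch.
    return " ".join(CORPUS[s] for s in sources)

def answer_with_citations(prompt: str) -> str:
    query = extract_keywords(prompt)
    doc_ids = search_index(query)      # real sources, found by us
    answer = ask_llm(prompt, doc_ids)
    # Step 4: citations come from the retrieval step, not the model,
    # so they can't point at documents that don't exist.
    citations = ", ".join(f"[{d}]" for d in doc_ids)
    return f"{answer} ({citations})"
```

Note what this does and doesn't buy you: the citation *list* is trustworthy by construction, but nothing in step 3 stops the model from ignoring the instruction, which is exactly the objection raised below.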


It also doesn't prevent the model from hallucinating something wholesale from the rest of the corpus it was trained on. Sometimes that's a huge source of incorrect results, due to almost-but-not-quite matching public data.

But yes, a complete list of "we fed it this" is useful and relatively trustworthy in ways that "ask the LLM to cite what it used" is absolutely not.


Why would you expect step 3 to work?


That's the neat part: it doesn't.



