
Coming back to this. LoRA training is applied only to the attention layers, and per the article that was sufficient for memorization. So we wouldn't be updating all of the model's weights in some kind of constant-context one-shot learning scheme.
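For anyone curious, restricting LoRA to attention is just a matter of which modules you target when attaching the adapters. A minimal sketch using Hugging Face PEFT (the base model and the projection-module names are assumptions; they vary by architecture):

  from transformers import AutoModelForCausalLM
  from peft import LoraConfig, get_peft_model

  # Hypothetical base model; any causal LM works here.
  model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

  # Attach LoRA adapters only to the attention projections.
  # MLP blocks and all base weights stay frozen.
  config = LoraConfig(
      r=8,
      lora_alpha=16,
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only
      lora_dropout=0.05,
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, config)
  model.print_trainable_parameters()  # only the adapter weights are trainable

With target_modules set like this, gradient updates flow only into the low-rank adapter matrices on the attention projections, which is the setup the article describes.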

