I played around making these things before, it's a fun exercise. Interesting to ...

llm_trw · on Dec 1, 2024

I ended up training a bert on nothing but python for the embedding search. The results were crap. Then I used an llm to write a new docstring for each class/function definition in the training data and the results were better than state of the art.

There's so much wide open space to explore. It's a shame that everyone is wasting their time with the biggest possible models they can afford.

digdugdirk · on Dec 1, 2024

Do you have any more detailed info on this process? I've played around with using LLMs, but nothing in the training realm. I'd love to see a writeup or guide to the process you used there.

llm_trw · on Dec 1, 2024

No and it won't do you much good even if I did.

The tools have broken again since then - thanks tensorflow data loaders - and my code only works against a version of python that's no longer supported in LTS Ubuntu/Debian10+.

I have been mulling about running a subscription service where you get up to date code that works on topics like the above. If you're interested drop me a line at my profile email and I'll add you to a mailing list when/if I ever get around to doing it.