Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Only seems to have done that in a couple places, like the MatMul. The softmax kernel (https://github.com/ScalingIntelligence/good-kernels/blob/mai...) seem to be entirely bog-standard, and the layernorm kernels are only slightly more interesting.


I looked at the softmax kernel and the cast that it does from a float* to a float4* is extremely brittle -- it's trivial to break by offsetting the input slightly.

Very likely a kernel for a standard library could not employ such a trick that relies on alignment of input pointers. Certainly not without a fallback.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: