Except that CUDA is low level, so it's not hard to shim above it and write inter...

ethbr0 · on March 26, 2023

> this will play out like OpenGL vs Direct3d in reverse

Is that also like OpenCL vs CUDA in reverse?

WanderPanda · on March 26, 2023

I feel like abstractions don't work if we want maximum performance. AFAIK, Tensor Cores are not usable from OpenCL, and even within the CUDA universe, cuBLAS (hand-optimized) seems to outperform CUTLASS (which uses abstractions).