IME performant Cython is quite hard to write. Simply renaming your file to *.pyx and compiling it speeds things up, very much finger in the air, by a factor of about 2 on compute-heavy tasks.
Then you sprinkle some cdef declarations around and it gets a bit faster again. You rewrite your algorithm a bit so it's more "stateful C" in style, which is not really the Python way, and it gets a little faster. But not that much.
So to make real gains you have to get into the weeds of what is going on. You look for the Cython bottlenecks, usually the spots where Cython has to fall back to interacting with the Python interpreter. You may go down the rabbit hole of Cython compiler directives, switching off things like bounds checking and negative-index wraparound. IME this is a lot of trial and error and isn't always intuitive. All of this is done in a language that, by this point, is superficially similar to Python but might as well not be. The slowness no longer comes from algorithmic logic or Python semantics but from the places where Cython escapes out to the Python interpreter.
At this point C++ may offer some respite, if you are familiar with the language, because the performance tradeoffs are obvious in the code right in front of you. You get no head start from Pythonic syntax, but otherwise you are writing pure C++ and it's so much easier to reason about performance.
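To make "obvious in the code" a bit more concrete, here is a minimal sketch. The functions `dot` and `dot_copying` are hypothetical names made up for illustration, not from any library. Both return the same result, but the cost difference is readable straight from the signatures: one borrows its inputs, the other copies them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: dot product of two equally sized buffers.
// Inputs are taken by const reference, so nothing is copied or allocated.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());  // precondition; compiled out with -DNDEBUG
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        total += a[i] * b[i];
    }
    return total;
}

// Same result, but the by-value parameters copy both buffers whenever
// the caller passes lvalues. The extra cost is visible in the signature.
double dot_copying(std::vector<double> a, std::vector<double> b) {
    return dot(a, b);
}
```

There is no interpreter to fall back to here, so anything slow has to be something written in the source, which is the point being made above.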
I would imagine that very well-written Cython is close in performance to C++, but for someone who knows a bit of C++ and only occasionally writes Cython, C++ is much easier to make fast.
I write performant Cython all the time, as a glue language. Write your "business logic" in Python. Write your class definitions and heavyweight algorithms in C++. Write your API in Cython. If you're writing your business logic and heavyweight algorithms all in Cython, you're in for some misery.
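As a rough sketch of the C++ side of that split, assuming a hypothetical `MovingAverage` class (the name and interface are invented for illustration): the whole loop lives in C++, and the class exposes one coarse call that processes an entire series. A Cython wrapper, not shown here, would then cross the language boundary once per series rather than once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "heavyweight" class that lives entirely in C++.
class MovingAverage {
public:
    explicit MovingAverage(std::size_t window) : window_(window) {
        assert(window > 0);  // precondition; compiled out with -DNDEBUG
    }

    // Trailing moving average over the whole series in a single call.
    // Early elements average over however many samples exist so far.
    std::vector<double> apply(const std::vector<double>& xs) const {
        std::vector<double> out;
        out.reserve(xs.size());
        double running = 0.0;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            running += xs[i];
            if (i >= window_) {
                running -= xs[i - window_];  // drop the sample leaving the window
            }
            const std::size_t n = (i + 1 < window_) ? i + 1 : window_;
            out.push_back(running / static_cast<double>(n));
        }
        return out;
    }

private:
    std::size_t window_;
};
```

The design choice being illustrated is the narrow interface: the API layer only needs to declare the constructor and `apply`, and all the per-element work stays out of reach of the Python interpreter.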