As a web developer, I've often heard neckbeards bickering about Python's performance, but until recently I didn't have a real point of reference for how bad it can be.
I've started working on a side project that processes geo data in AppEngine. My dataset includes many long lists of numbers (lats, longs, altitudes, timestamps, etc.). A 700-route dataset is about 25MB in a SQLite database, but trying to access any significant portion of it quickly maxes out the 4GB of RAM available on either of my dev machines (which is more than I could reasonably expect to be provisioned in the cloud). I mentioned this as a potential bug to the relevant Googler at I/O this year and he basically said "that's not us, that's Python."
It's mindboggling how quickly you can burn through your RAM in CPython. Hopefully you can prove something that will eventually make its way back into CPython and lift everyone's boats. Unfortunately, even if Falcon helped on my dev machine, I can't imagine it being taken up on cloud platforms like AppEngine.
If you don't have a lot of experience with Python, you may not know how some of the "magic" parts actually work inside. Some operations, however, allocate extra copies of data that you don't need, and if you do that a lot, you will chew through memory fast. There are often two or more ways of accomplishing the same thing, with one way creating duplicates of the data and the other not. Both have valid applications, and if you're only dealing with small amounts of data the difference doesn't really matter.
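A minimal sketch of the "two ways of doing the same thing" point: `sorted()` and slicing both hand back brand-new lists, while `list.sort()` works in place. With a few numbers it's invisible; with millions of records each copy doubles your footprint.

```python
nums = list(range(100_000))

# sorted() returns a brand-new list: a second copy of all 100k references.
dup = sorted(nums)

# list.sort() reorders the same list in place; no duplicate is created.
nums.sort()

# A slice also allocates a fresh list holding its own references:
head = nums[:1000]
```

Neither form is "wrong"; you just have to know which one you're reaching for when the data is large.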
Where a lot of people who are new to Python run into problems is that the language is deceptively easy. They try writing Python code that's simply a direct analogue of how they would write Java or C#. The resulting code will run, but it will often be slow and a lot more verbose than necessary. Very often the way you would do something in Java or C# is the worst possible way to do it in Python. Conversely, the best way to do it in Python often has no direct analogue in Java or C#. With Python, the learning curve is shallow, but it's very long, and there's lots to learn if you want to reap all the benefits.
Without knowing what your data or algorithms are, it's pretty difficult to give any sensible detailed advice. However, if you are dealing with long "lists" of numbers, perhaps what you really want is long "arrays" of numbers. Lists and arrays are not the same thing in Python.
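To make the list-vs-array distinction concrete, here's a rough comparison (assuming CPython; exact byte counts vary by version). A list of n ints holds n pointers to full Python int objects, while the stdlib `array` module packs the values themselves into one contiguous buffer:

```python
import array
import sys

n = 100_000
as_list = list(range(n))
as_array = array.array('d', range(n))  # packed C doubles, 8 bytes each

# sys.getsizeof on a list counts only the pointer table, not the int
# objects it points to (~28 bytes apiece), so add those in for the
# real footprint.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(i) for i in as_list)

# The array stores its values inline; the container is the whole cost.
array_bytes = sys.getsizeof(as_array)
```

For numeric data like lats/longs, the array is several times smaller before you've done anything clever at all.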
It's easy to burn through RAM in Python because it's easy to keep unnecessary data around. Iterating through a large dataset is better than storing it all (and all the subsequent 'filtered' data) in memory at once.
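One way to act on that: use generators so only one record is alive at a time, instead of materializing the full dataset plus every filtered intermediate. A sketch, where `read_points` stands in for whatever actually produces your records:

```python
def read_points():
    # Stand-in for a real data source (DB cursor, file, etc.);
    # yields one record at a time instead of building a list.
    for i in range(1_000_000):
        yield (i, i * 0.5)

# Eager version: every intermediate stage sits in memory at once.
#   points = list(read_points())
#   filtered = [p for p in points if p[1] > 100]

# Lazy version: the filter is applied as records stream past.
total = sum(1 for p in read_points() if p[1] > 100)
```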
I haven't. I don't have any complex math in mind (yet), just some simple transformations. The problem is that even something as simple as checking a list for potential duplicates becomes really RAM intensive for sufficiently large lists. (I'm not even doing deep equality, just comparing metadata.)
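For the metadata-comparison case specifically, duplicate detection doesn't have to hold the records themselves in memory: keep a set of the (small) metadata keys and stream everything else. A hypothetical sketch, with the record shape and key function made up for illustration:

```python
def find_duplicates(records, key):
    """Yield records whose metadata key has been seen before.

    Only the set of keys is kept in memory, not the records, so RAM
    use scales with the number of distinct keys rather than with the
    size of the full dataset.
    """
    seen = set()
    for rec in records:
        k = key(rec)
        if k in seen:
            yield rec
        else:
            seen.add(k)

# Hypothetical route records; only name + start point are compared.
routes = [
    {"name": "a", "start": (1, 2), "points": [...]},
    {"name": "b", "start": (3, 4), "points": [...]},
    {"name": "a", "start": (1, 2), "points": [...]},
]
dups = list(find_duplicates(routes, key=lambda r: (r["name"], r["start"])))
```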
I still have plenty more work to do on the project. I think I'll end up fanning out each list iteration into a series of smaller chunks to keep me from blowing through all the RAM on any one request.
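The chunked-iteration idea can be sketched with `itertools.islice`, which pulls a fixed number of items off an iterator per pass (the helper name here is my own, not from any particular library):

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`.

    Only one chunk is in memory at a time, so peak RAM is bounded by
    `size` regardless of how long the underlying sequence is.
    """
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

batches = list(chunks(range(10), 4))
```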
Numpy supports lots of array math, but another way to think of it is as an api for working directly with memory (and values stored as platform types instead of python objects).
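A quick illustration of that "raw memory" view of numpy: a million float64 samples live in one contiguous 8MB buffer rather than a million boxed Python objects, and arithmetic runs over the buffer in C:

```python
import numpy as np

# One contiguous 8 MB block: 1,000,000 values x 8 bytes per float64.
lats = np.zeros(1_000_000, dtype=np.float64)

# Vectorized arithmetic touches the raw buffer directly, without
# creating a Python object per element.
lats += 0.5
```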
I often find this problem with python, although usually it is the parsing code; the code that loads all the data up, that actually uses the high-water-mark of RAM.
For example, I had a 100MB JSON file that I tried to use the stdlib json library to load. It quickly used >8GB (my machine's RAM) and started paging, dragging everything to a halt. This is partly because the stdlib JSON parser is written in python.
Now, if you switch to a small, clever implementation called cjson[1], it can load the whole thing without going past 300-400MB of RAM, and the high-water mark at the end is just the data itself. Much better!
So, in summary, make sure it's the important part of your code that's using all the RAM, and not some "hello world"-quality stdlib code that's killing you. If it is, and there isn't a cjson for the job, I've found wrapping C/C++ libraries with Cython[2] a simple way to solve the problem without too much hassle (generally only a couple of days' work at a time if you're tight and only wrap the functions you actually need to use yourself).
[1] https://pypi.python.org/pypi/python-cjson - although there's a 1.5.1 out there somewhere with a fix for a bug that loses precision on floats, and that's the version I use personally. It's so hard to find that I keep a copy of the source in my Dropbox for when I need it!
[2] http://cython.org/ - although of course actually using Cython means you can't take advantage of PyPy, IronPython, and other "faster" implementations, because you're tied to the CPython C interface forever.