Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Python Patterns: An Optimization Anecdote (python.org)
80 points by llambda on Dec 18, 2011 | hide | past | favorite | 11 comments


Bear in mind this is extremely outdated article and some of the advice should be taken with very big grain of salt, as things have changed since 2002. The most significant improvement of Python (in this context) is the introduction of generators, which seem like an almost obvious choice for a problem such as this. The solutions that immediately came to my mind, for example, were this:

    import itertools
    def f2_1(list):
        return str(itertools.imap(chr, list))
and this:

    def f2_2(list):
        return str(chr(c) for c in list)
They are both about twice as fast (in 2.7.2 CPython) as the suggested array solution (f2_1 performs slightly better for smaller lists), and their readability is rather significantly better.


To avoid confusing newbies, str() does not concatenate a list (or generator, etc.) of strings into a single string. You probably meant to call ''.join() instead.

  >>> import itertools
  >>> def f2_1(list):
  ...         return str(itertools.imap(chr, list))
  ... 
  >>> f2_1(range(10))
  '<itertools.imap object at 0x10048e850>'
  >>> def f2_2(list):
  ...         return str(chr(c) for c in list)
  ... 
  >>> f2_2(range(10))
  '<generator object <genexpr> at 0x10046b780>'


I'm a total newbie at Python, but wouldn't

  ''.join(map(chr, list))
avoid the quadratic time of doing "string += something" (and also be a prettier solution IMO)?

(In Python 2.x doing ''.join(chr(n) for n in list) might be a better solution to avoid the intermediate list created by map)


The article is from 2002, so it's entire line of reasoning is quite outdated.


Just a note that there was good discussion about this before [1].

Remember that this essay is quite old (2002) and Python has changed significantly. [2]

[1] http://news.ycombinator.com/item?id=1859417

[2] http://keithdevens.com/weblog/archive/2002/Apr/10/AnOptimiza...


Thanks for the links. The most interesting thing in the previous HN thread was learning that this article is a) really old (probably outdated), and b) written by Guido!


It's troubling to read something about Python optimization/performance without mention that this is atrocious in a loop

string = string + s

That is such a code smell I cringed every time I saw it.

Should append to list and join at the end (once). Which, I bet (I, or you, don't know anything until we actually profile), dumping the += is significant reason why last two examples where much faster.


I still follow that rule, perhaps somewhat anachronistically. This has been long optimized in CPython so it's not quadratic:

$ python -V

Python 2.5.2

$ python -m timeit -s 'v=["a"]' 'v.append("a")'

1000000 loops, best of 3: 0.199 usec per loop

$ python -m timeit -s 's="a"' 's+="a"'

10000000 loops, best of 3: 0.127 usec per loop

The string append is actually FASTER now. Tried with strings of length 5 and it is still faster.


Here are performance numbers for a variety of approaches using a list of length 256. You can see that having a local reference to the global chr function (as discussed in the article) can give a slight speed boost. Using join() in conjunction with map() performs fairly well.

  python -m timeit -s 'l = range(256);' '"".join([chr(i) for i in l])'
  # 10000 loops, best of 3: 80.5 usec per loop
  python -m timeit -s 'l = range(256); lchr = chr' '"".join([lchr(i) for i in l])'
  # 10000 loops, best of 3: 75 usec per loop
  python -m timeit -s 'import itertools; l = range(256); lchr = chr' '"".join(itertools.imap(chr, l))'
  # 10000 loops, best of 3: 52.8 usec per loop
  python -m timeit -s 'l = range(256); lchr = chr' '"".join(map(chr, l))'
  # 10000 loops, best of 3: 48.9 usec per loop
  python -m timeit -s 'l = range(256); lchr = chr' '"".join(lchr(i) for i in l)'
  # 10000 loops, best of 3: 87.2 usec per loop
  python -m timeit -s 'l = range(256); lchr = chr' '
  s = ""
  for i in l:
    s += lchr(i)
  '
  # 10000 loops, best of 3: 64.2 usec per loop


This[1] was my bible when I first wondered how best to build strings in Python.

[1] http://www.skymind.com/~ocrow/python_string/


", ".join([0,1,2,3,4,5,6,7,8,9]) works well. Why not that?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: