Linked lists and arrays typically have different perf characteristics (although ...

tptacek · on Dec 19, 2011

That's a sensible point, but interpreter and runtime overhead crushes any cost savings you might get from adjusting links instead of the whole array backing store.

(Ruby arrays, at least in MRI, are basically STL vectors).

1000 inserts to the middle of a 1,000,000 element Ruby array happens so quickly you can barely perceive the delay. The same insert pattern to a basic Ruby linked list sets my machine on fire.

kenjackson · on Dec 19, 2011

Wow, that sounds painful. So do people not use things like graphs or trees in Ruby due to perf? I actually used to do Python development -- about 13 or so years ago, but had to quit due to perf just being abysmal for applications intended for customers (and started focusing on C++). Is Ruby today similarly bad for apps that make heavy use of data structures like trees/graphs as Python more than a decade ago?

EDIT: Updated to remove the term 'toy-apps'

jrockway · on Dec 20, 2011

The key to understanding languages whose runtimes interpret an optree is to realize that all performance depends on the number of ops. When you implement a linked list in pure Ruby, that's a lot of operations that the runtime has to keep track of. When you insert into an array, the actual work is done (in C) in a single operation. Less bookkeeping, less overhead. If you implement both arrays and linked lists in pure Ruby, you'll see the performance you expect. If you implement arrays in C and linked lists in Ruby, the array will probably be faster for any workload. (But implement the linked list in C, and you'll see the performance you expect again. It's computer science, not magic.)

mattmanser · on Dec 20, 2011

13 years ago. When the Pentium II was around. This is what I don't get about questions about linked lists.

They're so utterly irrelevant in 99% of startups today, where knowing how to lay out your classes and methods for maintainability is so much more important than any trivial and pointless performance gain.

The complexity of solutions is at such a higher level now than it's ever been and yet we're still worrying about structures that shaves less than a ms off something that's called once every 10 seconds.

The reason why ruby and python have grown in dominance in the last few years is because you don't actually have to worry about this any more.

kenjackson · on Dec 20, 2011

I don't generally ask about linked lists, but I will say that if you can't reason about them, it's a pretty big red flag. They're an extremely simple data structure.

With that said, a lot of startups (and web apps in general) are effectively consumer CRUD apps w/ nice images and transitions. Perf generally isn't important until scale becomes an issue (and until then you're main perf bottlenekck is some DB and/or network code that someone who cared about perf wrote).

So I agree. It makes sense to use effectively a domain specific language/framework like RoR when you're in that domain. And linked lists may not be applicable. But I'd still be weary of hiring people for whom reasoning about them is off-limits.

tptacek · on Dec 19, 2011

I'd respond to this, but the "toy apps" wording makes me think this conversation will be unproductive.

kenjackson · on Dec 19, 2011

Updated to remove the term.

tptacek · on Dec 19, 2011

Well, to start with, the issue here isn't that Ruby doesn't efficiently support the "modify-in-the-middle" access pattern. It's just that Ruby does this with the Array class. Similarly, even if you intend to insert into the middle, you still probably still want to use NSMutableArray in Objective C.

Even on fairly large graphs --- say, graphs at the scale of basic blocks in a program, but (obviously) not on the scale of "recommendation graph at Amazon" --- you shouldn't be burning huge numbers of cycles seeking through reference links. In naive reference-link implementations, Ruby sorely increases the constant factors for graph analysis, but computing a minimum cost spanning tree isn't going to kill you (especially because the intermediate data structures you'd use to do it would be native-code Arrays and Hashes). On the other hand, linked lists more or less pessimize Ruby, forcing it to do nothing but expensive interpreter looping, pointer chasing, and object management.

Having said all that, for serious work, I'd keep my data structures outboard, probably in Redis.

Other people would probably answer "just write a graph library in C; for instance, you could write a C wrapper around void-star-specialized boost::graph", then bridge it to Ruby with FFI.

Ruby is as good at C for I/O bound problem subsets. This is a lot of problems, and so having a language as pleasant to write in as Ruby is a win (you could say the same for Node.js and maybe even Python). It is obviously not the only language you'll ever need, though.