You say nothing beats shell scripting for: - Filesystem operations. How many tim...

barrkel · on Aug 14, 2013

Stringing together commands is the primary reason I write most of my scripts in bash, because the commands I use are generally written in several different languages. Also, because the shell is a REPL, scripts can be prototyped or even assembled from shell history fairly easily, then refactored into something more usable.

Re performance, I usually run Cygwin on Windows. Starting up too many processes is not a mistake I tend to make, because forking in Cygwin is hopelessly slow. Similarly, spaces in paths are common, and I've learned to be fairly religious about quoting, to the point of using find -print0, xargs -0, etc.
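
A minimal sketch of the NUL-delimited pattern mentioned above, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do). The file names are made up for illustration:

```shell
# Hypothetical test directory containing a file name with a space in it.
tmp=$(mktemp -d)
touch "$tmp/plain.foo" "$tmp/with space.foo" "$tmp/other.txt"

# find emits NUL-terminated names and xargs -0 splits only on NUL,
# so "with space.foo" reaches ls as a single argument.
count=$(find "$tmp" -name '*.foo' -print0 | xargs -0 ls -1 | wc -l | tr -d ' ')

echo "matched files: $count"
rm -rf "$tmp"
```

Without -print0 and -0, xargs would split "with space.foo" on the space and pass two nonexistent names to ls.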

My scripts regularly deal with millions of files, and pipes that transfer tens of gigabytes. I rely on being able to string together sort and uniq to do set operations over multi-gigabyte files with constant memory usage; such scripts are not trivially rewritten in languages like Python without using non-standard libraries. When performance becomes an issue, the solution is a lot more heavyweight in terms of development time.
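
For illustration, here is one way the sort/uniq set operations can look. This is a sketch, not necessarily the exact commands used above: the file names are hypothetical, and the intersection and difference assume each input has no duplicate lines of its own (run each through sort -u first if that is not guaranteed).

```shell
# Byte-order collation keeps the sort order predictable across locales.
export LC_ALL=C

# Hypothetical sample inputs; the same commands apply to much larger files.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/a.txt"
printf '%s\n' banana cherry date  > "$tmp/b.txt"

# Union: every line that appears in either file, once.
union=$(sort -u "$tmp/a.txt" "$tmp/b.txt")

# Intersection: lines that appear twice after merging, i.e. in both files.
intersection=$(sort "$tmp/a.txt" "$tmp/b.txt" | uniq -d)

# Difference (a minus b): b is listed twice so none of its lines can be
# unique; whatever uniq -u prints must come from a alone.
difference=$(sort "$tmp/a.txt" "$tmp/b.txt" "$tmp/b.txt" | uniq -u)

printf 'union:\n%s\n' "$union"
printf 'intersection:\n%s\n' "$intersection"
printf 'difference:\n%s\n' "$difference"
rm -rf "$tmp"
```

The bounded memory comes from sort: GNU sort, at least, does an external merge sort and spills to temporary files once its buffer fills, while uniq only ever compares adjacent lines.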

_pvxk · on Aug 14, 2013

Fair enough; Unix certainly gives you enough ways to shoot yourself and your customers in the foot. I probably should've added the qualifier "wrt. succinctness" (that's e.g. the reason I move from awk to Python when I need more complex data structures than awk's arrays, which can often do the job, but not without looking ugly).

By the way, your "ls * .foo" does not even do the same as "find -name '*.foo'" due to a superfluous space ;-)

But the fact that most people don't know about xargs -P (or GNU parallel), or spawn too many processes, or use ugly hacks like pidfiles/start-stop-daemon, is not a reason to throw out all the good stuff shell scripting has to offer.
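
A small sketch of xargs -P for anyone who hasn't used it, assuming GNU or BSD xargs (-P is an extension there, not something every xargs has) and gzip on the PATH. The log files are hypothetical:

```shell
# Hypothetical input files, including spaces in the names.
tmp=$(mktemp -d)
for i in 1 2 3 4; do
    printf 'data %s\n' "$i" > "$tmp/file $i.log"
done

# -n 1 hands each gzip invocation one file; -P 4 lets up to four
# invocations run at the same time.
find "$tmp" -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip

compressed=$(find "$tmp" -name '*.log.gz' | wc -l | tr -d ' ')
echo "compressed files: $compressed"
rm -rf "$tmp"
```

xargs waits for all of its children before exiting, so the count afterwards sees every compressed file.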

_pvxk · on Aug 14, 2013

By the way,

> "cat myfile | grep foo | awk '...'" […] indeed this is one of the most frequent performance sinks in shell scripts

Is it really? I would've thought loops were a more common performance sink. I can't imagine how that useless use of cat has _that_ much of an effect, unless you're running a whole bunch of copies of this script. It looks ugly in the process table and it does not let the real command (here: grep) move back and forth in the file, but I've never noticed performance improvements from removing UUOCs.

jzwinck · on Aug 16, 2013

My original phrasing may have been unclear; with the elision you made the thrust becomes quite different. I intended to convey two thoughts: that Useless Use of Cat and other filters is pervasive because it is so easy, and secondly, that spawning tons of unnecessary processes is a common performance waste. The specific example with just three processes (cat, grep, awk) is not such a big deal--the bad cases are when stuff like that happens within loops (but the performance loss is not due to the loops, but rather the spawning).
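
To make the loop case concrete, here is a hypothetical sketch (made-up data and file name) of the same sum computed with a process spawned per line versus a single process:

```shell
# Hypothetical input: a name and a score on each line.
tmp=$(mktemp -d)
printf '%s\n' 'alice 10' 'bob 20' 'carol 30' > "$tmp/scores.txt"

# Wasteful pattern: every iteration forks for the command substitution
# and starts a fresh awk, so the cost grows with the number of lines.
total_loop=0
while read -r line; do
    n=$(printf '%s\n' "$line" | awk '{ print $2 }')
    total_loop=$((total_loop + n))
done < "$tmp/scores.txt"

# Same result from one awk process reading the whole file.
total_awk=$(awk '{ sum += $2 } END { print sum }' "$tmp/scores.txt")

echo "loop: $total_loop, single awk: $total_awk"
rm -rf "$tmp"
```

With three lines the difference is invisible; with millions of lines the first version spends most of its time creating processes.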

e12e · on Aug 15, 2013

To be fair, there are (probably) two forks too many: awk can do filtering on regular expressions too.

Not that I think it really makes much of a difference in most cases.
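
As a sketch of the point about awk doing its own filtering: the awk program in the quoted pipeline was elided, so the `{ print $2 }` action and the sample data below are stand-ins for illustration only.

```shell
# Hypothetical sample file.
tmp=$(mktemp -d)
printf '%s\n' 'foo 1' 'bar 2' 'foo 3' > "$tmp/myfile"

# Three processes: cat feeds grep, grep feeds awk.
three=$(cat "$tmp/myfile" | grep foo | awk '{ print $2 }')

# One process: awk reads the file itself and filters with a regex pattern.
one=$(awk '/foo/ { print $2 }' "$tmp/myfile")

printf 'pipeline:\n%s\nawk only:\n%s\n' "$three" "$one"
rm -rf "$tmp"
```

One caveat: grep defaults to basic regular expressions while awk uses extended ones, so patterns containing characters like + or | may need adjusting when folded into awk.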