- Filesystem operations. How many times have we seen programs using "ls * .foo" when they needed to use "find -name '*.foo'" to avoid command-line length limits? I answered this very question on StackOverflow this week. How many seemingly-capable software shops will churn out shell scripts which misbehave when a path has a space or other "weird" character in it? Even Apple fell prey to that one, about ten years ago, and destroyed some users' data.
- Stringing together commands. This encourages abominations like "cat myfile | grep foo | awk '...'". Just use awk if you're into that, but the shell has a knack for "tricking" people into spawning extra processes that are not really needed (indeed this is one of the most frequent performance sinks in shell scripts). And what about error handling for the several subprocesses? It's usually ignored for N-1 of them.
- Keeping track of backgrounded commands (using multiple threads without even thinking about it). Yes, you can use multiple cores without thinking--that can be cool. But what if you want to do N units of work on many fewer than N cores? You ought to use a pool, but there's no such thing in Bash. Maybe you're clever and use "xargs -P" for this, but most people don't.
- Turning arbitrary programs into "daemons". I use start-stop-daemon for that (it's included in Debian, and I easily wrote a workalike in Python when I had to use a system that didn't support it natively).
Just about the only thing here that shell scripts are really good for is doing things "without even thinking about it." Once when I was asked why shell scripting was not a good idea for production programs, I reviewed a smallish sample Bash script that had been deployed. I found a dozen latent bugs, 50% of which would never have happened with Python (or Go, or...).
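To make the "ignored for N-1 of them" point concrete, here is a minimal Bash sketch of how a pipeline hides failures by default and two ways to surface them. The `false | true` pipeline is only a stand-in for a real command chain, and the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
set +o pipefail                  # Bash's default, stated explicitly

# By default a pipeline's exit status is that of its LAST command only.
false | true
default_status=$?
echo "default: $default_status"  # the failure of `false` is invisible here

# PIPESTATUS (a Bash array) records the exit status of every stage.
false | true
statuses=("${PIPESTATUS[@]}")    # copy at once; the next command overwrites it
echo "per-stage: ${statuses[*]}"

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
detected=no
if ! false | true; then
    detected=yes
fi
echo "pipefail caught a failure: $detected"
```

Neither mechanism is on by default, which is probably why so many deployed scripts only ever check the last command.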
Stringing together commands is the primary reason I write most of my scripts in bash, because my commands are generally written in multiple different languages. Also, the fact that the shell is a REPL means scripts can be prototyped or even assembled from shell history fairly easily, then refactored into something better resembling usability.
Re performance, I usually run Cygwin on Windows. Starting up too many processes is not a mistake I tend to make, because forking in Cygwin is hopelessly slow. Similarly, spaces in paths are common, and I've learned to be fairly religious about quoting, to the point of using find -print0, xargs -0 etc.
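For anyone who hasn't seen the NUL-delimited idiom, a small sketch follows. The temporary directory and file names are invented for the demo, and it assumes a find and xargs that support `-print0` and `-0` (GNU and BSD both do, although these options are extensions to POSIX).

```shell
#!/bin/sh
# Hypothetical demo directory containing an awkward file name.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.foo" "$demo_dir/with space.foo" "$demo_dir/other.txt"

# NUL-delimited names survive spaces (and even newlines) in paths, and
# find + xargs are not subject to the argument-length limit of a big glob.
count=$(find "$demo_dir" -name '*.foo' -print0 | xargs -0 -n 1 basename | wc -l)
count=$((count))                 # strip any padding some wc versions add
echo "matched: $count"

# In plain loops, quote every expansion; an unquoted $f would split
# "with space.foo" into two words.
for f in "$demo_dir"/*.foo; do
    echo "found: $(basename "$f")"
done

rm -rf "$demo_dir"
```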
My scripts regularly deal with millions of files, and pipes that transfer tens of gigabytes. I rely on being able to string together sort and uniq to do set operations over multi-gigabyte files with constant memory usage; such scripts are not trivially rewritten in languages like Python without using non-standard libraries. When performance becomes an issue, the solution is a lot more heavyweight in terms of development time.
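A rough sketch of the sort/uniq set operations meant here, on two tiny made-up files. It assumes neither input contains internal duplicates (run each through `sort -u` first otherwise). Memory stays bounded on big inputs with implementations such as GNU sort, which spill to temporary files rather than holding everything in RAM.

```shell
#!/bin/sh
work=$(mktemp -d)
printf '%s\n' apple banana cherry > "$work/a.txt"
printf '%s\n' banana cherry date  > "$work/b.txt"

# Union: every distinct line from either file.
union=$(sort -u "$work/a.txt" "$work/b.txt")

# Intersection: lines that occur twice in the combined input.
intersection=$(sort "$work/a.txt" "$work/b.txt" | uniq -d)

# Symmetric difference: lines that occur exactly once.
symdiff=$(sort "$work/a.txt" "$work/b.txt" | uniq -u)

# A minus B: feed B twice so that none of its lines can occur exactly once.
a_minus_b=$(sort "$work/a.txt" "$work/b.txt" "$work/b.txt" | uniq -u)

printf 'union:\n%s\n' "$union"
printf 'intersection:\n%s\n' "$intersection"
printf 'symmetric difference:\n%s\n' "$symdiff"
printf 'a minus b:\n%s\n' "$a_minus_b"

rm -rf "$work"
```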
Fair enough; Unix certainly gives you enough ways to shoot yourself and your customers in the foot. I probably should've added the qualifier "wrt. succinctness" (that's e.g. the reason I move from awk to Python when I need more complex data structures than awk's arrays, which can often do the job, but not without looking ugly).
By the way, your "ls * .foo" does not even do the same as "find -name '*.foo'" due to a superfluous space ;-)
But the fact that most people don't know about xargs -P (or gnu parallel), or spawn too many processes, or use ugly hacks like pidfiles/start-stop-daemon, is not a reason to throw out all the good stuff shell scripting has to offer.
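For reference, the `xargs -P` pool amounts to something like the sketch below: eight units of work with at most three running at once. The `echo` is a placeholder for a real job, and `-P` is an extension found in GNU and BSD xargs rather than something POSIX guarantees.

```shell
#!/bin/sh
out=$(mktemp)

# 8 jobs, at most 3 running concurrently. Each job receives its input as $1
# instead of having it spliced into the command string.
seq 1 8 | xargs -P 3 -n 1 sh -c 'echo "job $1 done"' _ >> "$out"

completed=$(wc -l < "$out")
completed=$((completed))         # strip any padding some wc versions add
echo "completed jobs: $completed"

rm -f "$out"
```

Output order is not deterministic, so anything that needs ordered results has to sort or tag them afterwards.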
> "cat myfile | grep foo | awk '...'" […] indeed this is one of the most frequent performance sinks in shell scripts
Is it really? I would've thought loops were a more common performance sink. I can't imagine how that useless use of cat has _that_ much of an effect, unless you're running a whole bunch of copies of this script. It looks ugly in the process table and it does not let the real command (here: grep) move back and forth in the file, but I've never noticed performance improvements from removing uuoc's.
My original phrasing may have been unclear; with the elision you made the thrust becomes quite different. I intended to convey two thoughts: that Useless Use of Cat and other filters is pervasive because it is so easy, and secondly, that spawning tons of unnecessary processes is a common performance waste. The specific example with just three processes (cat, grep, awk) is not such a big deal--the bad cases are when stuff like that happens within loops (but the performance loss is not due to the loops, but rather the spawning).
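A small sketch of the difference described above, on invented data: the loop version spawns several processes per input line, while the awk version spawns one process for the whole file. Both compute the same sum.

```shell
#!/bin/sh
data=$(mktemp)
printf '%s\n' 'foo 1' 'bar 2' 'foo 3' > "$data"

# Spawn-heavy pattern: a grep for every line, plus a cut for every match.
slow_total=0
while read -r line; do
    if echo "$line" | grep -q '^foo '; then
        n=$(echo "$line" | cut -d' ' -f2)
        slow_total=$((slow_total + n))
    fi
done < "$data"

# Same result from a single awk process, however long the file is.
fast_total=$(awk '$1 == "foo" { sum += $2 } END { print sum + 0 }' "$data")

echo "loop: $slow_total, awk: $fast_total"
rm -f "$data"
```

On three lines the cost is invisible; on millions of lines the per-line fork and exec dominates the run time.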