This must already exist (lbrandy.com)
105 points by spydez on Oct 14, 2009 | hide | past | favorite | 35 comments


While it's not a perfect solution, the Unix command apropos (a program that searches the summaries of the man pages for utilities) can help immensely:

   $ apropos pipe
   IO::Pipe (3p) - supply object methods for pipes
   perlipc (1) - Perl interprocess communication (signals, fifos, pipes, safe subprocesses, sockets, and semaphores)
   pipe (2) - create descriptor pair for interprocess communication
   ***tee (1) - pipe fitting***
   funzip (1) - filter for extracting from a ZIP archive in a pipe
   pv (1) - monitor the progress of data through a pipe
I find it helpful to run the following once in a while:

   $ for f in $(ls $BINPATH); do whatis $f >> funs; done
(where BINPATH is /bin, /usr/bin, /usr/local/bin, etc.), skim through the list of utilities, and look at the man pages for anything unfamiliar.

As with Emacs, the tool you want probably already exists, and you may already have it installed; you just need to search with the right keyword. (Learning the terminology takes some time, though.) Any system with that many features is difficult to remember completely, but the job of remembering the details can also be pushed onto the system itself.


Relevant to this discussion is hoogle:

http://haskell.org/hoogle/

It beats the pants off of google/apropos/anything else simply because the type signatures for functions in a purely-functional, statically-typed language practically tell you what the function does!

Hoogle is a search engine for Haskell's standard libraries, and lets you search by name or by type signature. It even helps you discover abstractions that you never knew about before:

http://coder.bsimmons.name/blog/2009/04/why-i-love-hoogle/


Apropos and the like could also be a lot more precise if they worked for only one language, but that would be a major trade-off. On this computer, apropos is a uniform interface for searching all the BSD userland utilities; the C, Perl, Tcl, and Erlang standard libraries; system design docs (e.g. hier(7)); etc.

Come to think of it, apropos could also search Haskell functions by type signature -- Haddock would just need to generate stub man pages for them. (I don't use Haskell anymore, but someone might find it useful.)

Incidentally, the pipe-able Unix utilities are so useful because they all have the same type (line-buffered text stream -> line-buffered text stream), which is a particularly easy one to use across languages.
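For example (file names here are made up), a quick frequency count composes from three such stream tools, each reading and writing line-buffered text:

```shell
# sort groups duplicate lines together, uniq -c counts each group,
# and sort -rn orders the groups by count, descending.
printf 'b\na\nb\n' | sort | uniq -c | sort -rn
```
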


Pet peeve: never ever use ls that way -- it's extremely brittle and completely unnecessary.

  $ for f in ${BINPATH}/*; do whatis "$(basename $f)" >> funs; done


You're absolutely right. I knew about that issue with ls, but typing it that way is such an old habit of mine that I didn't think about it.


How about putting the commands into a Makefile and running it with make -j 4? Easy!
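A sketch of what that might look like, reusing the image-resizing example from elsewhere in the thread (GNU make syntax; file names assumed):

```make
# Each *.jpg gets a resized *-sm.jpg target; `make -j 4` runs up to
# four convert processes at once, and a rerun after a failure only
# redoes the files that are still out of date.
SRC := $(wildcard *.jpg)
OUT := $(SRC:%.jpg=%-sm.jpg)

all: $(OUT)

%-sm.jpg: %.jpg
	convert $< -resize 320x240 $@
```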

For a more heavyweight alternative, there are lots of solutions around for running batches of commands across multiple machines. For most/all of these, multiple commands on a single machine are just one of the simpler cases they can handle. Systems that spring to mind are GridEngine (http://gridengine.sunsource.net), Qube (http://www.pipelinefx.com) and Condor (http://www.cs.wisc.edu/condor). There are plenty of others too (including a lot of home-brewed systems knocking around inside VFX houses).


With regard to parallel jobs, what's wrong with sticking a "&" at the end of the line? Everything he listed is pretty trivial.

It comes down to the idea that the command line is a "non-discoverable" interface. If you don't know what you want, there's pretty much no way to find it.

Any suggestions for how to fix that? Google solves a lot of these cases, but not all. Having said that, I've had fair success recently just typing entire questions into it, as if it were a programmer I'm talking to ...


I learned probably around 80% of the shell commands I use through apropos.

For example, if I'd been searching for the "pipe thingy" lbrandy's blog entry starts out searching for, I might have typed "apropos pipe", and lo and behold, there it is at the bottom of the list:

   ...
   pipe(2)                  - create descriptor pair for interprocess communication
   pipe(8)                  - Postfix delivery to external command
   rotatelogs(8)            - Piped logging program to rotate Apache logs
   tee(1)                   - pipe fitting
"pipe fitting" isn't the most useful description, but there are few enough other matches that it's not too many man pages to read - especially after culling the ones I already knew about back before I learned of tee(1).


On my machine (debian lenny) it's listed as:

  tee (2) - duplicating pipe content
tee (1) does not show up in the search for pipe, but its description is:

  tee (1) - read from standard input and write to standard output and files


& either runs the job in the background once (and relies on you to schedule manually), or runs every job simultaneously and kills the box. What you need is a queue. Hence, the tool.


I can run a set # of jobs at a time, working through the list until the set is exhausted - a simple pool of 'workers'.

It buffers child process output and emits it when the child completes - this keeps concurrent process output from being as confusing.

Those were features I wanted at the time and '&' with wait didn't provide them.


Ah, simply saying "parallelize" put me on the wrong track. I didn't realize you wanted to limit the number running simultaneously. Cool you discovered that xargs can do it - I didn't know that.

I have, however, done this before. I have a "load_wait" command which waits for machine load to drop below a certain level before launching the command. A simple "sleep 1" between instances of that then lets me launch thousands of commands without killing the machine.

    sleep 1 ; load_wait 3 <command1> &
    sleep 1 ; load_wait 3 <command2> &
    sleep 1 ; load_wait 3 <command3> &
    sleep 1 ; load_wait 3 <command4> &
     ...
I then launch those via "system" from inside awk and I'm done. The commands tend to be in muscle memory.
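load_wait is this commenter's own script, so its actual contents are unknown; a minimal sketch of one way such a tool might work (names, threshold handling, and poll interval all assumed) is:

```shell
#!/bin/sh
# load_wait (hypothetical sketch): block until the 1-minute load
# average drops below $1, then exec the remaining args as a command.
max=$1; shift
while :; do
  # uptime ends with the three load averages; take the 1-minute one
  load=$(uptime | sed 's/.*load average[s]*: //' | cut -d, -f1)
  # floating-point comparison via awk
  awk -v l="$load" -v m="$max" 'BEGIN { exit !(l < m) }' && break
  sleep 5
done
exec "$@"
```
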


"Google solves a lot of them, but not all."

Interesting that you would say that, as I have a friend who works for Google who would like to use machine learning techniques to make a better command line.

It also seems to me that Google's "one box" interface in Chrome - where, no matter what you want to do, you just start typing in the address bar - is maybe the best approximation of what a really good command line interface would look like.


Phrase your processes as make rules, then use gmake -j. This also lets you restart properly in the case of failures.


I've had to run those kinds of jobs before and wanted exactly that kind of tool. Not being able to find one at the time, I wrote my own:

  http://asymmetrical-view.com/personal/code/perl/parallel-jobs.readme

  http://asymmetrical-view.com/personal/code/perl/parallel-jobs
Example 'command file':

  # specify several downloads to be run in parallel
  wget http://some.host/software.tar.gz
  wget http://some.host/database.mdb
  wget http://some.host/movie-trailer.mpeg
  wget http://some.host/linux-distrubtion.iso

Run as:

  parallel-jobs --cmdfile=file.cmd --maxjobs=4


It turns out you didn't have to write your own program. Sometimes I know there's a Unix tool out there somewhere that does what I'm looking for, but writing one myself can be pretty fun.


Wow, xargs can parallelize commands? That's amazing, and could be extremely useful.
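For the curious: the flag in question is -P (not in POSIX, but supported by both GNU and BSD xargs). For example, to compress a directory of log files with up to four gzip processes at once:

```shell
# -print0/-0 handle odd filenames safely; -n 1 gives each job one
# file; -P 4 keeps at most four gzip processes running at a time.
find . -maxdepth 1 -name '*.log' -print0 | xargs -0 -P 4 -n 1 gzip
```
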


xargs + ssh = poor man's hadoop.


Everybody's jumping in with suggestions for a better command line. What about ways to know what you need to know when you need to know it?

Maybe we could figure out a thingy that watches what you're doing and finds ways to do it better.


Doing that well is difficult, and (worse still) when it breaks down, it's typically very irritating. There is a sort of uncanny valley of automated helpfulness, where sincere attempts to assist common interactions just end up feeling like Clippy to end users.


Exactly, I thought that was the point of the article. We need "indexing by meaning" - for unix utilities, for iPhone apps, for words.

For instance, I'm looking for an iPhone app that lets you take pictures of people, add names and details, and lets you browse them by picture. I bet something like it exists, but I can't find it.


Well, one could catalog a whole bunch of such situations - in Terminal.app on OS X, just for example - as described in the article, so that those entry patterns can be detected. A "suggestion" or "coach" app could then start bouncing in the Dock, letting you know that you can click on it and get a list of selections. (Or, you could just blow it off.)


Being watched feels kinda creepy. But being able to find the stuff you want is fun.

To a certain extent, this is an area where GUIs outperform command lines: the hierarchical menu system shows you the available functionality in a way that's fairly easy to search. Maybe you could implement some sort of tree-structured help on the CLI?


They made such a thing; it's called Clippy, in MS Office. It wasn't very popular...

"I see you're trying to create an implementation of half of common lisp."


This is why they invented Perl.

  #!/usr/bin/perl
  use Parallel::ForkManager;

  # Run at most 4 resize jobs at once.
  my $pm = Parallel::ForkManager->new(4);
  while (my $filename = <*.jpg>) {
    $filename =~ s/\.jpg$//;
    $pm->start and next;  # parent continues the loop; child falls through
    system("convert $filename.jpg -resize 320x240 $filename-sm.jpg");
    $pm->finish;
  }
  $pm->wait_all_children;


So much more readable too.


A sysadmin friend of mine swears by xapply - although it is largely a BSD-only application (and tricky to build on linux, to say the least) it appears to have been the inspiration for xargs's -P option.


See also the Related Work for shmux: http://web.taranis.org/shmux/


awk? cut.


Every time I go to use awk for something I always think, 'I feel like I would be able to do this with cut so much more easily, but I also know that I always feel like that, but am never able to get cut to actually do what I want.' Despite feeling that way, I usually optimistically go spend 30 minutes trying to get cut to do something simple like give me a specific column from the output of ps aux. But it's just 30 minutes of banging my head against the keyboard, and it always reinforces my original strategy, which is to use awk for everything and never, ever use cut.


If your issue with cut is what I suspect, you need to flatten the whitespace first with sed.

At that point, I'd just use awk, though. It has better defaults for field delimiters.
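A quick demonstration of the difference (synthetic input; the whitespace behavior is the point):

```shell
# cut treats each single space as a delimiter, so a run of spaces
# produces empty fields; awk splits on whitespace runs by default.
echo "foo   123  bar" | cut -d' ' -f2    # prints nothing: field 2 is
                                         # the gap between two spaces
echo "foo   123  bar" | awk '{print $2}' # prints 123
```
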


> [...] you need to flatten the whitespace first with sed

by "flatten the whitespace" do you mean turning many spaces into one space? If so, I have always used tr with the -s (squeeze) option for that:

    $ echo "one       two     three" | tr -s " " | cut -d " " -f 3
    three


Didn't know about that option, I always did it with sed. Thanks! :)


Apparently, it doesn't reinforce that strategy enough ;-)


Haha, you’re right. The problem is that I really want cut to just do what it sounds like it should do based on its name! Sigh…



