0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

Building Pipelines with Web Services

So on this day last year, I was excitedly thinking about pipelining web services together like commands in a UNIX command line shell. Lately, I've been doing quite a bit of work at the command line level, more so than I ever have before. And for all the clunkiness and inelegance to be found there, I think the zen has struck me.



Sure, it's an ass-ugly string of characters that connects commands like find, sort, awk, sed, grep, and ssh together. But, in constructing such monstrosities, I find myself generating new disposable tools at a rate of at least one every minute or so. And, though a few have found themselves graduating into fuller, cleaner, more general tools, I would have been stuck for hours were it not for a quick multi-file grep across a vast plain of comma-separated value files digested by a tag team of sed and awk. Then, like magic, I toss in an incredibly slow yet, at the time, convenient call to mysql on another server behind a firewall via ssh with a SQL call constructed from the regurgitations of said sed and awk brothers.



So, I'm thinking again: How hot would this be if it were web services replacing each of my commands? How hot would it be if there were a STDIN, STDOUT, and STDERR for a whole class of web services? Imagine an enhanced bash or zsh piping these beasts together. For a while, I thought my XmlRpcFilteringPipe API was the way to go, but lately I've been thinking more in the direction of REST. I have to admit that the XML-RPC API is a bit clunky to use, and besides, no one's really paid it much notice; about the only use it gets is the peculiar one I put it to in making my WeblogWithWiki.



How about this for a simpler API: POST data to a URL, receive data in response. There's your STDIN and STDOUT. What about STDERR? Well, I suppose it's an either-or affair, since a single response is either output or an error, but standard HTTP status codes can fill in there. What about command line arguments? Use query parameters on the URL to which you're posting. This all seems very web-natural.
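To make that concrete, here is a minimal Python sketch of one such "command", assuming nothing beyond the standard library. The names run_url_command and build_command_url, the opener parameter, and the choice of exit code 1 for any HTTP error are my own inventions for illustration, not part of any existing tool. The request body plays STDIN, the response body plays STDOUT, an HTTP error status and its body land on STDERR, and arguments ride along as query parameters.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_command_url(base_url, args):
    """Append command-line style arguments to the URL as query parameters."""
    if not args:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urllib.parse.urlencode(args)


def run_url_command(base_url, args, stdin_data, opener=urllib.request.urlopen):
    """POST stdin_data (bytes) to the URL; return (stdout, stderr, exit_code).

    `opener` is a hypothetical hook so the network call can be swapped out;
    it defaults to urllib.request.urlopen.
    """
    request = urllib.request.Request(
        build_command_url(base_url, args), data=stdin_data, method="POST"
    )
    try:
        with opener(request) as response:
            # Success: the response body is the command's STDOUT.
            return response.read(), b"", 0
    except urllib.error.HTTPError as err:
        # Failure: the status line and error body stand in for STDERR.
        # Exit code 1 is an arbitrary choice for this sketch.
        message = f"{err.code} {err.reason}\n".encode()
        return b"", message + err.read(), 1
```

Something like run_url_command("http://example.invalid/sort", {"reverse": "1"}, data) would then behave roughly like piping data through sort with an argument, where the URL is a placeholder and not a real service. The either-or nature shows up in the return value: a given call fills STDOUT or STDERR, never both.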



Now I just have to write a shell that treats URLs as executable commands.
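As a rough idea of what the heart of such a shell might look like, here is a small Python sketch. It assumes a pipeline is written as URLs separated by "|", just like commands in bash; the function names post and run_pipeline and the stage parameter are hypothetical, and a real shell would need quoting, streaming, and much more.

```python
import sys
import urllib.error
import urllib.request


def post(url, data):
    """POST data (bytes) to url and return the response body."""
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


def run_pipeline(line, stdin_data, stage=post):
    """Run 'url | url | ...', feeding each response body to the next URL.

    Returns the final body, or None if any stage answers with an HTTP error.
    Splitting on '|' assumes a literal pipe inside a URL is written as %7C.
    """
    data = stdin_data
    for url in (part.strip() for part in line.split("|")):
        if not url:
            continue  # ignore empty segments such as a trailing '|'
        try:
            data = stage(url, data)
        except urllib.error.HTTPError as err:
            # A failed stage aborts the rest of the pipeline; the status
            # code goes to STDERR in place of the missing output.
            print(f"{url}: {err.code} {err.reason}", file=sys.stderr)
            return None
    return data
```

Each stage buffers its whole response before the next one starts, which is simpler than real UNIX pipes but loses their streaming behavior. That trade-off is an assumption of the sketch, not a requirement of the idea.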


Archived Comments

  • What would really be beautiful about this is if you could give it a config file (or somesuch) that maps a URL to a command name, so the actual commands you enter wouldn't have to look, or even feel, like URLs. I say the more it feels like a regular command line, the better.
  • Hi. This is something I was talking to some friends about last year at OSCON. There's a lot of mileage in it, I think. I personally have been doing this sort of thing on and off since then, mostly using off-the-shelf tools such as wget, curl, GET and POST (the latter two from the Perl LWP bundle; you can do PUT with the POST command too). The one question that remains is how to convey the HTTP status code. I've wrapped the call in a script to put the HTTP status code and headers to STDERR, and any HTTP body to STDOUT, but that's not perfect (what about multipart HTTP bodies, for example, with each part having its own HTTP headers)... In all, though, it's extremely useful. One thing that this sort of approach does for the mind is dispel the myth that REST is bound in any way to XML (of course it's not, but some people seem to think it's XML-specific). Use GET (the command) to GET a representation of some data that's plain text. It comes streaming down STDOUT as perfect fodder for your array of shell tools (grep, sort, etc). Lovely.
  • Great post - I thoroughly agree. Check out some of the tools from Propylon (http://www.propylon.com). Also, Sun submitted a note to the W3C on XML pipeline definitions (http://www.w3.org/TR/xml-pipeline/).
  • Check out AxKit (http://www.axkit.org/). It has XML pipelines that use Providers (which produce XML or XML Infosets) as the source and use XPath, XSLT, etc. to process the data. I'm working on an RSS provider, for example, as part of my Weblog software. (The data doesn't /have/ to be XML, but using XML as the interchange format makes things easier.)