On Ignorant Feed Handling
By way of Robert Sayre, I read that Dave Orchard - no relation - wrote:
...In many applications, the software that gets an extension isn't the last piece. So what does it mean for it to ignore the extra content? Should it throw it away? Should it keep it but not fault? I'll call these two models the "Ignore and Discard" and the "Ignore but Retain" models.
My little projects FeedMagick and FeedSpool are attempts to follow the "Ignore but Retain" model for RSS and Atom feed processing. Since both of these applications are decidedly not at the end of the chain, they both strive to do as little harm as possible to the feed content they process while still usefully filtering, slicing, and dicing it.
I've called these both "ignorant" feed handlers. Rather than disassembling everything from a feed into local data structures, munging it in that form, and then reassembling it into a feed, these handlers work surgically at the XML level, juggling and slinging elements around, really aware only of feed/item boundaries and the occasional tag found inside an item/entry. Everything else - be it namespaced extension attributes or elements or semi-sapient arrangements of whitespace - gets preserved in the output for applications down the line. (Well, actually, I think I mangle whitespace, but I'm working on that.)
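To make the difference between disassembling and working surgically concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It is not code from FeedMagick or FeedSpool; the sample item, the KNOWN field list, and the function names are hypothetical, and the W3C Basic Geo namespace simply stands in for any extension a handler doesn't understand. Both functions read the same known fields, but only the second passes the unknown geo:lat element downstream.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 item carrying one extension element (W3C Basic Geo)
# that this handler knows nothing about.
ITEM_XML = """<item xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <title>Hello</title>
  <link>http://example.com/hello</link>
  <geo:lat>42.3</geo:lat>
</item>"""

GEO_LAT = "{http://www.w3.org/2003/01/geo/wgs84_pos#}lat"
KNOWN = ("title", "link")  # the only fields this hypothetical handler understands


def ignore_and_discard(item_xml):
    """Copy known fields into a dict, then rebuild an <item> from the dict.
    Anything not listed in KNOWN is silently lost on the way through."""
    item = ET.fromstring(item_xml)
    fields = {tag: item.findtext(tag) for tag in KNOWN}
    rebuilt = ET.Element("item")
    for tag, text in fields.items():
        ET.SubElement(rebuilt, tag).text = text
    return fields["title"], rebuilt


def ignore_but_retain(item_xml):
    """Read the known fields for local use, but hand the original element
    downstream unchanged, extensions and all."""
    item = ET.fromstring(item_xml)
    return item.findtext("title"), item


_, discarded = ignore_and_discard(ITEM_XML)
_, retained = ignore_but_retain(ITEM_XML)
print(discarded.find(GEO_LAT))      # None: the extension is gone
print(retained.find(GEO_LAT).text)  # 42.3: the extension survived
```

Both handlers "ignore" geo:lat in the sense that neither does anything with it; the only difference is whether it is still there for the next application in the chain.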
Because feeds are XML, after all, there's no reason not to work at the basic XML level when you're building a filter or a front-end feed API. Translating XML into the structures and idioms of a local programming environment exposes all sorts of impedance mismatches, and it assumes perfect knowledge of what to expect in the universe of feeds your app will process. On the other hand, supporting the basics of XML lets you handle pretty much any arbitrary structure of elements blindly, knowing only about a few select tags like feeds and items.
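Here is a rough sketch of a filter that knows only a few select tags, again in Python with xml.etree.ElementTree and again hypothetical rather than how FeedMagick actually does it. It knows where RSS 2.0 `<item>`s and Atom 1.0 `<entry>`s live and how to read a plain-text title; it removes the ones whose title lacks a keyword and serializes the rest of the tree without touching it. The names `filter_feed`, `_entries`, and `_title`, and the keyword test itself, are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Clark-notation prefix for Atom 1.0 tags


def _entries(root):
    """Yield (parent, entry) pairs: Atom <entry> under <feed>, otherwise
    RSS 2.0 <item> under <rss>/<channel>. This is the only structure the
    filter knows about; RSS 1.0/RDF is not handled in this sketch."""
    if root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            yield root, entry
    else:
        for channel in root.findall("channel"):
            for item in channel.findall("item"):
                yield channel, item


def _title(entry):
    """Return the plain-text title of an RSS item or Atom entry, or ''."""
    text = entry.findtext("title")  # RSS 2.0: no namespace
    if text is None:
        text = entry.findtext(ATOM + "title")  # Atom: namespaced
    return text or ""


def filter_feed(feed_xml, keyword):
    """Remove items/entries whose title lacks `keyword` (case-insensitive).
    Every other element, attribute, and extension is left where it was."""
    root = ET.fromstring(feed_xml)
    # Materialize the pairs first so removing elements can't disturb iteration.
    for parent, entry in list(_entries(root)):
        if keyword.lower() not in _title(entry).lower():
            parent.remove(entry)
    return ET.tostring(root, encoding="unicode")
```

One caveat: ElementTree keeps elements, attributes, and text, but by default it drops comments, omits the XML declaration when serializing to a string, and renames namespace prefixes (to `ns0`, `ns1`, and so on) unless you call `ET.register_namespace`. The output is equivalent at the level of namespaced element names, not byte-for-byte identical to the input.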
Update: In comments, Dave Johnson mentions: "it’s important to note that the [Microsoft] Feeds API does preserve that which it does not understand (such as iTunes, GeoRSS, extensions you add, etc.)" Thanks for the correction! I thought I'd read that, but jumped the gun in assuming that that was not the case! I really need to get a Windows machine and play with this API myself.
When I first read about the new feed API coming from Microsoft, I had high hopes for it as a Winsock-like universal feed handler for Windows apps. And I'm sure it will be used that way, but unlike Winsock, it appears that the MS feed API mangles data on the way through. This will end up being a dead end for the growth of feed formats, whereas Winsock was an enabler for future, as-yet-unknown internet applications. The point is, just as you can build a library that handles internet traffic without caring about the content of what it receives, you can do the same with syndication feeds as XML data.