0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

On Ignorant Feed Handling

By way of Robert Sayre, I read that Dave Orchard - no relation - wrote:

    ...In many applications, the software that gets an extension isn't the last piece. So what does it mean for it to ignore the extra content? Should it throw it away? Should it keep it but not fault? I'll call these two models the "Ignore and Discard" and the "Ignore but Retain" models.

My little projects FeedMagick and FeedSpool are attempts to follow the "Ignore but Retain" model for RSS and Atom feed processing. Since both of these applications are decidedly not at the end of the chain, they both strive to do as little harm as possible to the feed content they process while still usefully filtering, slicing, and dicing it.

I've called both of these "ignorant" feed handlers. Rather than disassembling everything in a feed into local data structures, munging it in that form, and then reassembling it into a feed, these handlers work surgically at the XML level. They juggle and sling elements around while being aware only of feed/item boundaries and the occasional tag found inside an item/entry. Everything else, be it namespaced extension attributes or elements or semi-sapient arrangements of whitespace, gets preserved in the output for applications down the line. (Well, actually, I think I mangle whitespace, but I'm working on that.)
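To make the "surgical" idea concrete, here's a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the FeedMagick or FeedSpool code; the function names and the keyword filter are hypothetical, and it assumes an RSS 2.0 document with a `<channel>`. The only things it knows about are `<channel>`, `<item>`, and `<title>`; every other element, including namespaced extensions, stays in the tree untouched. Registering the feed's own prefixes keeps extensions serializing as `itunes:author` rather than `ns0:author`. Note that `register_namespace` is global state, that the default parser drops XML comments, and that `ElementTree` is not hardened against maliciously constructed XML.

```python
import io
import xml.etree.ElementTree as ET


def register_feed_namespaces(data: bytes) -> None:
    """Reuse the feed's own namespace prefixes when serializing."""
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=("start-ns",)):
        if not prefix:
            continue  # leave any default namespace alone in this sketch
        try:
            ET.register_namespace(prefix, uri)
        except ValueError:
            pass  # prefixes like "ns0" are reserved by ElementTree


def filter_rss_items(data: bytes, keyword: str) -> bytes:
    """Drop <item>s whose <title> lacks `keyword`; leave everything else as-is."""
    register_feed_namespaces(data)
    root = ET.fromstring(data)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("expected an RSS 2.0 document with a <channel>")
    for item in channel.findall("item"):  # findall returns a list, safe to mutate
        title = item.findtext("title", default="")
        if keyword.lower() not in title.lower():
            channel.remove(item)
    return ET.tostring(root, encoding="utf-8")
```

Any `<itunes:*>`, `<geo:*>`, or homegrown extension elements inside the surviving items come out the other side exactly as they went in, without the filter needing to know they exist.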

Because feeds are XML, after all, there's no reason not to work at the basic XML level when you're building a filter or a front-end feed API. Translating XML into the structures and idioms of your local programming environment exposes all sorts of impedance mismatches and assumes perfect knowledge of what to expect from the universe of feeds your app will process. On the other hand, sticking to the basics of XML lets you handle pretty much any arbitrary structure of elements blindly, knowing only about a few select tags like feeds and items.
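As a rough illustration of "knowing only about a few select tags," here's a hypothetical `slice_feed` helper that trims a feed to its first N items and works for both RSS 2.0 and Atom 1.0 using the same code. Its entire vocabulary is the tag names in `ITEM_TAGS`; everything else is treated as opaque. It assumes RSS 2.0 `<item>`s with no namespace and Atom `<entry>`s in the Atom namespace. RSS 1.0 is deliberately left out, since its `<channel>` also lists items in an `rdf:Seq` that would need updating too.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# The only vocabulary this sketch knows: what an item looks like in each format.
ITEM_TAGS = {"item", ATOM_NS + "entry"}


def slice_feed(root: ET.Element, limit: int) -> ET.Element:
    """Keep the first `limit` items/entries under each parent, touch nothing else."""
    for parent in list(root.iter()):  # snapshot the tree before mutating it
        kept = 0
        for child in list(parent):
            if child.tag in ITEM_TAGS:
                if kept >= limit:
                    parent.remove(child)
                else:
                    kept += 1
    return root
```

The design choice here is to not branch on feed format at all. The function doesn't know whether it was handed a `<channel>` or a `<feed>`, so a new extension or an unfamiliar channel-level element costs it nothing.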

Update: In the comments, Dave Johnson notes: "it’s important to note that the [Microsoft] Feeds API does preserve that which it does not understand (such as iTunes, GeoRSS, extensions you add, etc.)" Thanks for the correction! I thought I'd read as much, but I jumped the gun and assumed otherwise. I really need to get a Windows machine and play with this API myself.

When I first read about the new feed API coming from Microsoft, I had high hopes for it as a Winsock-like universal feed handler for Windows apps. And I'm sure it will be used that way, but unlike Winsock, it appears that the MS feed API mangles data on the way through. This will end up being a dead end for the growth of feed formats, whereas Winsock was an enabler for future, unknown internet applications. The point is, just as you can build a library that handles internet traffic without caring about the content of what it receives, you can do the same with syndication feeds as XML data.
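In the spirit of that Winsock comparison, here's a minimal sketch of a content-agnostic operation: merging several RSS 2.0 feeds by using the first feed's `<channel>` as a template and appending every `<item>` from the others. The `merge_rss` name is hypothetical, and it assumes plain RSS 2.0 input. Because `ElementTree` stores namespaced tags in their fully qualified `{uri}name` form, extension elements from any source feed carry their namespace with them, and the needed declarations are written when the merged tree is serialized.

```python
import xml.etree.ElementTree as ET


def merge_rss(feeds: list[bytes]) -> ET.Element:
    """Append every <item> from feeds[1:] to the <channel> of feeds[0], untouched."""
    base = ET.fromstring(feeds[0])
    channel = base.find("channel")
    if channel is None:
        raise ValueError("expected an RSS 2.0 document with a <channel>")
    for data in feeds[1:]:
        other = ET.fromstring(data)
        for item in list(other.iterfind("channel/item")):
            channel.append(item)  # extension children ride along as-is
    return base
```

The merge never inspects what's inside an item, so a GeoRSS point or an iTunes duration in one of the source feeds survives without the merger having heard of either.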

Archived Comments

  • I'd say FeedTools also falls into the "Ignore but Retain" model. But I actually do translate into local data structures. There are obvious limitations, yes, but since I also maintain the full original XML within the data structure, those limitations are largely defined by how much you care about what I chose to ignore. The only really big problem comes in when you hit the generation methods, because of course, generation on such a data structure will ignore any elements that I don't explicitly include. So you potentially get some silent data loss, though not unexpected data loss. Although, I did also supply hooks into the generation code so that additional generation code could be inserted on the fly, so you're once again only limited by what you actually needed to support. The only person who really loses out is the guy down the line if you republish stuff. And 99% of the time, he's not going to care so long as you give him the important stuff.

  • The MS Feeds API does some weird things because it tries to normalize the commonly used funky RSS elements like , and to Microsoft Common Feed Format (based on RSS 2.0 and Atom 1.0). I do have some hope that they'll get that right in the end, but they're clearly not there yet. However, it's important to note that the Feeds API does preserve that which it does not understand (such as iTunes, GeoRSS, extensions you add, etc.).
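The hybrid approach described in the first comment (local data structures that also keep the original XML) can be pushed a step further, so that generation doesn't silently drop what the structure didn't map. Here's a minimal sketch under that assumption. This is not FeedTools' API, and the `Item` class with its `from_element`/`to_element` methods is hypothetical. The idea is to regenerate from a copy of the retained original element and overwrite only the fields the class actually understands, rather than building a fresh element from the fields alone.

```python
import copy
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Item:
    """Convenient parsed fields, plus the original <item> they came from."""
    title: str
    link: str
    original: ET.Element = field(repr=False)

    @classmethod
    def from_element(cls, elem: ET.Element) -> "Item":
        return cls(
            title=elem.findtext("title", default=""),
            link=elem.findtext("link", default=""),
            original=elem,
        )

    def to_element(self) -> ET.Element:
        """Start from a copy of the original and overwrite only known fields,
        so unmapped children (extensions, etc.) are carried through."""
        elem = copy.deepcopy(self.original)
        for tag, value in (("title", self.title), ("link", self.link)):
            child = elem.find(tag)
            if child is None:
                child = ET.SubElement(elem, tag)
            child.text = value
        return elem
```

The trade-off is that the object model and the retained XML can drift apart. This sketch resolves that by letting the mapped fields win and leaving everything else as it was found.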