0xDECAFBAD

It's all spinning wheels and self-doubt until the first pot of coffee.

Syndicating Whuffie

... there's excellent knowledge in blogs if only we had the tools to extract it.

What sort of tools? Relevance and reputation based feeds and aggregators for one. The problem of quickly finding what's good from among the great muck of the blogosphere is, if you ask me, a far more urgent problem than seeing the correct authorship or harmonizing dc:date and pubDate before I even read the thing.

... facilitate P2P trading of RSS from desktop to desktop as well as server to desktop -- you subscribe to 1000 feeds, aggregate them, rate them (explicitly or by statistical filtering based on past use patterns) and then rebroadcast your new rated feed. Aggregators could then /use/ redundant items from feedback loops because each RSS source has a reputation rating that weights the contained individual item ranking; repeated items add their rankings.
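
To make the quoted scheme concrete, here is a minimal sketch of the merge step, assuming each subscribed feed arrives as a mapping of item URL to rating and each source has a numeric reputation. The function name, the data shapes, and the multiply-then-add scoring are all assumptions for illustration, not any existing aggregator's behaviour.

```python
from collections import defaultdict


def merge_rated_feeds(feeds, reputation):
    """Combine rated feeds into one ranked list.

    feeds:      {source_name: {item_url: rating}}   (hypothetical shape)
    reputation: {source_name: weight}               (hypothetical shape)

    Each item's rating is weighted by the reputation of the source that
    carried it. An item that shows up in several feeds accumulates the
    weighted ratings from all of them, so redundancy helps instead of hurts.
    """
    scores = defaultdict(float)
    for source, items in feeds.items():
        # Sources with no known reputation contribute nothing.
        weight = reputation.get(source, 0.0)
        for url, rating in items.items():
            scores[url] += weight * rating
    # Highest combined score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

The ranked list that comes out could then be rebroadcast as a new rated feed, which is where the feedback loop in the quote comes from.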


Yes. This is it. This is what I want to see come next from aggregators and blogs and syndication and all this mess. It's what I've been tinkering with in small steps for most of a year. It's what I intend BookmarkBlogger to facilitate, as well as AmphetaOutlines and the homebrew aggregator I'm hacking around with right now.



At first thought, I'm not sure whether or not building and republishing RSS (or Echo) feeds is where it's at. But, the more I think about it, the more it seems perfectly elegant to me. All the elements are there, except for an extension to capture ratings. Extend aggregators to consume these rating-enriched feeds, and instead of just spooling the items up into your view, extract and assimilate the ratings into a growing matrix of rater versus rated. Apply all the various algorithms to correlate your rating history with that of others to whose ratings you subscribe. Mix in a little Bayes along with other machine learning.
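
A rough sketch of the rater-versus-rated matrix idea, assuming ratings have already been pulled out of the subscribed feeds into nested dicts. Pearson correlation is just one pick from "all the various algorithms", the Bayesian part is left out entirely, and every name here is hypothetical.

```python
from math import sqrt


def pearson(mine, theirs):
    """Pearson correlation over the items both raters have rated."""
    shared = [item for item in mine if item in theirs]
    if len(shared) < 2:
        return 0.0  # not enough overlap to say anything
    xs = [mine[item] for item in shared]
    ys = [theirs[item] for item in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a rater with constant ratings carries no signal
    return cov / sqrt(var_x * var_y)


def predict(me, matrix, item):
    """Guess my rating for an item from raters who agree with me.

    matrix: {rater: {item: rating}}, with my own history under `me`.
    Returns a correlation-weighted average of other raters' ratings,
    or None when nobody positively correlated has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for rater, ratings in matrix.items():
        if rater == me or item not in ratings:
            continue
        weight = pearson(matrix[me], ratings)
        if weight <= 0:
            continue  # ignore raters who disagree or are uncorrelated
        weighted_sum += weight * ratings[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None
```

An aggregator could call something like `predict` on each unread item and sort the view by the result, falling back to plain date order when it returns None.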



As for the interface... well, that's a toughie. At present, I think I could sneak ratings into my daily routine by monitoring my BookmarkBlogger use and watching the disclosure triangle clicks and link visits in my AmphetaOutlines-based news aggregator. I could easily see adding an iTunes-like 5-star rating interface, but unless I get some pretty significant payoff from painstakingly rating things, I'll never use it. At least in iTunes, I get to have playlists of my faves automatically jumbled together, if I remember to use the ratings in the moment.
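
Something like the following could turn that passive monitoring into ratings. It is only a sketch: the event names, the weights, and the cap at five stars are made-up assumptions, not anything BookmarkBlogger or AmphetaOutlines actually records.

```python
from collections import Counter

# Hypothetical weights: opening a disclosure triangle is weak interest,
# visiting the link is stronger, bookmarking is strongest.
EVENT_WEIGHTS = {"expand": 1, "visit": 2, "bookmark": 3}


def implicit_ratings(events, max_stars=5):
    """Turn a log of (item, event_kind) pairs into star-like ratings.

    Unknown event kinds count for nothing, and the total for an item
    is capped at max_stars so heavy use cannot run away with the scale.
    """
    raw = Counter()
    for item, kind in events:
        raw[item] += EVENT_WEIGHTS.get(kind, 0)
    return {item: min(max_stars, score) for item, score in raw.items()}
```

No extra clicking required: the ratings fall out of reading and bookmarking as usual.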



The cool thing will be when sites like Technorati and Feedster start using these ratings, but the even cooler thing is when all that's on my desktop. This could be easy, though, couldn't it? What do we call it, Syndicated Whuffie?



(Which reminds me: Eventually, we really gotta get back to the subscription problem. All these agents polling files everywhere will get to be nasty. Obviously. This has been talked about already, but little has happened. We need some PubSub, maybe some caches and concentrators. All stuff that's been mentioned in passing before, and left by the wayside as unsexy.)


Archived Comments

  • You might want to take a look at parss: http://weblogs.at/parss
  • Something else that keeps cropping up and being left by the wayside a lot these days: Bayesian Filtering for RSS feeds. ;)
  • I've been pontificating about this stuff on and off on my own weblog lately... nice to know that others are thinking about it too. :)
  • The best rating systems are the ones that are a side effect of "normal" behaviour. Maybe you should just give a +1 to any item and/or feed that you click through in the aggregator to view in the parent website. This is much more likely to generate meaningful metadata than relying on human rating. Re. republishing feeds: I already republish RSS for the composite topics on Ecademy, e.g. WiFi weblogs = http://wifi.ecademy.com/module.php?mod=import&op=rssbundle&id=14 Interestingly, I read the feed into a database, cleaning it up in the process. The republished feed is read out of the database, so the items are not necessarily what went in.
  • I'm with Julian. I don't see a huge value here over (a) using a BlogRoll as rating for sites/channels, and (b) individual links as rating for specific items (not good/bad, but worth of comment, which seems more important for this context).
  • I think this is very important stuff. re. ratings systems - a mix of implicit (comments, links) and explicit (rate this!) is probably what's needed. There are also a lot of possibilities for personal knowledge bases, which could also help in the filtering processes. re. republishing - yep, there's a lot more we can do here : read feed, *process*, republish re. P2P - I don't think there's even a mark on the surface yet... (of course I'm playing with all of these in IdeaGraph ;-)
  • I too have been working on something similar. I call it Intelli-Aggie (name subject to change!). It is what I would call a "content sensitive" RSS aggregator. Instead of learning from your friends' recommendations etc., the system tries to learn from your reading habits, your preferences and interests etc. Just before I came across this entry, I had just posted an example of the generated news pages. If you are interested, do take a look at http://www.srijith.net/trinetre/archives/2003/07/04/index.shtml#000290 The initial thoughts are scribbled at http://www.srijith.net/trinetre/archives/2003/05/12/index.shtml#000230
  • This is the primary goal of the NewsMonster project actually. The biggest problem right now is getting past Metcalfe's law. All the code is done, including the trust metric and the implicit/explicit certification data (ratings), but we don't have enough active users. We are working on leveraging the full trust network via a more expansive transitive trust algorithm, but it will be a week or so before I can get that code in and I want to ship 1.0 by Monday. The other big problem right now is that the project has very lofty goals. Most of our users are worried about more day-to-day issues, whereas I want to focus on the reputation components. It is all good though. It will happen soon enough.