<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Further musings toward smarter aggregators</title>
	<atom:link href="http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings/feed" rel="self" type="application/rss+xml" />
	<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings</link>
	<description>It's all spinning wheels and self-doubt until the first pot of coffee.</description>
	<pubDate>Mon, 13 Oct 2008 22:50:29 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7-hemorrhage</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Ian Bicking</title>
		<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings#comment-1418</link>
		<dc:creator>Ian Bicking</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=573#comment-1418</guid>
		<description>&lt;p&gt;There's a package called Reverend out there, which is a Bayesian filter not specialized for spam.  It might be worth looking at.&lt;/p&gt;

&lt;p&gt;Also, in training, the system should tell you what it thinks you'll think, and then you correct it if it's wrong.  Without this you can overtrain, creating a continual positive feedback loop, where all you want is corrective feedback.&lt;/p&gt;

&lt;p&gt;OTOH, there are rating systems, like (I think) on Amazon.  There they correlate your ratings to other people's ratings.  There you can usefully rate anything, because determining your correlated users is separate from the rating it actually presents you -- it's not determining your preference, simply your demographic.  Bloglines could do this (and they try a little), but you couldn't in isolation.  Anyway, some ideas.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>There&#8217;s a package called Reverend out there, which is a Bayesian filter not specialized for spam.  It might be worth looking at.</p>
<p>Also, in training, the system should tell you what it thinks you&#8217;ll think, and then you correct it if it&#8217;s wrong.  Without this you can overtrain, creating a continual positive feedback loop, where all you want is corrective feedback.</p>
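<p>A minimal sketch of that predict-then-correct loop, assuming a toy word-count naive Bayes classifier. <code>TinyBayes</code> and <code>correct</code> are hypothetical names for illustration and are not Reverend's API:</p>

```python
import math
from collections import Counter


class TinyBayes:
    """Toy two-class word-count classifier (hypothetical, not Reverend's API)."""

    def __init__(self):
        self.words = {"good": Counter(), "bad": Counter()}
        self.docs = Counter()

    def train(self, label, text):
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def guess(self, text):
        vocab = len(set(self.words["good"]) | set(self.words["bad"])) or 1
        scores = {}
        for label in ("good", "bad"):
            total = sum(self.words[label].values())
            # Laplace-smoothed log prior plus log likelihood of each token.
            score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
            for token in text.lower().split():
                count = self.words[label][token]
                score += math.log((count + 1) / (total + vocab))
            scores[label] = score
        return max(scores, key=scores.get)


def correct(classifier, text, actual):
    """Show the guess first; train only when the guess was wrong."""
    predicted = classifier.guess(text)
    if predicted != actual:
        classifier.train(actual, text)
    return predicted
```

<p>Because training happens only on a wrong guess, items the classifier already gets right do not keep reinforcing it.</p>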
<p>OTOH, there are rating systems, like (I think) on Amazon.  There they correlate your ratings to other people&#8217;s ratings.  There you can usefully rate anything, because determining your correlated users is separate from the rating it actually presents you &#8212; it&#8217;s not determining your preference, simply your demographic.  Bloglines could do this (and they try a little), but you couldn&#8217;t in isolation.  Anyway, some ideas.</p>
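<p>A rough sketch of the correlated-ratings idea, assuming each user's ratings are a plain dict of item to number. Pearson correlation is one common choice here, not necessarily what Amazon or Bloglines use, and the function names are hypothetical:</p>

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread if spread else 0.0


def predict(me, others, item):
    """Correlation-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other in others:
        weight = pearson(me, other)
        # Only positively correlated users who rated the item contribute.
        if weight > 0 and item in other:
            num += weight * other[item]
            den += weight
    return num / den if den else None
```

<p>Finding similar users (<code>pearson</code>) is kept separate from producing a rating (<code>predict</code>), which is why rating anything at all is useful in this scheme.</p>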
]]></content:encoded>
	</item>
	<item>
		<title>By: Gordon Weakliem</title>
		<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings#comment-1419</link>
		<dc:creator>Gordon Weakliem</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=573#comment-1419</guid>
<description>&lt;p&gt;Have you looked at AmphetaRate (http://amphetarate.sf.net)?  They claim to be using Bayesian training.  I've been using it for about a week; there are a number of things I don't like about it, but it's an interesting system.  It's Perl / PHP, FWIW.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Have you looked at AmphetaRate (http://amphetarate.sf.net)?  They claim to be using Bayesian training.  I&#8217;ve been using it for about a week; there are a number of things I don&#8217;t like about it, but it&#8217;s an interesting system.  It&#8217;s Perl / PHP, FWIW.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brett Morgan</title>
		<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings#comment-1420</link>
		<dc:creator>Brett Morgan</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=573#comment-1420</guid>
		<description>&lt;p&gt;Where to start. Hmmm. The problem with Bayes spam/ham sorting for news readers is down to the simplistic stats. The most simplistic statistical measure to take is on individual words.&lt;/p&gt;

&lt;p&gt;For normal email spam/ham there are (or rather, were) words that stood out as good/bad markers. But in something like an RSS feed, where you have already selected feeds that are reasonably close to your interest set, the individual words are no longer useful for sorting interesting vs yawn.&lt;/p&gt;

&lt;p&gt;As you can probably tell, I had a go at doing this for an AI course I did last semester, and I wound up with a furball that didn't actually fly. I used Python and wxpy to build an aggregator around an IE control with good/bad buttons for marking, but my word distributions didn't wind up correlating.&lt;/p&gt;

&lt;p&gt;Is your aggregator open for hacking? I have a few theories that I'd like to testbed, and now owning a mac, my old code is nigh on useless :-)&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Where to start. Hmmm. The problem with Bayes spam/ham sorting for news readers is down to the simplistic stats. The most simplistic statistical measure to take is on individual words.</p>
<p>For normal email spam/ham there are (or rather, were) words that stood out as good/bad markers. But in something like an RSS feed, where you have already selected feeds that are reasonably close to your interest set, the individual words are no longer useful for sorting interesting vs yawn.</p>
<p>As you can probably tell, I had a go at doing this for an AI course I did last semester, and I wound up with a furball that didn&#8217;t actually fly. I used Python and wxpy to build an aggregator around an IE control with good/bad buttons for marking, but my word distributions didn&#8217;t wind up correlating.</p>
<p>Is your aggregator open for hacking? I have a few theories that I&#8217;d like to testbed, and now owning a mac, my old code is nigh on useless :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Srijith</title>
		<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings#comment-1421</link>
		<dc:creator>Srijith</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=573#comment-1421</guid>
		<description>&lt;p&gt;Machine learning is not within my list of "do-know-something" areas, but I could suggest adding a bit of commonsense logic so the system learns from uninteresting items. The fact that an item is uninteresting is denoted by the property that you have not clicked on it. While one may not be able to deduce much of a pattern from a single item as such, it could be used to deduce useful information about the metadata associated with the item. For example, I found that I was reading fewer and fewer items from blogger X's feed. There were days at a stretch where I would not find anything written by X interesting enough. If a cron job were run that looked at the last time I read any of X's items, it could have noticed this trend and lowered X's score, thus pushing his items below my "casual-glance-view". While doing this alone would cause one to miss a very interesting item that X might write once in a blue moon, if one uses other metadata associated with the feed item (like keywords etc.), it could be brought back within my "casual-glance-view".&lt;/p&gt;

&lt;p&gt;Intellie-Aggie tries to use this logic to filter out items, but I have left the code untouched for far too long to expect any wonders.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Machine learning is not within my list of &#8220;do-know-something&#8221; areas, but I could suggest adding a bit of commonsense logic so the system learns from uninteresting items. The fact that an item is uninteresting is denoted by the property that you have not clicked on it. While one may not be able to deduce much of a pattern from a single item as such, it could be used to deduce useful information about the metadata associated with the item. For example, I found that I was reading fewer and fewer items from blogger X&#8217;s feed. There were days at a stretch where I would not find anything written by X interesting enough. If a cron job were run that looked at the last time I read any of X&#8217;s items, it could have noticed this trend and lowered X&#8217;s score, thus pushing his items below my &#8220;casual-glance-view&#8221;. While doing this alone would cause one to miss a very interesting item that X might write once in a blue moon, if one uses other metadata associated with the feed item (like keywords etc.), it could be brought back within my &#8220;casual-glance-view&#8221;.</p>
<p>Intellie-Aggie tries to use this logic to filter out items, but I have left the code untouched for far too long to expect any wonders.</p>
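<p>A minimal sketch of the score-lowering idea above, assuming an exponential decay with a 14-day half-life and a flat bonus per matching keyword. Both numbers and both function names are hypothetical choices for illustration, not Intellie-Aggie's actual code:</p>

```python
from datetime import datetime, timedelta


def feed_score(base_score, last_read, now, half_life_days=14.0):
    """Halve a feed's score for every half_life_days since an item was last read."""
    idle_days = max((now - last_read).total_seconds() / 86400.0, 0.0)
    return base_score * 0.5 ** (idle_days / half_life_days)


def item_score(feed_value, item_keywords, liked_keywords, keyword_bonus=0.5):
    """Add a bonus per matching keyword so a rare good item can still surface."""
    matches = len(set(item_keywords) & set(liked_keywords))
    return feed_value + keyword_bonus * matches


# Example: a feed untouched for two weeks drops to half its score,
# but an item matching one liked keyword gets lifted again.
now = datetime(2004, 12, 21)
decayed = feed_score(1.0, now - timedelta(days=14), now)
boosted = item_score(decayed, ["rss", "python"], ["rss"])
```

<p>A periodic job (the cron job mentioned above) would recompute <code>feed_score</code> for every feed and sort items by <code>item_score</code>.</p>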
]]></content:encoded>
	</item>
	<item>
		<title>By: drunkenbatman</title>
		<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings#comment-1422</link>
		<dc:creator>drunkenbatman</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=573#comment-1422</guid>
		<description>&lt;p&gt;I'm hoping that aggregators will get smarter too because, given the info-glut RSS enables, the computer is just going to have to get smarter about guessing what we're going to be interested in.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I&#8217;m hoping that aggregators will get smarter too because, given the info-glut RSS enables, the computer is just going to have to get smarter about guessing what we&#8217;re going to be interested in.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ade</title>
		<link>http://decafbad.com/blog/2004/12/07/further-smart-aggregator-musings#comment-1423</link>
		<dc:creator>ade</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=573#comment-1423</guid>
		<description>&lt;p&gt;Take a look at my Aggrevator: http://www.oshineye.com/software/aggrevator.html
which uses scoring to enable users to deal with very large numbers of feeds. I went with this approach after running into re-calculation problems with using Bayesian analysis for ranking entries: http://www.advogato.org/person/ade/diary.html?start=11&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Take a look at my Aggrevator: <a href="http://www.oshineye.com/software/aggrevator.html" rel="nofollow">http://www.oshineye.com/software/aggrevator.html</a><br />
which uses scoring to enable users to deal with very large numbers of feeds. I went with this approach after running into re-calculation problems with using Bayesian analysis for ranking entries: <a href="http://www.advogato.org/person/ade/diary.html?start=11" rel="nofollow">http://www.advogato.org/person/ade/diary.html?start=11</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
