<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Dynamic polling times for news aggregators, II</title>
	<atom:link href="http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too/feed" rel="self" type="application/rss+xml" />
	<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too</link>
	<description>It's all spinning wheels and self-doubt until the first pot of coffee.</description>
	<pubDate>Fri, 29 Aug 2008 12:17:37 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7-hemorrhage</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Bill Seitz</title>
		<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too#comment-1059</link>
		<dc:creator>Bill Seitz</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=484#comment-1059</guid>
		<description>&lt;p&gt;It might be good to step back and prioritize your goals. How important is quickly catching mid-day updates?&lt;/p&gt;

&lt;p&gt;I think averages are misleading here, given how "clumpy" I'd guess most posting patterns are. Some thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;it probably makes sense to check each feed at least once per day&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;for each feed, look at the time of day of its first post of the day. Rather than taking the average, look at the distribution and pick the time by which there's an 80% chance that the first post will have been made (if one is made at all).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;check at that time; if nothing has been posted, check again 12 hours later?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;if the first check of the day finds a post, start the additive/multiplicative process&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;regardless of where that process stands, check again the next morning at the predicted time of the first post&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Parallel idea:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;calculate the average time between posts as you're doing, but only over the daily window when posts are actually made (e.g. 8am-11pm = 15 hours, not 24).&lt;/li&gt;
&lt;/ul&gt;
</description>
		<content:encoded><![CDATA[<p>It might be good to step back and prioritize your goals. How important is quickly catching mid-day updates?</p>
<p>I think averages are misleading here, given how "clumpy" I'd guess most posting patterns are. Some thoughts:</p>
<ul>
<li>
<p>it probably makes sense to check each feed at least once per day</p>
</li>
<li>
<p>for each feed, look at the time of day of its first post of the day. Rather than taking the average, look at the distribution and pick the time by which there's an 80% chance that the first post will have been made (if one is made at all).</p>
</li>
<li>
<p>check at that time; if nothing has been posted, check again 12 hours later?</p>
</li>
<li>
<p>if the first check of the day finds a post, start the additive/multiplicative process</p>
</li>
<li>
<p>regardless of where that process stands, check again the next morning at the predicted time of the first post</p>
</li>
</ul>
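<p>A minimal Python sketch of the first-post-time idea above, assuming each past day's first post is recorded as minutes after midnight. The function names, the input format, and the choice of a nearest-rank percentile are illustrative assumptions, not part of any existing aggregator:</p>

```python
def first_post_check_minute(first_post_minutes, percent=80):
    """Return the minute of the day by which `percent`% of the observed
    first posts had appeared (nearest-rank percentile).

    first_post_minutes: one entry per day that had a post, giving the
    minutes after midnight of that day's first post (hypothetical format).
    """
    if not first_post_minutes:
        raise ValueError("need at least one observed first post")
    ordered = sorted(first_post_minutes)
    # Integer ceiling of percent * n / 100, avoiding float rounding issues.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def checks_for_day(check_minute, found_post):
    """Minutes after midnight at which to poll a feed for one day.

    One check at the predicted time; if it finds nothing, one more check
    12 hours later (wrapping past midnight). If it does find a post, later
    checks are left to an adaptive scheme, which is not modeled here.
    """
    if found_post:
        return [check_minute]
    return [check_minute, (check_minute + 12 * 60) % (24 * 60)]
```

<p>For example, first posts observed at 8:00, 9:00, 10:00, 11:00 and 12:00 give a check time of 660 (11:00), with a fallback check at 1380 (23:00) if the first check comes up empty.</p>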
<p>Parallel idea:</p>
<ul>
<li>calculate the average time between posts as you're doing, but only over the daily window when posts are actually made (e.g. 8am-11pm = 15 hours, not 24).</li>
</ul>
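<p>The parallel idea comes down to simple arithmetic. Here is a sketch that assumes a post count for each observed day and a fixed posting window. The 8am to 11pm defaults mirror the example above. In practice the window would presumably have to be estimated per feed, and the function name is hypothetical:</p>

```python
def active_window_interval(posts_per_day, window_start_hour=8, window_end_hour=23):
    """Mean hours between posts, counting only the daily posting window.

    posts_per_day: number of posts on each observed day (hypothetical input).
    Returns None if no posts were observed.
    """
    total_posts = sum(posts_per_day)
    if total_posts == 0:
        return None
    window_hours = window_end_hour - window_start_hour
    # Total active hours observed, divided by the posts seen in them.
    return window_hours * len(posts_per_day) / total_posts
```

<p>With 3, 5 and 2 posts over three days, this gives 45 / 10 = 4.5 hours between posts, compared with 72 / 10 = 7.2 hours if the whole 24-hour day is counted.</p>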
]]></content:encoded>
	</item>
	<item>
		<title>By: l.m.orchard</title>
		<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too#comment-1060</link>
		<dc:creator>l.m.orchard</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=484#comment-1060</guid>
		<description>&lt;p&gt;Ooh, good ideas! I think a lot of this addresses some of the not-yet-thought-out concerns I have about the simple averaging. I'll have to poke around with this some more.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Ooh, good ideas! I think a lot of this addresses some of the not-yet-thought-out concerns I have about the simple averaging. I'll have to poke around with this some more.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gnomon</title>
		<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too#comment-1061</link>
		<dc:creator>Gnomon</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=484#comment-1061</guid>
		<description>&lt;p&gt;I'm afraid that you've credited me with more politeness than I deserve! When I suggested AIMD (additive increase, multiplicative decrease), I meant that the time between polls should be the quantity governed by it: the poll interval should increase additively but decrease multiplicatively.&lt;/p&gt;

&lt;p&gt;This is not as polite as your interpretation, which definitely backs off very quickly. My reasoning went like this: weblog posts tend to clump, so the best indicator of an upcoming post is a new post:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If there aren't any new posts, lengthen the check interval by a little bit and check again later; keep lengthening the check interval up to a certain limit.&lt;/li&gt;
&lt;li&gt;If there is a new post, then it's likely that another new post will follow soon, so substantially shorten the poll interval (down to a certain limit) and check again soon.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach is less biased toward reducing server load and more biased toward catching quick clumps of updates, which seem to be the norm. I don't know any human webloggers whose posting patterns are predictable enough to be subject to statistical analysis ;)&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I'm afraid that you've credited me with more politeness than I deserve! When I suggested AIMD (additive increase, multiplicative decrease), I meant that the time between polls should be the quantity governed by it: the poll interval should increase additively but decrease multiplicatively.</p>
<p>This is not as polite as your interpretation, which definitely backs off very quickly. My reasoning went like this: weblog posts tend to clump, so the best indicator of an upcoming post is a new post:</p>
<ul>
<li>If there aren't any new posts, lengthen the check interval by a little bit and check again later; keep lengthening the check interval up to a certain limit.</li>
<li>If there is a new post, then it's likely that another new post will follow soon, so substantially shorten the poll interval (down to a certain limit) and check again soon.</li>
</ul>
<p>This approach is less biased toward reducing server load and more biased toward catching quick clumps of updates, which seem to be the norm. I don't know any human webloggers whose posting patterns are predictable enough to be subject to statistical analysis <img src='http://decafbad.com/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
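<p>A minimal Python sketch of AIMD applied to the poll interval in the way described above. The step size, the halving factor, and the 15-minute and 24-hour limits are illustrative guesses rather than tuned values, and the function name is hypothetical:</p>

```python
def next_poll_interval(current, found_new_post, step=15, factor=0.5,
                       minimum=15, maximum=24 * 60):
    """Return the next poll interval in minutes.

    No new posts: lengthen the interval additively by `step`.
    New post: shrink the interval multiplicatively by `factor`.
    The result is clamped to [minimum, maximum].
    """
    if found_new_post:
        proposed = current * factor
    else:
        proposed = current + step
    return max(minimum, min(maximum, proposed))
```

<p>Starting from a 240-minute interval, two empty polls, then two polls that find new posts, then one more empty poll move the interval through 255, 270, 135, 67.5 and 82.5 minutes. It creeps out slowly while a feed is quiet and snaps in when a clump of posts starts. Swapping the roles of the two branches would give the quicker-backing-off reading that this scheme is contrasted with.</p>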
]]></content:encoded>
	</item>
	<item>
		<title>By: l.m.orchard</title>
		<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too#comment-1062</link>
		<dc:creator>l.m.orchard</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=484#comment-1062</guid>
		<description>&lt;p&gt;Oh, duh. Heh, I had it in reverse, then. Now that you've explained it to me again, jumping up / creeping back seems like it would better suit bloggers' posting styles for sure!&lt;/p&gt;

&lt;p&gt;Thanks for correcting me!&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>Oh, duh. Heh, I had it in reverse, then. Now that you've explained it to me again, jumping up / creeping back seems like it would better suit bloggers' posting styles for sure!</p>
<p>Thanks for correcting me!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Blake Winton</title>
		<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too#comment-1063</link>
		<dc:creator>Blake Winton</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=484#comment-1063</guid>
		<description>&lt;p&gt;I was thinking along the same lines as Bill: you know that most webloggers have to sleep and eat sometime, so if you could take advantage of that knowledge, you'd be ahead of the game.&lt;/p&gt;

&lt;p&gt;My idea for how to do it would be to break the day into blocks of time (start with hours, say) and build a histogram of how many posts fall into each block. Then you could collapse each run of hours where nothing was posted into one big block, and split any hour where something was posted into two sections, to even out the number of posts per block. Then, if my theory is correct, you can poll once per block and have a reasonable chance of finding a new post.&lt;/p&gt;

&lt;p&gt;Some notable flaws with the algorithm:&lt;br /&gt;
1. It fails to account for whole days when nothing is posted. This might be overcome by making your initial blocks of time the days of the week.&lt;br /&gt;
2. It fails to account for the relationship (or the average time) between posts. So if someone often posts at 9:00 or at 10:00, but never at both, my scheme will still check at 10:00 even if something was found at 9:00. I can't think of a way around this off the top of my head.&lt;/p&gt;

&lt;p&gt;Another way of thinking of this idea is pre-calculating a guess at the fall-off, based on previous posts.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I was thinking along the same lines as Bill: you know that most webloggers have to sleep and eat sometime, so if you could take advantage of that knowledge, you'd be ahead of the game.</p>
<p>My idea for how to do it would be to break the day into blocks of time (start with hours, say) and build a histogram of how many posts fall into each block. Then you could collapse each run of hours where nothing was posted into one big block, and split any hour where something was posted into two sections, to even out the number of posts per block. Then, if my theory is correct, you can poll once per block and have a reasonable chance of finding a new post.</p>
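<p>A rough Python sketch of the histogram idea, assuming past post times are given as minutes after midnight. The function name is hypothetical. The sketch also makes two simplifications: empty runs are not merged across midnight, and every busy hour is split into exactly two 30-minute blocks, however many posts it had.</p>

```python
from collections import Counter


def polling_blocks(post_minutes):
    """Return (start, end) blocks in minutes after midnight.

    Past posts are bucketed by hour of day. Each run of hours with no
    posts becomes one block; each hour with posts becomes two 30-minute
    blocks. Poll once per block, e.g. at its end.
    """
    counts = Counter(m // 60 for m in post_minutes)
    blocks = []
    empty_start = None  # start of the current run of empty hours, if any
    for hour in range(24):
        if counts[hour] == 0:
            if empty_start is None:
                empty_start = hour * 60
            continue
        if empty_start is not None:
            blocks.append((empty_start, hour * 60))
            empty_start = None
        start = hour * 60
        blocks.append((start, start + 30))
        blocks.append((start + 30, start + 60))
    if empty_start is not None:
        blocks.append((empty_start, 24 * 60))
    return blocks
```

<p>For posts at 9:10, 9:45 and 13:05 (550, 585 and 785), the blocks come out as midnight to 9:00, 9:00 to 9:30, 9:30 to 10:00, 10:00 to 13:00, 13:00 to 13:30, 13:30 to 14:00, and 14:00 to midnight. Polling at the end of each block therefore means seven checks a day.</p>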
<p>Some notable flaws with the algorithm:<br />
1. It fails to account for whole days when nothing is posted. This might be overcome by making your initial blocks of time the days of the week.<br />
2. It fails to account for the relationship (or the average time) between posts. So if someone often posts at 9:00 or at 10:00, but never at both, my scheme will still check at 10:00 even if something was found at 9:00. I can't think of a way around this off the top of my head.</p>
<p>Another way of thinking of this idea is pre-calculating a guess at the fall-off, based on previous posts.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: hebig.org/blog</title>
		<link>http://decafbad.com/blog/2003/09/29/dynamic-polling-freq-too#comment-1064</link>
		<dc:creator>hebig.org/blog</dc:creator>
		<pubDate>Tue, 30 Nov 1999 00:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.decafbad.com/blog/?p=484#comment-1064</guid>
		<description>&lt;p&gt;&lt;strong&gt;Quick Links, September 29&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Holy Moly: One Million Weblogs Tracked - "Technorati is currently tracking about 7,000 new weblogs per day" Linus Torvalds...&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p><strong>Quick Links, September 29</strong></p>
<p>Holy Moly: One Million Weblogs Tracked - "Technorati is currently tracking about 7,000 new weblogs per day" Linus Torvalds...</p>
]]></content:encoded>
	</item>
</channel>
</rss>
