<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Then each went to his own home &#187; Statistics</title>
	<atom:link href="http://www.pui.ch/phred/archives/category/statistics/feed" rel="self" type="application/rss+xml" />
	<link>http://www.pui.ch/phred</link>
	<description>Philipp Kellers weblog</description>
	<lastBuildDate>Tue, 17 Aug 2010 19:58:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Delicious statistics</title>
		<link>http://www.pui.ch/phred/archives/2005/12/delicious-statistics.html</link>
		<comments>http://www.pui.ch/phred/archives/2005/12/delicious-statistics.html#comments</comments>
		<pubDate>Fri, 23 Dec 2005 21:09:08 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/?p=36</guid>
		<description><![CDATA[Statistics is a broad mathematical discipline which studies ways to collect, summarize and draw conclusions from data. [Wikipedia]
Statistics help us to draw conclusions from data. In a way this whole tagging thing just popped up and now we are trying to figure out what really is happening. I think statistics can help us to understand [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>Statistics is a broad mathematical discipline which studies ways to collect, summarize and draw conclusions from data. [<a href="http://en.wikipedia.org/wiki/Stats">Wikipedia</a>]</p></blockquote>
<p>Statistics help us to draw conclusions from data. In a way this whole tagging thing just popped up and now we are trying to figure out what really is happening. I think statistics can help us to understand tags.</p>
<p>When I set up my <a href="http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html">performance test</a> system, I wanted to know the metrics of <a href="http://del.icio.us/">delicious</a>, so I <a href="http://www.pui.ch/phred/archives/2005/05/delicious-statistics-that-is-extrapolation.html">tried to extrapolate some hand-collected data</a>, but that didn&#8217;t turn out too well.</p>
<p>After that I started collecting post data from del.icio.us, and I am happy to announce that I&#8217;ve set up <a href="http://deli.ckoma.net/stats">a site with delicious statistics</a> that is fully automated (my hands can rest now..). It shows trends for the number of posts per day as well as the number of tags per post.<br />
<span id="more-36"></span></p>
<div class="caption"><img src="/phred/modules/overall.png" alt="overall post trend" title="overall post trend"/><br />
Overall bookmark post trend</div>
<p>The stats are based on data I extract <a href="http://del.icio.us/rss/">from the most recent posts feed</a>, which I&#8217;m grabbing 6 times an hour (I&#8217;m trying not to be evil: no screen scraping, no grabbing every minute). I miss a big portion of the posts (actually I record only about 10% of the data), but I guess the stats are precise enough to draw some conclusions.</p>
<h2>Why statistics?</h2>
<p>I&#8217;m fond of del.icio.us (as you may know), and when I&#8217;m fond of a website I feel the urge to know what&#8217;s up: how many people are using it, and whether the service is attracting folks or scaring them away. Especially now that delicious has been acquired by Yahoo, you may ask &#8220;do people stay?&#8221;.</p>
<p>Anyway, that&#8217;s not the only reason for stats. When I set up the <a href="http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html">performance tests</a> I wanted to have real numbers. <a href="http://lists.del.icio.us/pipermail/discuss/2005-April/003002.html">I also asked</a> on the <a href="http://lists.del.icio.us/mailman/listinfo/discuss">delicious mailing list</a>. That same question was <a href="http://lists.del.icio.us/pipermail/discuss/2005-May/003225.html">asked</a> <a href="http://lists.del.icio.us/pipermail/discuss/2005-September/004012.html">a few times</a>, but there were no answers..<br />
Now, my stats don&#8217;t answer all questions. If you&#8217;re asking yourself &#8220;how many inserts does my tag system have to cope with if it gets really big?&#8221;, these will help you. But I cannot do any query stats; maybe <a href="http://www.alexa.com/data/details/traffic_details?&#038;compare_sites=&#038;y=r&#038;q=&#038;url=del.icio.us">Alexa can give you some query trends</a> (maybe you can subtract my numbers from Alexa&#8217;s to get the query stats?).</p>
<h2>First impressions</h2>
<p>From the stats you can see the two downtimes of delicious since August.</p>
<div class="caption"><img src="/phred/modules/downtime2.png" alt="del.icio.us downtime august" title="del.icio.us downtime august" /><br />
del.icio.us downtime august
</div>
<div class="caption"><img src="/phred/modules/downtime1.png" alt="del.icio.us downtime december" title="del.icio.us downtime december" /><br />
del.icio.us downtime december
</div>
<p>You can also see that the recent growth of del.icio.us only really started in December. I think it has to do with the more polished look and feel (changed in the middle of November) as well as with the new Firefox plugin, which gives the service a more professional touch. This growth is a thank-you to Joshua and his team.</p>
<p>Then, take a look at the &#8220;tag hump&#8221; at 10 tags per post:</p>
<div class="caption"><img src="/phred/modules/tags_december.png" alt="tag distribution december" title="tag distribution december"/><br />
del.icio.us tag distribution december</div>
<p>My first quick investigations show that this is caused by &#8211; you guessed it &#8211; tag spammers.<br />
I found <a href="http://del.icio.us/software.download">two</a> <a href="http://del.icio.us/dave77">spammers</a> that constantly post bookmarks with 10 tags (look out, the first link has got Chinese characters in it; my Firefox slowed down big time). This shows that stats can help to find anomalies such as spam.</p>
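<p>A minimal sketch of how such a hump could be spotted, assuming the recorded posts are available as (user, tags) pairs. The sample data, the user names and the rule "at least two posts, all with exactly 10 tags" are made up for illustration and are not taken from the real stats code.</p>

```python
from collections import Counter

# Hypothetical sample: (user, tags) pairs as they might be read from the
# recent-posts feed. User names and tags are invented for this sketch.
posts = [
    ("alice", ["python", "tutorial"]),
    ("bob", ["design", "css", "blog"]),
    ("spammer1", ["t%d" % i for i in range(10)]),
    ("spammer1", ["u%d" % i for i in range(10)]),
    ("carol", ["ajax"]),
]

# Histogram: number of tags on a post -> number of posts with that many tags.
tags_per_post = Counter(len(tags) for _, tags in posts)

# Collect the tag counts of every user's posts.
tag_counts_by_user = {}
for user, tags in posts:
    tag_counts_by_user.setdefault(user, []).append(len(tags))

# Users with several posts that all carry exactly 10 tags deserve a closer look.
suspects = sorted(
    user for user, counts in tag_counts_by_user.items()
    if len(counts) >= 2 and set(counts) == {10}
)

print(dict(tags_per_post))
print(suspects)
```

<p>A hump in the histogram only hints at an anomaly; whether the users behind it really are spammers still needs a look at their posts.</p>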
<p>I also thought that maybe the <a href="http://ejohn.org/apps/sheep/">lazy sheep bookmarklet</a> could cause such humps, but by default lazy sheep&#8217;s posts have a maximum of 6 tags. There&#8217;s no irregularity at &#8220;6&#8221;, so I guess lazy sheep doesn&#8217;t have a big influence (which is a fact I&#8217;m quite happy with).</p>
<p>I think it will be interesting to observe these tag graphs when the bookmark post user interface changes. I believe the interface plays a big role in how people tag, and this sort of graph could prove that.</p>
<h2>Further improvements</h2>
<p>I may add statistics about the estimated number of users (currently tracked: 100k) and the number of bookmarks (currently tracked: 500k), but I&#8217;m not yet sure how I can compute numbers that seem accurate.<br />
I plan to come up with a few other del.icio.us services such as tag clusters, but I&#8217;m not yet sure whether that project will ever be finished, so I&#8217;ve decided to put up the stats so you&#8217;ll have at least this.. :-)</p>
<h2>Hold on, that&#8217;s too much del.icio.us for me</h2>
<blockquote><p>Uh, all this talk about del.icio.us is too much [<a href="http://blog.simpy.com/blojsom/blog/2005/12/14/Del-icio-us-Kaput.html">Otis</a>]</p></blockquote>
<p>Yeah, you are right. The point is that these stats can be computed for all tagging-powered web services that serve a &#8220;most recent posts&#8221; feed. If you&#8217;re interested in stats on a different service, or you want to do del.icio.us stats on your own, just leave a comment. If there is enough demand, I&#8217;ll comment and refactor the code and publish it under the LGPL.</p>
<h2>Comparing to other services</h2>
<h3>Del.icio.us vs. Yahoo MyWeb 2.0</h3>
<p>Dorrian Porter has <a href="http://dorrianporter.typepad.com/silicon_valley_himalayan_/2005/10/lackluster_grow.html">tracked the number of posts of Yahoo&#8217;s MyWeb2.0</a>:</p>
<div class="caption"><a href="http://dorrianporter.typepad.com/silicon_valley_himalayan_/2005/10/lackluster_grow.html"><img src="/phred/modules/yahoo_posts_per_week.jpg"/></a><br />posts per week on Yahoo&#8217;s MyWeb2.0 (graphic by <a href="http://dorrianporter.typepad.com/silicon_valley_himalayan_/2005/10/lackluster_grow.html">Dorrian Porter</a>)</div>
<blockquote><p>Newly saved pages have averaged between 10,000 to 20,000 per week</p></blockquote>
<p>These numbers are <strong>per week</strong>. Del.icio.us has got an average of about 55,000 posts per day! This means that right now the database at del.icio.us grows roughly 20 to 40 times as fast as the one of Yahoo&#8217;s MyWeb2.0. That leaves no question as to why they have acquired delicious.</p>
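<p>The comparison can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted above; the variable names are made up.</p>

```python
# Numbers quoted in the text above.
posts_per_day_delicious = 55000        # average from the del.icio.us stats
myweb_posts_per_week = (10000, 20000)  # range reported by Dorrian Porter

# Bring both services to the same unit: posts per week.
posts_per_week_delicious = posts_per_day_delicious * 7

# Growth ratio against the lower and the upper end of the MyWeb range.
ratios = [posts_per_week_delicious / week for week in myweb_posts_per_week]

print(posts_per_week_delicious)  # 385000
print(ratios)                    # [38.5, 19.25]
```

<p>So the factor is close to 20 when compared with MyWeb&#8217;s best weeks, and nearly 40 when compared with its weakest ones.</p>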
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2005/12/delicious-statistics.html/feed</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Analyzing tag-connections</title>
		<link>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html</link>
		<comments>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html#comments</comments>
		<pubDate>Sun, 17 Jul 2005 18:03:43 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html</guid>
		<description><![CDATA[When you tag an item, for instance a bookmark, you give them different tags, for instance I tagged the bookmark for &#8220;How to Write More Clearly, Think More Clearly, and Learn Complex Material More Easily&#8221; (you know this link if you give attention to delicious popular.. :-)) with 
&#8220;writing&#8221;, &#8220;toread&#8221;, &#8220;productivity&#8221;, &#8220;language&#8221;
Now what instantially pops [...]]]></description>
			<content:encoded><![CDATA[<p>When you tag an item, for instance a bookmark, you give it several tags. For instance, I tagged the bookmark for &#8220;<a href="http://www.ai.uga.edu/mc/WriteThinkLearn_files/frame.htm">How to Write More Clearly, Think More Clearly, and Learn Complex Material More Easily</a>&#8221; (you know this link if you pay attention to <a href="http://del.icio.us/popular">delicious popular</a>.. :-)) with </p>
<blockquote><p>&#8220;writing&#8221;, &#8220;toread&#8221;, &#8220;productivity&#8221;, &#8220;language&#8221;</p></blockquote>
<p>What instantly pops into my mind is that the tag &#8220;toread&#8221; is quite different from the other tags. It describes something I want to do with this bookmark later on. I call this type of tag an &#8220;<strong>adjective</strong>&#8221; (I will come back to that name later on..). The other tags I consider &#8220;<strong>categories</strong>&#8221;.<br />
Now you&#8217;ll probably say &#8220;ah, this is a rare exception&#8221;. That is not true. I often tag items with &#8220;blog&#8221; because the interesting page I found about my favourite hobby happens to be a blog. That is why I call this type of tag an &#8220;adjective&#8221;: it is a description of the item rather than a category for it.<br />
Other tags often used as adjectives are &#8220;reference&#8221;, &#8220;tutorial&#8221;, &#8220;fun&#8221;, &#8220;cool&#8221;, &#8220;news&#8221;, &#8220;free&#8221;..<br />
<span id="more-33"></span><br />
Now this categorization is not entirely correct. Sometimes I use &#8220;blog&#8221; not as an adjective, namely when I want to bookmark a blog whose content doesn&#8217;t interest me but which just looks good. Then I&#8217;ll probably tag it as &#8220;design blog&#8221;. On the day I redesign my blog, I want to search for all the design blogs I tagged..<br />
You see: it all lies in the connections between the tags, not in the tags themselves. This is IMO pretty important.</p>
<h2>What is that for?</h2>
<p></p>
<h3>Clusters</h3>
<p>You have probably tried to cluster your bookmarks using <a href="http://laurie.informatik.uni-bremen.de/clusty/">clusty</a>. What this service does is try to put your tags into separate clouds. You know the &#8220;<a href="http://lists.del.icio.us/pipermail/discuss/2005-March/002266.html">tag-bundles</a>&#8221; of delicious? This is something like an &#8220;auto-tag-bundle&#8221; feature. Try it out if you haven&#8217;t already, and see the problems that arise..<br />
I think the key problem of this cluster service is that it considers all connections (including the adjectives). But it shouldn&#8217;t do so! Adjectives aren&#8217;t tags I want in my clusters. Adjectives are spread all over my tags, so they should first be cut away from my &#8220;tag-tree&#8221; (the tree built out of the tag-connections you create by tagging bookmarks).</p>
<h3>Similar items</h3>
<p>This categorization is also important when you search for &#8220;similar&#8221; items of a bookmark. When I want to search for similar items of that &#8220;how to write more clearly&#8221;-article, I&#8217;ll search for &#8220;writing+productivity+language&#8221; and will leave out the &#8220;toread&#8221; tag (adjective).<br />
Probably this made you realize that categorizing tag-connections is an important task. </p>
<h3>Tag clouds</h3>
<p>Now there are those tag clouds. When I look at <a href="http://kevan.org/extispicious.cgi?name=phred">my tag cloud</a>, the &#8220;biggest&#8221; tag is &#8220;resource&#8221;. Tag clouds are there to find bookmarks easily (I never search my bookmarks for &#8220;resource&#8221; alone) or to give a map of your main interests (&#8220;what is your hobby?&#8221; &#8220;ah, I am a big fan of resources&#8221;.. :-)). I am sure you have been annoyed by that, too. I want those adjective-tags cut away..!</p>
<h2>Synonyms</h2>
<p>Now back to some theory: there is a third type of tag-connection: synonyms. &#8220;delicious&#8221; and &#8220;del.icio.us&#8221; are classic synonyms. But I consider &#8220;ruby&#8221; and &#8220;rails&#8221; synonyms too (no, they aren&#8217;t really synonyms, but up to now they have been used as such). You type in the second tag just to be sure that you won&#8217;t search for it later and find nothing.. I don&#8217;t think this type is too important for the clustering task, but I name it here because I&#8217;ll use it further on.</p>
<h2>Example</h2>
<p>Let&#8217;s go for an example.<br />
Let&#8217;s consider tags that are connected to the tag &#8220;ajax&#8221;. I gathered some tag-connection data from delicious (via its <a href="http://del.icio.us/rss/">rss-feed</a>) and ran a query on my statistical data. This is data gathered over a period of one week. It is not complete, but our experiment will work anyway:</p>
<table>
<thead>
<tr>
<td>tag-connection</td>
<td>weight</td>
<td>type</td>
</tr>
</thead>
<tbody>
<tr>
<td><strong>ajax-javascript</strong></td>
<td>234</td>
<td>synonym</td>
</tr>
<tr>
<td><strong>ajax-web</strong></td>
<td>105</td>
<td>category</td>
</tr>
<tr>
<td><strong>ajax-programming</strong></td>
<td>100</td>
<td>category</td>
</tr>
<tr>
<td><strong>ajax-xmlhttprequest</strong></td>
<td>52</td>
<td>synonym</td>
</tr>
<tr>
<td>ajax-css</td>
<td>51</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-design</td>
<td>46</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-php</td>
<td>44</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-development</td>
<td>36</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-xml</td>
<td>34</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-DHTML</td>
<td>33</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-webdev</td>
<td>33</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-webdesign</td>
<td>31</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-google</td>
<td>23</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-HTML</td>
<td>21</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-tutorial</td>
<td>14</td>
<td>adjective</td>
</tr>
</tbody>
</table>
<p>Column &#8220;tag-connection&#8221; is the tag connected to &#8220;ajax&#8221; (e.g. javascript); column &#8220;weight&#8221; gives the number of times this connection occurred in a bookmark post on delicious. The rows are ordered by weight. In column &#8220;type&#8221; you see the result of my computations for this tag-connection. Just to make it clear: these are all tags connected to the tag &#8220;ajax&#8221;, ordered by the number of occurrences of the connection. If a bookmark post somebody made on delicious is tagged with &#8220;ajax&#8221; and &#8220;javascript&#8221;, that gives one point in the &#8220;weight&#8221; column for &#8220;ajax-javascript&#8221;.<br />
The outcome is quite good, I think (I must admit that I have taken the example that worked out best :-))<br />
There are some errors, sure: ajax-xml should be of type &#8220;category&#8221; as well. But we are looking at the usage of these tags, not their &#8220;real&#8221; meaning (whatever that is).</p>
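<p>The &#8220;weight&#8221; column can be sketched as a simple co-occurrence count. This is a minimal sketch, assuming each post is available as a list of tags; the sample posts and the helper name <code>connections</code> are made up for illustration.</p>

```python
from collections import Counter
from itertools import combinations

# Hypothetical posts, each one a list of tags.
posts = [
    ["ajax", "javascript", "web"],
    ["ajax", "javascript"],
    ["ajax", "programming", "tutorial"],
    ["css", "design"],
]

# weights[(a, b)] = number of posts that carry both tag a and tag b.
weights = Counter()
for tags in posts:
    # Sorting makes ("ajax", "javascript") and ("javascript", "ajax") one key.
    for pair in combinations(sorted(set(tags)), 2):
        weights[pair] += 1


def connections(tag):
    """Return (other_tag, weight) pairs for `tag`, heaviest connection first."""
    found = Counter()
    for (a, b), weight in weights.items():
        if tag == a:
            found[b] = weight
        elif tag == b:
            found[a] = weight
    return found.most_common()


print(connections("ajax"))
```

<p>Every post that carries two tags together adds one point to that pair, which is exactly how the table above is meant to be read.</p>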
<h2>Computation</h2>
<p></p>
<h3>Synonyms</h3>
<p>To compute this categorization I first went for the &#8220;synonyms&#8221;. The connection &#8220;ajax-javascript&#8221; is considered a synonym because it is the &#8220;number one connection&#8221; among all connections that ajax is part of. And when considering the connections of &#8220;javascript&#8221; (the &#8220;vice-versa connection&#8221;), ajax is number two.<br />
I consider two tags synonyms if &#8220;in one direction&#8221; the other tag is number one and in the other &#8220;direction&#8221; the other tag is in the top 10. I made up this rule because I think that in most cases there is one &#8220;stronger&#8221; synonym that is used most of the time when the &#8220;weaker&#8221; one is used. The fact that the tag &#8220;ajax&#8221; is mostly used with the tag &#8220;javascript&#8221; could also mean that &#8220;javascript&#8221; is a supercategory of ajax (which it somehow is). To avoid treating such sub/super-category connections as synonyms, we make sure that &#8220;ajax&#8221; is also important for &#8220;javascript&#8221;, so ajax is not too &#8220;sub&#8221; to javascript.. I hope you can follow :-)</p>
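<p>The synonym rule above can be sketched like this, assuming that for every tag we have its connected tags ranked by weight. Only the ranks of ajax-javascript (number one and number two) come from the text; the rest of the sample ranking and the function name <code>is_synonym</code> are hypothetical.</p>

```python
def is_synonym(a, b, ranked, top_n=10):
    """Rule from the text: number one in one direction, top `top_n` in the other.

    `ranked` maps a tag to its connected tags, strongest connection first.
    """
    def rank(of, among):
        neighbours = ranked.get(among, [])
        return neighbours.index(of) + 1 if of in neighbours else None

    rank_b_for_a = rank(b, a)  # position of b among the connections of a
    rank_a_for_b = rank(a, b)  # position of a among the connections of b
    if rank_b_for_a is None or rank_a_for_b is None:
        return False
    first_one_way = rank_b_for_a == 1 and top_n >= rank_a_for_b
    first_other_way = rank_a_for_b == 1 and top_n >= rank_b_for_a
    return first_one_way or first_other_way


# Hypothetical rankings, apart from the two ajax/javascript positions.
ranked = {
    "ajax": ["javascript", "web", "programming"],
    "javascript": ["programming", "ajax", "web"],
    "web": ["design", "css", "ajax"],
}

print(is_synonym("ajax", "javascript", ranked))  # True
print(is_synonym("ajax", "web", ranked))         # False
```

<p>The second call returns False because neither tag is the number one connection of the other, so the pair is left for the category/adjective step.</p>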
<h3>Category/Adjective</h3>
<p>Then I compute the &#8220;category&#8221;. Let&#8217;s put the values of the above table into a graph.<br />
<img src="/phred/modules/ajax_dist.png" alt="distribution of tags related to ajax" title="distribution of tags related to ajax"/><br />
On the x-axis you see the tags: tick 1 stands for &#8220;web&#8221;, 2 for &#8220;programming&#8221;, 3 for &#8220;css&#8221;, 4 for &#8220;design&#8221;, 5 for &#8220;php&#8221; and so on. You see that I removed the synonym connections &#8220;ajax-javascript&#8221; and &#8220;ajax-xmlhttprequest&#8221;, as I think they &#8220;disturb&#8221; the distribution.<br />
The y-axis shows the weight of the connection: ajax-web has weight &#8220;105&#8221;, ajax-programming has weight &#8220;100&#8221; and so on.<br />
The black line is the &#8220;weight&#8221;-column of the table above, the red one is the first <a href="http://en.wikipedia.org/wiki/Derivative">derivative</a>, the blue one the second derivative of the weight function.<br />
This graph makes it clear that &#8220;web&#8221; and &#8220;programming&#8221; are used quite often in combination with &#8220;ajax&#8221;; then there is quite a &#8220;gap&#8221;, followed by the &#8220;adjective tail&#8221;. I categorize the connections in this tail as &#8220;adjective&#8221;. The tags in the tail are used &#8220;out of context&#8221;: they don&#8217;t really belong to the &#8220;ajax-cluster&#8221;. They sometimes occur together with ajax, but only sometimes. Mostly not. Therefore they are considered &#8220;adjectives&#8221;.<br />
Now the task is to find this &#8220;gap&#8221;. In my experiments I tried to find the last gap. To find it, I started at the end of the tail, searched for the first peak of the first derivative (that is where the second derivative goes from positive to negative) and checked whether the peak was high enough. If these two conditions were fulfilled, I split the connections into two parts: the &#8220;pre-gap&#8221; connections (category) and the &#8220;post-gap&#8221; connections (adjective).<br />
The same computation has to be made for the &#8220;vice-versa&#8221; connection. I considered a connection a &#8220;category&#8221; if either of the two computations said it is one.</p>
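<p>A simplified sketch of the gap search, using the weights from the table above with the two synonym connections removed. It approximates the first derivative by the drop between neighbouring weights and treats a &#8220;peak&#8221; as a local maximum of that drop. The threshold <code>min_drop</code> is an assumption, since the text does not say how high is &#8220;high enough&#8221;, and the function name is made up.</p>

```python
def split_at_last_gap(weights, min_drop=20):
    """Return the index where the adjective tail starts, or None if no gap.

    `weights` must be sorted in descending order. `min_drop` is a
    hypothetical threshold for a drop to count as "high enough".
    """
    # drops[i] is how far the weight falls between position i and i + 1.
    drops = [weights[i] - weights[i + 1] for i in range(len(weights) - 1)]
    # Walk from the end of the tail towards the head.
    for i in range(len(drops) - 1, -1, -1):
        left = drops[i - 1] if i > 0 else 0
        right = drops[i + 1] if i + 1 != len(drops) else 0
        is_peak = drops[i] >= left and drops[i] >= right
        if is_peak and drops[i] >= min_drop:
            return i + 1
    return None


# Connections of "ajax" from the table, synonyms removed.
tags = ["web", "programming", "css", "design", "php", "development", "xml",
        "DHTML", "webdev", "webdesign", "google", "HTML", "tutorial"]
weights = [105, 100, 51, 46, 44, 36, 34, 33, 33, 31, 23, 21, 14]

cut = split_at_last_gap(weights)
categories, adjectives = tags[:cut], tags[cut:]
print(categories)  # ['web', 'programming']
print(adjectives[:3])
```

<p>With this threshold the only drop that qualifies is the one from 100 to 51, so &#8220;web&#8221; and &#8220;programming&#8221; end up as categories and everything after them as adjectives, matching the table.</p>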
<p><ins datetime="2005-07-18T15:43:36-02:00"></p>
<h2>Further processing: Ambiguous tags</h2>
<p>To achieve good clustering results, I think it is necessary to check whether a tag is used in different ways. The prominent example of this is &#8220;apple&#8221;. For now, while delicious is still restricted to the blog world, it is clear that apple means Mac-apple. But in the future this may change. To recognize whether a tag is used in different environments, the algorithm would have to check the &#8220;neighbours of neighbours&#8221; (<a href="http://blog.pietrosperoni.it/2004/09/19/clustering-delicious-tags/">as suggested by Pietro Speroni</a>). That is, for ajax: check whether the neighbours of &#8220;javascript&#8221; are more or less the same as the neighbours of &#8220;web&#8221;. You see that it all lies in the connections between tags. The tag per se is not well-defined, but the tag in connection with another tag defines it quite well. Therefore, for clustering, I&#8217;m proposing to split up ambiguous tags. That would make the resulting clusters much simpler.</ins></p>
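<p>One way to sketch the &#8220;neighbours of neighbours&#8221; check is to measure how much the neighbour sets of two tags overlap. The Jaccard overlap used here is a stand-in chosen for illustration; neither this post nor the linked suggestion prescribes a particular measure. The neighbour lists for the &#8220;apple&#8221; example are invented.</p>

```python
def neighbour_overlap(tag_a, tag_b, neighbours):
    """Jaccard overlap of two neighbour sets: 1.0 = identical, 0.0 = disjoint."""
    # Ignore the two tags themselves so a direct link does not count as overlap.
    a = set(neighbours[tag_a]) - {tag_b}
    b = set(neighbours[tag_b]) - {tag_a}
    if not a and not b:
        return 0.0
    return len(a.intersection(b)) / len(a.union(b))


# Hypothetical neighbours of three tags that all occur together with "apple".
neighbours = {
    "mac": ["osx", "software", "hardware"],
    "osx": ["mac", "software", "tips"],
    "fruit": ["recipes", "food", "health"],
}

print(neighbour_overlap("mac", "osx", neighbours))
print(neighbour_overlap("mac", "fruit", neighbours))
```

<p>If two neighbours of &#8220;apple&#8221; share hardly any neighbours of their own, as &#8220;mac&#8221; and &#8220;fruit&#8221; do here, that hints at two different uses of the tag, and it becomes a candidate for splitting.</p>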
<h2>We are onto something</h2>
<p>I&#8217;m pretty sure we are onto something. I think this is the direction it should go. Computations over tag-connection distributions are cool. Users shouldn&#8217;t have to enter this information when posting bookmarks. Posting should stay easy. I&#8217;m not that sure about the &#8220;synonym&#8221; computation, but I think the &#8220;category&#8221; computation turned out pretty well. I tried to build some clusters by hand, considering just the category and synonym connections, and I found a completely detached cluster consisting of the tags &#8220;cooking&#8221;, &#8220;health&#8221;, &#8220;recipes&#8221;, &#8220;diet&#8221; and &#8220;food&#8221;. As I said, I think we are onto something..</p>
<h2>Further reading</h2>
<ul>
<li><a href="http://www.rashmisinha.com/archives/05_02/tag-sorting.html">Building tag clusters by hand</a></li>
<li><a href="http://blog.pietrosperoni.it/2004/09/19/clustering-delicious-tags/">Pietro Speroni&#8217;s different approach to clustering tags (with java-mindmap-visualisation!)</a>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>del.icio.us statistics (that is: extrapolation)</title>
		<link>http://www.pui.ch/phred/archives/2005/05/delicious-statistics-that-is-extrapolation.html</link>
		<comments>http://www.pui.ch/phred/archives/2005/05/delicious-statistics-that-is-extrapolation.html#comments</comments>
		<pubDate>Wed, 18 May 2005 20:11:11 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/05/delicious-statistics-that-is-extrapolation.html</guid>
		<description><![CDATA[Once upon a time I asked a question on the del.icio.us mailing list if Joshua could give me some stats of his bookmark webservice.
He didn&#8217;t. And so I thought I&#8217;d do some extrapolation to gain some stats.
And eventually you help me thinking so we together could have some collaborative stats.
Stats 2004
Exactly one year ago, on [...]]]></description>
			<content:encoded><![CDATA[<p>Once upon a time <a href="http://lists.del.icio.us/pipermail/discuss/2005-April/003002.html">I asked a question on the del.icio.us mailing list</a> if Joshua could give me some stats of <a href="http://del.icio.us">his bookmark webservice</a>.<br />
He didn&#8217;t. And so I thought I&#8217;d do some extrapolation to gain some stats.<br />
And maybe you can help me think, so that together we can come up with some collaborative stats.<span id="more-31"></span></p>
<h2>Stats 2004</h2>
<p>Exactly one year ago, on 18.5.2004, Joshua told us: &#8220;There&#8217;s about 400k posts and 200k links.&#8221;</p>
<h2>Total number of posts/bookmarks</h2>
<p>Now, let&#8217;s see: according to a <a href="http://www.erickaakcire.net/delicious/pilot/two.htm">pilot study</a> there are about 90000 delicious users. <a href="http://www.erickaakcire.net/delicious/pilot/nine.htm">From their graph I assume that those 70 people have an average of 800 bookmarks</a>. If there are 90000 users and, say, 10000 of them use del.icio.us on such a regular basis, then we have 9 million posts. That makes sense, I suppose.<br />
Let&#8217;s say each bookmark has got 2 posts (assuming the bookmark/post ratio has stayed the same as in 2004); that would make a total of 4.5 million bookmarks.</p>
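<p>The extrapolation can be laid out step by step. This sketch uses only the figures from the text; the variable names are made up. Note that the 10000 regular users alone account for 8 million posts, so reading the remaining 1 million as the share of the other 80000 users is an interpretation, not something stated above.</p>

```python
# Figures taken from the text.
regular_users = 10000
posts_per_regular_user = 800
total_posts = 9000000                 # estimate used in the text

# Posts contributed by the regular users alone.
posts_from_regulars = regular_users * posts_per_regular_user

# What the estimate leaves for everybody else (interpretation, see lead-in).
posts_from_others = total_posts - posts_from_regulars

# 2004 ratio: about 400k posts for about 200k links.
posts_per_bookmark = 400000 / 200000
total_bookmarks = total_posts / posts_per_bookmark

print(posts_from_regulars, posts_from_others)  # 8000000 1000000
print(posts_per_bookmark, total_bookmarks)     # 2.0 4500000.0
```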
<h2>Posts per tag</h2>
<p>Let&#8217;s take the most popular tag on del.icio.us right now: design.<br />
Since 7.5., there have been 10000 posts tagged with &#8220;design&#8221;, 20000 since 25.4. and 30000 since 12.4. Broken down to a month, that makes about 24000 posts.<br />
Let&#8217;s say del.icio.us has already been running for 1 year at this rate; then we&#8217;d have 288000 posts tagged with &#8220;design&#8221;. That is: 3.2% of all posts on del.icio.us are tagged with the most popular tag &#8220;design&#8221;.</p>
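<p>A short sketch of the arithmetic behind these numbers. It assumes that the dates are day.month in 2005 and that &#8220;now&#8221; is the publication date of this post, 18 May 2005; the variable names are made up.</p>

```python
from datetime import date

today = date(2005, 5, 18)  # assumed "now": the publication date of the post

# (date, number of "design" posts since that date), as quoted in the text.
checkpoints = [
    (date(2005, 5, 7), 10000),
    (date(2005, 4, 25), 20000),
    (date(2005, 4, 12), 30000),
]

# Rough rate per 30 days for each checkpoint.
monthly_rates = [
    round(count / (today - since).days * 30) for since, count in checkpoints
]

# Figures used in the text.
posts_per_month = 24000
total_posts = 9000000
posts_per_year = posts_per_month * 12
share_percent = round(posts_per_year / total_posts * 100, 1)

print(monthly_rates)                  # [27273, 26087, 25000]
print(posts_per_year, share_percent)  # 288000 3.2
```

<p>Under these assumptions the rough monthly rates come out a little above the 24000 used in the text, though in the same ballpark.</p>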
<h2>My own tags</h2>
<p><img src='/phred/modules/bookmarks_per_tag.png' alt='Graph on my bookmarks-per-tag ratio' /><br />
My most used tag is &#8220;resource&#8221;. It occurs in 58 out of my 686 bookmarks. That makes an 8.4% occurrence. On the graph you see the distribution of my bookmarks-per-tag ratio; I&#8217;ve got 529 distinct tags.</p>
<h2>Why all these stats?</h2>
<p>I am still doing my performance tests <a href="http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html">on different tag schemas</a>, and I want to have some real numbers so I can adjust my tests to a real-world example.</p>
<h2>Further reading</h2>
<p>If you are interested in stats on del.icio.us, then take a look at <a href="http://shirky.com/writings/ontology_overrated.html">Clay Shirky&#8217;s post</a>; he has got some graphs on tag distribution.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2005/05/delicious-statistics-that-is-extrapolation.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
