<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Then each went to his own home &#187; Clustering</title>
	<atom:link href="http://www.pui.ch/phred/archives/category/tags/clustering/feed" rel="self" type="application/rss+xml" />
	<link>http://www.pui.ch/phred</link>
	<description>Philipp Keller&#8217;s weblog</description>
	<lastBuildDate>Tue, 17 Aug 2010 19:58:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Improving navigation in tag spaces</title>
		<link>http://www.pui.ch/phred/archives/2007/06/improving-navigation-in-tag-spaces.html</link>
		<comments>http://www.pui.ch/phred/archives/2007/06/improving-navigation-in-tag-spaces.html#comments</comments>
		<pubDate>Thu, 21 Jun 2007 19:46:05 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[History]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2007/06/improving-navigation-in-tag-spaces.html</guid>
		<description><![CDATA[At the beginning of May at webtuesday, I gave a presentation about the current problems with tags and what could be done to improve the situation.
Corsin was kind enough to record the presentation (thanks a lot for that!). I&#8217;m not completely happy with the presentation &#8211; especially the part about tag history was way too long. [...]]]></description>
			<content:encoded><![CDATA[<p>At the beginning of May <a href="http://webtuesday.ch/meetings/20070508">at webtuesday, I gave a presentation</a> about the current problems with tags and what could be done to improve the situation.<br />
<a href="http://cocaman.ch/">Corsin was kind enough</a> to record the presentation (thanks a lot for that!). I&#8217;m not completely happy with the presentation &#8211; especially the part about tag history was way too long. I&#8217;d suggest skipping that part and reading <a href="http://www.pui.ch/phred/archives/2007/05/tag-history-and-gartners-hype-cycles.html">my blog post about this subject</a> instead (this part probably works better in a blog post than in a presentation). Ah, and the last 3 or 4 minutes are missing, but you aren&#8217;t really missing anything.</p>
<p><embed style="width:400px; height:326px;" id="VideoPlayback" type="application/x-shockwave-flash" src="http://video.google.com/googleplayer.swf?docId=7213509817373019825&#038;hl=en" flashvars=""> </embed></p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2007/06/improving-navigation-in-tag-spaces.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Job / Presentation at Webtuesday</title>
		<link>http://www.pui.ch/phred/archives/2007/04/new-job-presentation-at-webtuesday.html</link>
		<comments>http://www.pui.ch/phred/archives/2007/04/new-job-presentation-at-webtuesday.html#comments</comments>
		<pubDate>Thu, 26 Apr 2007 18:09:26 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Job]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2007/04/new-job-presentation-at-webtuesday.html</guid>
		<description><![CDATA[I started a new job at local.ch in February &#8211; yeah, it&#8217;s been a while already.
Local.ch is a local search engine for Switzerland, which means I can now work on information-retrieval-related stuff full time &#8211; something I already did in my free time. Being paid for doing the things I like [...]]]></description>
			<content:encoded><![CDATA[<p class="first">I started a new job at <a href="http://www.local.ch">local.ch</a> in February &#8211; yeah, it&#8217;s been a while already.</p>
<p>Local.ch is a local search engine for Switzerland, which means I can now work on information-retrieval-related stuff full time &#8211; something <a href="http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html">I already did in my free time</a>. Being paid for doing the things I like is a gift I don&#8217;t take for granted.</p>
<p>The R&amp;D team <a href="http://weblog.patrice.ch/">consists</a> <a href="http://www.dexter.cc/">of</a> <a href="http://www.keepthebyte.ch/blog.html">about</a> <a href="http://www.sitepoint.com/articlelist/210">10</a> people &#8211; all very talented and smart. Plus, the atmosphere is friendly yet challenging.</p>
<h3>Say bye to tag clouds</h3>
<p class="first">Then, I&#8217;ll <a href="http://www.webtuesday.ch/meetings/20070508">give a talk at webtuesday</a>, Zurich about &quot;Improving navigation in tag spaces&quot;: Why tag clouds don&#8217;t make much sense, why<br />
tagging lost its ground and what could be done to improve the user experience.</p>
<p>The talk will be based on the few blog posts I wrote about this subject plus some newly gained insights.<br />
If you&#8217;re living near Zurich it would be a pleasure to meet you there.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2007/04/new-job-presentation-at-webtuesday.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Automated tag clustering</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated-tag-clustering.html</link>
		<comments>http://www.pui.ch/phred/archives/2006/07/automated-tag-clustering.html#comments</comments>
		<pubDate>Tue, 11 Jul 2006 06:03:37 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[RawSugar]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html</guid>
		<description><![CDATA[Grigory Begelman (Technion &#8211; Israel Institute of Technology Computer Science Dpt), Frank Smadja (RawSugar) and I did a paper for www2006 called &#8220;automated tag clustering&#8221;. It deals with why clustering the tag space makes sense and how this could be done.
After the presentation at the tagging workshop at www2006 we felt the need to give [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cs.technion.ac.il/%7Egbeg/">Grigory Begelman</a> (<a href="http://www.cs.technion.ac.il/">Technion &#8211; Israel Institute of Technology Computer Science Dpt)</a>, <a href="http://smadja.us/">Frank Smadja</a> (<a href="http://www.rawsugar.com/">RawSugar</a>) and I did a paper for <a href="http://www2006.org">www2006</a> called &#8220;automated tag clustering&#8221;. It deals with why clustering the tag space makes sense and how this could be done.</p>
<p>After the presentation at the <a href="http://blog.rawsugar.com/wikka/wikka.php?wakka=HomePage">tagging workshop</a> at www2006 we felt the need to give our paper a more www-friendly, I-don&#8217;t-want-to-read-through-those-theoretical-equation-flooded-papers face.</p>
<p>So, here you go: <a href="http://www.pui.ch/phred/automated_tag_clustering/">Automated Tag Clustering: Improving search and exploration in the tag space</a>. To read this document you should have a clue what tags are about; you should also know some tag services such as <a href="http://del.icio.us">delicious</a> or <a href="http://www.flickr.com">flickr</a> so you can understand the limitations these services currently have. <span id="more-41"></span><a href="http://www.pui.ch/phred/automated_tag_clustering/#cluster"><img title="clustering the tag space" alt="clustering the tag space" id="image42" src="http://www.pui.ch/phred/wp-content/uploads/2006/07/clusters.png" /></a>If you don&#8217;t want to read through the whole paper, the numerous figures give you a good summary. Finally, to whet your appetite, here are a few excerpts from the document:</p>
<blockquote><p>Currently tagging services still provide a relatively marginal value for information discovery and we claim that with the use of clustering techniques this can be greatly improved [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_motivation">introduction</a>]</p></blockquote>
<blockquote><p>The whole promise of collaborative tagging is that by exploring the tag space you can discover a lot of useful information you would not find with traditional search engines.  When your information need is not well defined, the idea that you can explore and see what other people tagged with certain tags is very attractive. We believe that tagging will be able to reach a very wide audience only when exploration techniques will be effective. [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_exploration">limited exploration</a>]</p></blockquote>
<blockquote><p>Although a great visualization paradigm, we believe that with today&#8217;s tagclouds it is hard to find more than one or two tags to click on. Tags are not grouped, there is too much information, so that you find lot of related tags scattered on the tag cloud.  One or two popular topics and all their related tags tend to dominate the whole cloud.  For example, looking at the del.icio.us tagcloud, one would mostly see tags related to web design and technologies. This is because these topics are overwhelmingly more frequent than anything else. [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_exploration">limited exploration</a>]</p></blockquote>
<blockquote><p>Tag <em>web2.0</em> nowadays is so popular and is combined wildly with anything. In fact this tag is so overused that if you look at <a href="http://del.icio.us/tag/bookmarks">tag <em>bookmarks</em> in the del.icio.us dataset</a>, the most used cotag is <em>web2.0</em>[...]. Basing tag similarity on these numbers often doesn&#8217;t make sense at all. The similarity measure should be chosen so the popularity of a tag doesn&#8217;t affect the set of a tags related tags. Don&#8217;t cut the <a href="http://en.wikipedia.org/wiki/Long_tail">long tail</a>. The success of blogs is driven by the importance of the long tail. We all know that it is crucial to support the niches. Tagging applications should empower the long tail too. If you just sort by popularity, you&#8217;d loose all those niches. [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_similarity">choosing a similarity measure</a>]</p></blockquote>
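<p>To make the point about popularity a bit more concrete, here is a minimal sketch. It is not necessarily the measure used in the paper: it simply assumes a cosine-style normalization of co-occurrence counts, and the bookmark data and all function names are made up for illustration.</p>

```python
from math import sqrt

def tag_index(bookmarks):
    """Map each tag to the set of bookmark ids that carry it."""
    index = {}
    for bookmark_id, tags in bookmarks.items():
        for tag in tags:
            index.setdefault(tag, set()).add(bookmark_id)
    return index

def raw_cooccurrence(index, a, b):
    """Number of bookmarks tagged with both a and b (favours popular tags)."""
    return len(index[a].intersection(index[b]))

def cosine_similarity(index, a, b):
    """Co-occurrence divided by the geometric mean of both tag frequencies."""
    shared = len(index[a].intersection(index[b]))
    return shared / sqrt(len(index[a]) * len(index[b]))

def related_tags(index, tag, measure):
    """All other tags, most similar first, according to the given measure."""
    others = [other for other in index if other != tag]
    return sorted(others, key=lambda other: measure(index, tag, other), reverse=True)

# Hypothetical toy data: "web2.0" is attached to almost everything.
bookmarks = {
    1: {"web2.0", "bookmarks"},
    2: {"web2.0", "bookmarks"},
    3: {"web2.0", "bookmarks", "social"},
    4: {"bookmarks", "social"},
    5: {"web2.0", "ajax"},
    6: {"web2.0", "ajax"},
    7: {"web2.0", "ajax"},
    8: {"web2.0", "ajax"},
    9: {"web2.0", "ajax"},
    10: {"web2.0", "ajax"},
}

index = tag_index(bookmarks)
by_count = related_tags(index, "bookmarks", raw_cooccurrence)
by_cosine = related_tags(index, "bookmarks", cosine_similarity)
print(by_count)
print(by_cosine)
```

<p>With this toy data, raw counts put <em>web2.0</em> first for <em>bookmarks</em>, while the normalized measure puts the niche tag <em>social</em> first. Other normalizations would work too; the only point is that the frequency of a tag should not decide on its own which tags count as related.</p>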
<p>We&#8217;d be happy to get any kind of feedback on the article. Just post a comment to this blog post.</p>
<p><strong>Edit (4 years later!)</strong>: A few guys asked me about the source code: <a href="http://pastie.org/1098455">Source code with syntax highlighting</a>, <a href="http://www.pui.ch/phred/archives/cluster.py">download</a>.<br />
You need <a href="http://people.sc.fsu.edu/~jburkardt/c_src/kmetis/kmetis.html">kmetis</a> to run it; see <code>usage()</code> for how it should be used.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2006/07/automated-tag-clustering.html/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>www2006 and collaborative tagging workshop</title>
		<link>http://www.pui.ch/phred/archives/2006/04/www2006-and-collaborative-tagging-workshop.html</link>
		<comments>http://www.pui.ch/phred/archives/2006/04/www2006-and-collaborative-tagging-workshop.html#comments</comments>
		<pubDate>Tue, 25 Apr 2006 06:16:47 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Projects]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/?p=39</guid>
		<description><![CDATA[Just a short note:
Grigory Begelman (Technion &#8211; Israel Institute of Technology Computer Science Dpt), Frank Smadja (RawSugar) and I are giving a presentation at this year&#8217;s www2006 conference in Edinburgh. I&#8217;m very glad our paper was accepted to the Collaborative Web Tagging Workshop. We will talk about automated tag clustering. I will give a demo [...]]]></description>
			<content:encoded><![CDATA[<p>Just a short note:<br />
<a href="http://www.cs.technion.ac.il/%7Egbeg/">Grigory Begelman</a> (<a href="http://www.cs.technion.ac.il/">Technion &#8211; Israel Institute of Technology Computer Science Dpt</a>), <a href="http://smadja.us/">Frank Smadja</a> (<a href="http://www.rawsugar.com">RawSugar</a>) and I are giving a presentation at this year&#8217;s <a href="http://www2006.org/">www2006</a> conference in Edinburgh. I&#8217;m very glad our <a href="http://www.rawsugar.com/www2006/20.pdf">paper</a> was accepted to the <a href="http://www.rawsugar.com/www2006/taggingworkshopschedule.html">Collaborative Web Tagging Workshop</a>. We will talk about automated tag clustering. I will give a demo of clustering popular urls. It&#8217;s like <a href="http://popurls.com/">popurls</a> grouped by categories instead of origin.
</p>
<p>
I will write more about it afterwards as I&#8217;m pretty busy finishing my demo.
</p>
<p>
If you are attending the conference, leave me a note so we can meet up sometime.<br />
I&#8217;m looking forward to this conference as it will be my first one.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2006/04/www2006-and-collaborative-tagging-workshop.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How tagging could gain ground</title>
		<link>http://www.pui.ch/phred/archives/2005/11/how-tagging-could-gain-ground.html</link>
		<comments>http://www.pui.ch/phred/archives/2005/11/how-tagging-could-gain-ground.html#comments</comments>
		<pubDate>Tue, 29 Nov 2005 20:54:28 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/?p=35</guid>
		<description><![CDATA[Is the revolution stuck?
When I first heard about del.icio.us (and after those first few days when I didn&#8217;t get it...) I thought: &#8220;This is revolutionary&#8221;. There were many things tags made possible that were just not possible until that day.
Joshua Schachter was the guy that invented tags (or at least that&#8217;s how the story is being [...]]]></description>
			<content:encoded><![CDATA[<h2>Is the revolution stuck?</h2>
<p>When <a href="http://www.pui.ch/phred/archives/2005/02/delicious_is_te.html">I first heard about del.icio.us</a> (and after those first few days when I didn&#8217;t get it...) I thought: &#8220;This is revolutionary&#8221;. There were many things tags made possible that were just not possible until that day.</p>
<p><a href="http://burri.to/~joshua/">Joshua Schachter</a> was the guy who invented tags (or at least that&#8217;s how the story is being told). Tags were originally <a href="http://loosewire.typepad.com/blog/2005/01/the_tag_report__3.html">thought of as a way to organize one&#8217;s own bookmarks</a>, but the social effect soon became obvious:</p>
<blockquote><p>If everyone tags, the &#8220;community&#8221; profits.</p></blockquote>
<p>Now, we have del.icio.us. Now we organize our bookmarks with tags. <a href="http://www.flickr.com">And our photos</a>.<br />
And our <a href="http://www.librarything.com/">books</a>, <a href="http://www.millionsofgames.com/">our games</a>, <a href="http://myprogs.net/">our software</a>, <a href="http://supr.c.ilio.us/">our tagging sites</a>, and <a href="http://bulldogster.ning.com/">also your bulldogs</a>, if you have any.</p>
<p>However, as we have tagged our whole life, what do we do with it? What is it good for?<br />
I fear the tagging revolution is about to calm down. And I believe that&#8217;s because many people don&#8217;t see the advantages of tagging. I believe that <strong>many many</strong> things can be made possible by using tag-based systems. If we realized this, tagging would get some fresh air and eventually go mainstream.</p>
<p>Is it just me, or is the tagging revolution really stuck? I desperately miss new, visionary, inventive articles on tags.</p>
<ul>
<li>To all smart people, where are your ideas?</li>
<li>To all programming geeks: Where are your algorithms, your &#8220;proof of concept&#8221; web services?</li>
</ul>
<p>I could stop here with my article, but, hey, I don&#8217;t want to be the grumbling guy who sits and waits for new things to come up, so here I am, trying to expose my brain to you.<br />
In this article I want to take a look at what areas tags are already strong in and how tagging could gain ground in these areas.<br />
<span id="more-35"></span></p>
<h2>Tags help you to organize</h2>
<p>When Joshua came up with the idea of tags, it was purely meant for organizing. It was only when other people also started organizing by tags that the whole idea of &#8220;folksonomy&#8221; came up.<br />
What does organizing mean? It is like tidying one&#8217;s room: you put every paper and pencil in a place that seems logical to you, so you can easily remember where you put it. But since we are not limited by physical constraints when we organize data, we have many new possibilities. There are already <a href="http://wiki.osafoundation.org/bin/view/Journal/HierarchyVersusFacetsVersusTags">good articles</a> about this, so I won&#8217;t discuss it in detail here.</p>
<p>At the end of the day, the question arises: Is organizing your bookmarks by tags really that good? </p>
<p>Just to make the point, here is another way to remember things: while browsing, the browser could save all pages in a cache, and when you are searching for a page you have visited (which is why you bookmark pages in the first place), you run a full-text search through all your cached pages. It&#8217;s a kind of &#8220;Google search&#8221; over pages you have already visited. I know this would have some downsides, but it would have some advantages too. I have often searched for a page I bookmarked and couldn&#8217;t remember the tag I used. This problem wouldn&#8217;t occur in the &#8220;searching through cache&#8221; system.</p>
<p>What I am trying to say is: <strong>If tagging were solely for organizing your stuff, it wouldn&#8217;t be worth the trouble</strong>.</p>
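<p>A rough sketch of what such a &#8220;searching through cache&#8221; system could look like, assuming the browser hands us each visited page as plain text. The page contents, the example.org URLs and the function names are all invented for illustration; a real implementation would also need ranking, stemming and so on.</p>

```python
import re

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Inverted index: map each word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the URLs of cached pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results = results.intersection(index.get(word, set()))
    return results

# Hypothetical cache of visited pages (URL and page text).
pages = {
    "http://example.org/rsa": "RSA is a public key cryptography algorithm",
    "http://example.org/tags": "Tag clouds and public bookmarks",
}

index = build_index(pages)
print(search(index, "public key"))
```

<p>No tag has to be remembered here: the query &#8220;public key&#8221; finds the RSA page simply because both words appear in its text.</p>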
<h2>Folksonomy &#8211; Classification of the masses</h2>
<p><a href="http://en.wikipedia.org/wiki/Folksonomy">Folksonomy</a> is &#8211; as I understand it &#8211; the distributed classification of data by the large mass of people who tag stuff. Folksonomy is often described as a new way to build a <a href="http://en.wikipedia.org/wiki/Taxonomy">taxonomy</a>. It&#8217;s like building the <a href="http://dmoz.org/about.html">Open Directory</a> by having thousands of people tag stuff.</p>
<p>What is folksonomy good for? Why do we want to put bookmarks into categories?</p>
<h3>Folksonomy enables you to explore</h3>
<p>Where do you go to start building a new expertise? Is it del.icio.us? Is it Google?<br />
Let&#8217;s say your boss tells you that the data your software saves in the database should be encrypted. You never thought much about cryptography; it was merely a topic that you &#8220;should know about&#8221; but were never really interested in (I&#8217;m speaking for myself here.. :-) ). You don&#8217;t really know where to start. You know you want to know something about cryptography, but you don&#8217;t know exactly what.<br />
A good list of articles or even starting points could shorten your learning curve.<br />
Thereafter, you may want to &#8220;travel through the cryptography universe&#8221;. And to travel means knowing which articles are related to the one you just read and are so enthusiastic about. You need a map of the cryptography universe, you want to know what is left and right, top and bottom, you want to know everything and everyone related to &#8220;cryptography&#8221;.<br />
Now then: What would you do?</p>
<h3>Do tag systems help you to explore?</h3>
<h4>Delicious on cryptography</h4>
<div class="caption"><a href="http://del.icio.us/tag/cryptography"><img src="/phred/modules/delicious_cryptography.png" alt="delicious results on cryptography" title="delicious results on cryptography" /></a><br />
Delicious results on &laquo;cryptography&raquo;</div>
<p>I would go to <a href="http://del.icio.us/tag/cryptography+introduction">del.icio.us/tag/cryptography+introduction</a>. There I find a nice article titled &#8220;<a href="http://www.garykessler.net/library/crypto.html">An Overview of Cryptography</a>&#8221;. I guess I&#8217;m lucky! If I read the article, I&#8217;d probably find out which subtopics exist, how cryptography is related to similar issues and so on. I&#8217;d kind of get this &#8220;map of the cryptography universe&#8221;. But this is done by only one author. Maybe I don&#8217;t trust him (I probably should, after reading his <a href="http://www.garykessler.net/resume.html">cv</a>), or I simply don&#8217;t have the time and/or energy to read through 44 pages, although the article looks good. I&#8217;ll probably <a href="http://del.icio.us/tag/cryptography">go back to delicious and find out</a> that the related tags of &#8220;cryptography&#8221; are:</p>
<ul>
<li>security</li>
<li>reference</li>
<li>encryption</li>
<li>crypto</li>
<li>algorithms</li>
<li>computing</li>
<li>software</li>
<li>nsa</li>
<li>tutorial</li>
<li>kids</li>
<li>education</li>
</ul>
<p>Now this is not very convincing, is it? You argue:</p>
<blockquote><p>Yeah, but this is far better than what I get on Google.</p></blockquote>
<h4>Google on cryptography</h4>
<div class="caption"><a href="http://www.google.ch/search?q=cryptography"><img src="/phred/modules/google_cryptography.png" alt="Google results on cryptography" title="Google results on cryptography" /></a><br />
Google results on &laquo;cryptography&raquo;</div>
<p><a href="http://www.google.ch/search?q=cryptography">It is</a>. When looking at these Google results I remember that Google is meant for searching when I already know what I am searching for. But now I am at a different stage. I don&#8217;t know exactly what to search for. I don&#8217;t know, because I don&#8217;t have any expertise in cryptography. BTW: Google does come up with an article that looks like a good introduction to cryptography as well.</p>
<h4>Open directory on cryptography</h4>
<p>What about <a href="http://dmoz.org/about.html">open directory</a>? Let&#8217;s give it a try: After typing in &#8220;cryptography&#8221; I find out that this topic is classified in <a href="http://www.google.com/Top/Science/Math/Applications/Communication_Theory/Cryptography">Science &gt; Math &gt; Applications &gt; Communication Theory &gt; Cryptography</a>. Clicking this link, you get what you were probably looking for.<br />
You get a nice overview:
<div class="caption"><a href="http://www.google.com/Top/Science/Math/Applications/Communication_Theory/Cryptography"><img src="/phred/modules/google_directory_cryptography.png" alt="Google open directory on cryptography" title="Google open directory on cryptography"/></a><br />
Google open directory on &laquo;cryptography&raquo;</div>
<ul>
<li>Algorithms</li>
<li>Books</li>
<li>Events</li>
<li>Historical</li>
<li>Journals</li>
<li>People</li>
<li>Programming Libraries</li>
<li>Research Groups</li>
<li>Theory</li>
</ul>
<p>Now you stand at a guidepost. You see the &#8220;cryptography universe&#8221;. You probably don&#8217;t see what lies to the left and right of cryptography, but here you have &#8220;cryptography at a glance&#8221;.<br />
Now it&#8217;s up to you: Do you want to explore &#8220;algorithm land&#8221;, take the shortcut and download the programming library of the language of your choice? Or do you even want to get advice from people that are experts on that matter?<br />
Even if the links provided here don&#8217;t give you what you are looking for, here you get a clue what you should look for.</p>
<h4>Comparing the three</h4>
<p>Let&#8217;s compare browsing to a real-life quest: finding out where your next conference will take place. Say you want to go to the next <a href="http://conferences.oreillynet.com/etech/">etech conference</a>, you don&#8217;t know where it is and you are not an American citizen.</p>
<div class="caption"><img src="/phred/modules/too_near.png" alt="Ouch, nearly bumped my head into horton plaza!" title="Ouch, nearly bumped my head into horton plaza!"/><br />
Ouch, nearly bumped my head into horton plaza!</div>
<p>Conference websites often show a map of the venue as seen from about 10 meters above the ground. This map <strong>is</strong> helpful, but only once you are already quite close to the conference. </p>
<div class="caption"><img src="/phred/modules/too_far.png" alt="Help, I cannot breathe out there!" title="Help, I cannot breathe out there!"/><br />
Help, I cannot breathe out there!</div>
<p>Then, when you desperately search for a more general map, you&#8217;ll possibly find a map of how it looks from outer space. Yeah, I know that San Diego is in the US, but I&#8217;d like to know which airport is closest to the conference.</p>
<div class="caption"><img src="/phred/modules/web_organization.png" alt="Distances between observer and data" title="Distances between observer and data" /><br />
Distances between observer and data</div>
<p>That&#8217;s quite similar to the views we have with del.icio.us and open directory.<br />
Delicious would tell you: &#8220;the roads nearby are &#8216;union street&#8217;, &#8216;Broadway circle&#8217; and &#8216;Broadway&#8217;&#8230;&#8221;, open directory proclaims: &#8220;we have five continents in the world: &#8216;America&#8217;, &#8216;Asia&#8217;, &#8216;Africa&#8217;, &#8216;Australia&#8217; and &#8216;Europe&#8217;&#8230;&#8221;. Now, I&#8217;m exaggerating a bit but you get the point: Sometimes you need a map that lies between the too detailed and the too general map.<br />
Looking for this type of view is like saying: &#8220;I want a bit more <a href="http://en.wikipedia.org/wiki/Ontology">ontology</a> than tags but not as much <a href="http://en.wikipedia.org/wiki/Taxonomy">taxonomy</a> as open directory&#8221;. That&#8217;s where I&#8217;ve put the question mark. It&#8217;s not that you always want to see the data at that distance, but sometimes you desperately want to have that viewpoint.</p>
<p>Now, what has this to do with tagging? I believe that this missing in-between view can be won by analyzing tags.<br />
Have you noticed how flickr does this in-between view?<br />
When you search for love, <a href="http://www.flickr.com/photos/tags/love/clusters/">flickr cluster</a> asks you: &#8220;What do you mean by &#8216;love&#8217;?&#8221;:</p>
<div class="caption"><a href="http://www.flickr.com/photos/tags/love/clusters/"><img src="/phred/modules/flickr_clusters.png" alt="flickr clusters on love" title="flickr clusters on love" /></a><br />
flickr cluster results on &laquo;love&raquo;</div>
<ul>
<li>a <strong>couple</strong> <strong>kiss</strong>ing?</li>
<li>a <strong>mother</strong> holding her <strong>baby</strong>?</li>
<li>a <strong>red</strong> <strong>heart</strong>?</li>
</ul>
<p>&#8220;Wait: Flickr is a bit different from del.icio.us&#8221;, you say. Yup. Flickr uses a <a href="http://www.personalinfocloud.com/2005/02/explaining_and_.html">narrow</a>, del.icio.us a broad <a href="http://www.personalinfocloud.com/2005/02/explaining_and_.html">folksonomy</a> system.<br />
But I believe that the data clusters flickr creates with its narrow folksonomy data can also be generated with delicious&#8217; broad folksonomy data. I am programming an algorithm that computes del.icio.us clusters. I&#8217;m still at an early stage but I get clusters like this &#8220;shopping cluster&#8221;:</p>
<div class="caption"><img src='/phred/modules/shopping_cluster.png' alt="shopping cluster" title="shopping cluster" /><br />
&laquo;shopping&raquo; cluster</div>
<p>I realize that even if the cluster data is available, there&#8217;s the question of how to navigate through the data. The &#8220;zooming in&#8221; and &#8220;zooming out&#8221; won&#8217;t be as easy as with Google maps.<br />
But anyway, here is the land no one has explored before. I think this is the area we should talk about. Here is room for improvement.</p>
<h3>Folksonomy helps you to stay informed about a certain topic</h3>
<p>Back to what folksonomies are good for: If you have built an expertise in cryptography, you want to stay informed. If <a href="http://en.wikipedia.org/wiki/RSA">RSA</a> is hacked, you certainly want to be informed.<br />
Delicious has got an &#8220;<a href="http://del.icio.us/inbox/phred">Inbox</a>&#8221; where you can subscribe to a tag, e.g. &#8220;cryptography&#8221;.<br />
Each bookmark that is tagged &#8220;cryptography&#8221; gets in your inbox. That&#8217;s a great way to <strong>stay</strong> informed. Alternatively you have a list <a href="http://del.icio.us/popular/cryptography">of recent popular sites</a> tagged &#8220;cryptography&#8221;. You can subscribe to these lists using RSS and hopefully you get informed in time if RSA is hacked.</p>
<h3>Do tag systems keep you informed?</h3>
<p>I think the comparison with the distance to the data applies here too:<br />
If I <a href="http://del.icio.us/rss/tag/cryptography">subscribed to cryptography</a>, I&#8217;d probably miss some important items, just because the guy who bookmarked them used the tag &#8220;crypto&#8221;. On the other hand, I do not want to be informed about yet another <a href="http://en.wikipedia.org/wiki/Rijndael">Rijndael</a> implementation; I want to narrow the incoming links to articles or essays that deal with cryptography.<br />
Delicious already lets you narrow results: I could <a href="http://del.icio.us/rss/tag/cryptography+essay">subscribe to &laquo;cryptography&raquo; and &laquo;essay&raquo;</a>, and once delicious supports union (and it will, <a href="http://lists.del.icio.us/pipermail/discuss/2005-November/004390.html">as Joshua promises</a>), I could also <a>subscribe to (cryptography or crypto) and (essay or article)</a>, but you see that it doesn&#8217;t really solve the problem.<br />
I imagine that one day you can say:</p>
<blockquote><p>I want to keep being informed about cryptography</p></blockquote>
<p>and the service asks you:</p>
<blockquote><p>Should I keep you informed about</p>
<ul>
<li>new implementations</li>
<li>new articles/essays</li>
<li>security issues</li>
</ul>
</blockquote>
<p>And I believe this is possible. Flickr already asks you this when you are searching for <a href="http://www.flickr.com/photos/tags/love/clusters/">love pictures</a>. I guess it will be based on clusters again.</p>
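<p>For comparison, the combined subscription mentioned above, (cryptography or crypto) and (essay or article), is easy to sketch as a filter over tagged bookmarks. This is only an illustration with made-up bookmarks and function names, not how del.icio.us implements its feeds:</p>

```python
def matches(tags, groups):
    """True if the bookmark has at least one tag from every group.

    Inside a group the tags are alternatives (union); the groups
    themselves must all be satisfied (intersection).
    """
    return all(tags.intersection(group) for group in groups)

def filter_bookmarks(bookmarks, groups):
    """Keep the URLs of all bookmarks whose tags satisfy the query."""
    return [url for url, tags in bookmarks if matches(tags, groups)]

# Hypothetical incoming bookmarks as (url, tags) pairs.
bookmarks = [
    ("http://example.org/rsa-essay", {"crypto", "essay"}),
    ("http://example.org/aes-library", {"cryptography", "software"}),
    ("http://example.org/history", {"cryptography", "article", "history"}),
    ("http://example.org/cooking", {"essay", "food"}),
]

# (cryptography or crypto) and (essay or article)
query = [{"cryptography", "crypto"}, {"essay", "article"}]
selected = filter_bookmarks(bookmarks, query)
print(selected)
```

<p>The weak spot is obvious: I still have to know all the synonyms in advance and maintain the groups by hand, which is exactly the part I would like clusters to do for me.</p>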
<h2>Tags help you share lists</h2>
<p>Back to what tags are good for: They help you build lists. Let&#8217;s name a few examples:</p>
<ul>
<li><strong>Wish lists</strong>: I know that <a href="http://www.amazon.com/exec/obidos/wishlist">numerous</a> <a href="http://froogle.google.com/shoppinglist">online</a> <a href="http://www.giftboxhome.com/">shops</a> enable you to build wish lists. But I&#8217;d like to have a wish list that&#8217;s not bound to a company, that I can arrange and rearrange. <a href="http://del.icio.us/mpe/whishlist">Many</a> <a href="http://del.icio.us/janson/wishlist/">are</a> <a href="http://del.icio.us/Lillith_Within/whishlist">already</a> <a href="http://del.icio.us/a9bejo/whishlist">using</a> del.icio.us as storage for their wish list.</li>
<li><strong>Share your bookmarks</strong>: A friend asked me for some links to javascript WYSIWYG editors. <a href="http://del.icio.us/phred/javascript+editor">I gave him a list</a> of all my bookmarks tagged <code>javascript</code> and <code>editor</code></li>
<li><strong>Offer viewpoints of your data</strong>: Let&#8217;s say your favourite CMS features tagging (<a href="http://dema.ruby.com.br/articles/2005/08/27/easy-tagging-with-rails">featured in many of those new fancy ruby on rails applications</a>), I&#8217;m not speaking about blogs here: To allow &#8220;normal&#8221; visitors to view your data, you&#8217;ll add a navigation providing starting points to your entries; specific locations a visitor can jump in to so he could take bathe in your articles. Probably you would add a link to all items tagged &#8220;references&#8221; and &#8220;networking&#8221; to achieve that.</li>
</ul>
<h3>How can tag lists be improved?</h3>
<p>I&#8217;m often annoyed that I cannot put my del.icio.us links in a specific order. I <a href="http://www.pui.ch/del_list/">wrote a little script</a> that puts my newest bookmark at the bottom, but it doesn&#8217;t fully solve the problem.<br />
What I&#8217;d actually like is to be able to compose a <a href="http://en.wikipedia.org/wiki/View_%28database%29">view</a> of tagged bookmarks. For example, I want to offer a list of all firms our company has built the network for:</p>
<blockquote>
<h3>Networking references</h3>
<h4>Big firms</h4>
<ul>
<li><a href="http://www.ubs.ch">UBS</a></li>
<li><a href="http://www.migros.ch">Migros</a></li>
<li><a href="http://www.abb.ch">ABB</a></li>
</ul>
<h4>Medium-sized firms</h4>
<ul>
<li><a href="http://www.stadlerrail.ch/">Stadlerrail</a></li>
<li><a href="http://www.search.ch/rim.html">Räber Information Management GmbH</a></li>
</ul>
<h4>Small firms</h4>
<ul>
<li><a href="http://www.citrin.ch">Citrin Informatik GmbH</a></li>
<li><a href="http://www.thildykeller.ch">Goldschmiedeatelier Thildy Keller</a></li>
<li><a href="http://www.minifruits.ch">Mini Fruits Trading</a></li>
</ul>
</blockquote>
<p>Today, such a list can&#8217;t be generated automatically from my bookmarks, but it could be, by letting me configure my view as <code>myView = (references+networking, "Networking References", (big_firms, medium-sized_firms, small_firms))</code>.<br />
I know it&#8217;s not a <strong>big</strong> challenge to program such a thing, but as far as I know it doesn&#8217;t exist yet.</p>
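<p>A minimal sketch of how such a view could be built, assuming bookmarks are available as simple (title, url, tags) records. The function name <code>build_view</code>, the record layout and the extra &#8220;Some blog&#8221; entry are made up for illustration; this is not an existing del.icio.us feature:</p>

```python
# Hypothetical bookmark records: (title, url, set of tags).
bookmarks = [
    ("UBS", "http://www.ubs.ch", {"references", "networking", "big_firms"}),
    ("Migros", "http://www.migros.ch", {"references", "networking", "big_firms"}),
    ("Stadlerrail", "http://www.stadlerrail.ch/",
     {"references", "networking", "medium-sized_firms"}),
    ("Citrin Informatik GmbH", "http://www.citrin.ch",
     {"references", "networking", "small_firms"}),
    # Not part of the view: lacks the "references" tag.
    ("Some blog", "http://example.org/", {"networking", "blog"}),
]


def build_view(bookmarks, required_tags, title, group_tags):
    """Group all bookmarks carrying every required tag under the group tags."""
    groups = {tag: [] for tag in group_tags}
    for name, url, tags in bookmarks:
        if not required_tags.issubset(tags):
            continue
        for tag in group_tags:
            if tag in tags:
                groups[tag].append((name, url))
    return title, groups


my_view = build_view(
    bookmarks,
    {"references", "networking"},
    "Networking References",
    ["big_firms", "medium-sized_firms", "small_firms"],
)

title, groups = my_view
print(title)
for group, entries in groups.items():
    print(" ", group, [name for name, _ in entries])
```

<p>The group tags double as sub-headings, so the order of the list passed in determines the order of the sections.</p>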
<h2>Bottom line</h2>
<p>It appears to me that there hasn&#8217;t been much progress in tagging systems lately. What has improved instead is the <a href="http://blog.del.icio.us/blog/2005/11/find_the_url_of.html">embedding of tagging systems into already existing technologies such as search</a>. This gives the impression that the core issues are solved and that there&#8217;s not much room for improvement. In this article I wanted to disprove this.<br />
I think there&#8217;s much, much more than I have written here. I even believe that today&#8217;s tagging applications cover just about 5% of all the features tagging makes possible. So let&#8217;s gain ground.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2005/11/how-tagging-could-gain-ground.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Does del.icio.us scale?</title>
		<link>http://www.pui.ch/phred/archives/2005/08/does-delicious-scale.html</link>
		<comments>http://www.pui.ch/phred/archives/2005/08/does-delicious-scale.html#comments</comments>
		<pubDate>Wed, 31 Aug 2005 06:12:50 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/08/does-delicious-scale.html</guid>
		<description><![CDATA[It has become very quiet around del.icio.us lately. There are some new features but nothing groundbreaking. Either people are used to it, use it as a daily tool and see no need for new things, or folks just don&#8217;t have faith in the future of del.icio.us.
I am a big fan of delicious. I&#8217;ve got [...]]]></description>
			<content:encoded><![CDATA[<p>It has become very quiet around <a href="http://del.icio.us">del.icio.us</a> lately. There are <a href="http://blog.del.icio.us/blog/2005/08/we_rolling.html">some</a> <a href="http://blog.del.icio.us/blog/2005/08/search_me.html">new</a> <a href="http://blog.del.icio.us/blog/2005/08/people_who_like.html">features</a> but nothing groundbreaking. Either people are used to it, use it as a daily tool and see no need for new things, or folks just don&#8217;t have faith in the future of del.icio.us.</p>
<p>I am a big fan of delicious. I&#8217;ve got 1.5K bookmarks there, and I like its spirit and how open everything is. This article isn&#8217;t meant to criticize, but I think delicious is facing some problems.<br />
<span id="more-34"></span></p>
<h2>Performance scale</h2>
<p>You might have read my article about <a href="http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html">Tag system performance</a>. To summarize my tests: MySQL is just not built for large tag systems. It scales up to 1 million items, but delicious has far more posts than that.<br />
I am pretty sure delicious is still on the MySQL train. This strong belief comes from my performance tests: the MySQL schemas I tested show the same characteristics as delicious does.<br />
I fear delicious faces a performance dead end: They <a href="http://blog.del.icio.us/blog/2005/06/moving_to_new_s.html">have put more servers in the mix</a> and they cache quite a bit, yet it is still slow. I strongly believe that for delicious to have a future it must become much faster. For me this is the number one downside of delicious. I dream of a bookmark service that has billions of bookmark posts and still performs nicely. I think it is time for new tag systems to come up. On the <a href="http://lists.tagschema.com/mailman/listinfo/tagdb">tagdb mailing list</a>, there are very good ideas about how large-scale tagging systems should work (e.g. systems powered by <a href="http://lucene.apache.org/">Lucene</a>).</p>
<h2>Popular link scale</h2>
<p>I think one of the coolest features of delicious is the <a href="http://del.icio.us/popular/">popular</a> page. When you read this page regularly you are up to date.. wait: you are up to date concerning CSS tips, Firefox and life hacks. You all know that if delicious went mainstream, that page wouldn&#8217;t be that interesting any more. It has already become a bit boring. As someone put it: </p>
<blockquote><p>I particularly cannot look at that CSS link lists anymore</p></blockquote>
<p>I think this page doesn&#8217;t scale. It is stuck. Moreover, it&#8217;s a pity that the coolest page on delicious is not about tags. At first glance you don&#8217;t even see what tags a popular link has.<br />
IMHO what is needed here are clusters. Bookmarks go into categories: &#8220;browsers&#8221;, &#8220;programming&#8221;, &#8220;design&#8221; but also &#8220;health&#8221;, &#8220;politics&#8221;. When delicious goes mainstream there will most certainly be &#8220;sports&#8221; or &#8220;stars&#8221;.<br />
One should then be able to subscribe to certain clusters or, better, have this subscription derived automatically from the tags in a user&#8217;s bookmarks.</p>
<h2>Bottom line</h2>
<p>I think there are some fundamental things that must be rearranged at delicious, otherwise either</p>
<ul>
<li>a) a big competitor (Google? Yahoo? Microsoft?) will come up, or </li>
<li>b) people will spread out to different bookmark services that concentrate on certain clusters. Probably some meta-sites will arise that give you an overview of all the different sites.</li>
</ul>
<p>I think these problems will arise for every bigger tag system. I hope that people will not sniff at tagging systems, thinking that they don&#8217;t perform well enough..</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2005/08/does-delicious-scale.html/feed</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Analyzing tag-connections</title>
		<link>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html</link>
		<comments>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html#comments</comments>
		<pubDate>Sun, 17 Jul 2005 18:03:43 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html</guid>
		<description><![CDATA[When you tag an item, for instance a bookmark, you give it several tags. For instance, I tagged the bookmark for &#8220;How to Write More Clearly, Think More Clearly, and Learn Complex Material More Easily&#8221; (you know this link if you pay attention to delicious popular.. :-)) with 
&#8220;writing&#8221;, &#8220;toread&#8221;, &#8220;productivity&#8221;, &#8220;language&#8221;
Now what instantly pops [...]]]></description>
			<content:encoded><![CDATA[<p>When you tag an item, for instance a bookmark, you give it several tags. For instance, I tagged the bookmark for &#8220;<a href="http://www.ai.uga.edu/mc/WriteThinkLearn_files/frame.htm">How to Write More Clearly, Think More Clearly, and Learn Complex Material More Easily</a>&#8221; (you know this link if you pay attention to <a href="http://del.icio.us/popular">delicious popular</a>.. :-)) with </p>
<blockquote><p>&#8220;writing&#8221;, &#8220;toread&#8221;, &#8220;productivity&#8221;, &#8220;language&#8221;</p></blockquote>
<p>Now what instantly pops into my mind is that the tag &#8220;toread&#8221; is quite different from the other tags. In fact it describes something I want to do with this bookmark later on. I call this type of tag an &#8220;<strong>adjective</strong>&#8221; (I will come back to that name later on..). The other tags I consider &#8220;<strong>categories</strong>&#8221;.<br />
Now you&#8217;ll probably say &#8220;ah, this is a rare exception&#8221;. This is not true. I often tag items with &#8220;blog&#8221; because the interesting page I found about my favourite hobby happens to be a blog. That is why I call this type of tag an &#8220;adjective&#8221;: it is a description of the item rather than a category for it.<br />
Other tags often used as adjectives are &#8220;reference&#8221;, &#8220;tutorial&#8221;, &#8220;fun&#8221;, &#8220;cool&#8221;, &#8220;news&#8221;, &#8220;free&#8221;..<br />
<span id="more-33"></span><br />
Now this categorization is not entirely correct. Sometimes I use &#8220;blog&#8221; not as an adjective. That is the case when I want to bookmark a blog whose content doesn&#8217;t interest me but which just looks good. Then I&#8217;ll probably tag it as &#8220;design blog&#8221;. On the day I redesign my blog, I want to search for all the design blogs I tagged..<br />
You see: it all lies in the connections between those tags, not in the tags themselves. This is IMO pretty important.</p>
<h2>What is that for?</h2>
<h3>Clusters</h3>
<p>You have probably tried to cluster your bookmarks using <a href="http://laurie.informatik.uni-bremen.de/clusty/">clusty</a>. What this service does: It tries to put your tags into separate clouds. You know the &#8220;<a href="http://lists.del.icio.us/pipermail/discuss/2005-March/002266.html">tag-bundles</a>&#8221; of delicious? This is something like an &#8220;auto-tag-bundle&#8221; feature. Try it out if you haven&#8217;t already and see the problems that arise..<br />
I think the key problem of this cluster service lies in the fact that it considers all connections (including the adjectives). But it shouldn&#8217;t do so! Adjectives aren&#8217;t tags I want in my clusters. Adjectives are spread all over my tags, so they should first be cut away from my &#8220;tag-tree&#8221; (the tree built out of the tag-connections you create by tagging bookmarks).</p>
<h3>Similar items</h3>
<p>This categorization is also important when you search for items &#8220;similar&#8221; to a bookmark. When I want to search for items similar to that &#8220;how to write more clearly&#8221; article, I&#8217;ll search for &#8220;writing+productivity+language&#8221; and leave out the &#8220;toread&#8221; tag (an adjective).<br />
Probably this made you realize that categorizing tag-connections is an important task. </p>
<h3>Tag clouds</h3>
<p>Now there are those tag clouds. When I look at <a href="http://kevan.org/extispicious.cgi?name=phred">my tag cloud</a>, the &#8220;biggest&#8221; tag is &#8220;resource&#8221;. But tag clouds are there to help you find bookmarks easily (I never search my bookmarks for &#8220;resource&#8221; alone) or to give you a map of your main interests (&#8220;what is your hobby?&#8221; &#8220;ah, I am a big fan of resources&#8221;.. :-)). I am sure you were also annoyed by that. I want those adjective-tags cut away..!</p>
<h2>Synonyms</h2>
<p>Now back to some theory: There is a third type of tag-connection: synonyms. &#8220;delicious&#8221; and &#8220;del.icio.us&#8221; are classic synonyms. But I consider &#8220;ruby&#8221; and &#8220;rails&#8221; synonyms too (no, they aren&#8217;t really synonyms, but up to now they are used as such). You type in the second tag just to make sure that you won&#8217;t later search for it and find nothing.. I don&#8217;t think this category is too important for the clustering task, but I mention it here because I&#8217;ll use it further on.</p>
<h2>Example</h2>
<p>Let&#8217;s go for an example.<br />
Let&#8217;s consider the tags that are connected to the tag &#8220;ajax&#8221;. I gathered some tag-connection data from delicious (via its <a href="http://del.icio.us/rss/">rss-feed</a>) and ran a query on my statistical data. This data was gathered over a period of one week. It is not complete, but our experiment will work anyway:</p>
<table>
<thead>
<tr>
<td>tag-connection</td>
<td>weight</td>
<td>type</td>
</tr>
</thead>
<tbody>
<tr>
<td><strong>ajax-javascript</strong></td>
<td>234</td>
<td>synonym</td>
</tr>
<tr>
<td><strong>ajax-web</strong></td>
<td>105</td>
<td>category</td>
</tr>
<tr>
<td><strong>ajax-programming</strong></td>
<td>100</td>
<td>category</td>
</tr>
<tr>
<td><strong>ajax-xmlhttprequest</strong></td>
<td>52</td>
<td>synonym</td>
</tr>
<tr>
<td>ajax-css</td>
<td>51</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-design</td>
<td>46</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-php</td>
<td>44</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-development</td>
<td>36</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-xml</td>
<td>34</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-DHTML</td>
<td>33</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-webdev</td>
<td>33</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-webdesign</td>
<td>31</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-google</td>
<td>23</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-HTML</td>
<td>21</td>
<td>adjective</td>
</tr>
<tr>
<td>ajax-tutorial</td>
<td>14</td>
<td>adjective</td>
</tr>
</tbody>
</table>
<p>Column &#8220;tag-connection&#8221; shows the tag connected to &#8220;ajax&#8221; (e.g. javascript), and column &#8220;weight&#8221; shows the number of times this connection occurred in a bookmark post on delicious. The rows are ordered by weight. In column &#8220;type&#8221; you see the result of my computations for this tag-connection. Just to make it clear: These are all the tags connected to the tag &#8220;ajax&#8221;, ordered by the number of occurrences of the connection. If a bookmark post somebody made on delicious is tagged with &#8220;ajax&#8221; and &#8220;javascript&#8221;, that gives one point in the &#8220;weight&#8221; column for &#8220;ajax-javascript&#8221;.<br />
The outcome is quite good, I think (I must admit that I have taken the example that worked out best :-))<br />
There are some errors, sure: ajax-xml should be of type &#8220;category&#8221; as well. But we are looking at the usage of these tags, not their &#8220;real&#8221; meaning (whatever that is).</p>
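<p>The weight counting described above can be sketched roughly as follows. The three sample posts are invented, and <code>connection_weights</code> is just an illustrative name; my real data came from the delicious feed:</p>

```python
from collections import Counter
from itertools import combinations


def connection_weights(posts):
    """Count how often each pair of tags occurs together in one post."""
    weights = Counter()
    for tags in posts:
        # Sort so that ("ajax", "javascript") and ("javascript", "ajax")
        # end up under the same key; set() drops duplicate tags in a post.
        for pair in combinations(sorted(set(tags)), 2):
            weights[pair] += 1
    return weights


# Invented sample posts, each one a list of tags.
posts = [
    ["ajax", "javascript", "web"],
    ["ajax", "javascript"],
    ["javascript", "css"],
]

weights = connection_weights(posts)
for (tag_a, tag_b), weight in weights.most_common():
    print(f"{tag_a}-{tag_b}: {weight}")
```

<p>Every post contributes one point to each pair of tags it carries, which is exactly the &#8220;weight&#8221; column of the table.</p>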
<h2>Computation</h2>
<h3>Synonyms</h3>
<p>To compute this categorization I first went for the &#8220;synonyms&#8221;. The connection &#8220;ajax-javascript&#8221; is considered a synonym because it is the &#8220;number one connection&#8221; among all connections that ajax is part of. And when considering the connections of &#8220;javascript&#8221; (the &#8220;vice-versa connection&#8221;), ajax is number two.<br />
I consider two tags synonyms if &#8220;in one direction&#8221; the other tag is number one and in the other &#8220;direction&#8221; the other tag is in the top 10. I made up this rule because I think that in most cases there is one &#8220;stronger&#8221; synonym that is used most of the time when the &#8220;weaker&#8221; one is used. The fact that the tag &#8220;ajax&#8221; is mostly used with the tag &#8220;javascript&#8221; could also mean that &#8220;javascript&#8221; is a supercategory of ajax (which it somehow is). To avoid treating such sub-/super-category connections as synonyms, we make sure that &#8220;ajax&#8221; is also important for &#8220;javascript&#8221;, so ajax is not so &#8220;sub&#8221; to javascript.. I hope you can follow :-)</p>
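<p>Here is a rough sketch of this synonym rule. The ajax weights are taken from the table above; the javascript-programming and javascript-web weights are invented so that ajax ends up as number two for javascript, and the function names are only for illustration:</p>

```python
def ranked_neighbours(weights, tag):
    """Return the tags connected to `tag`, strongest connection first."""
    scored = []
    for (tag_a, tag_b), weight in weights.items():
        if tag_a == tag:
            scored.append((weight, tag_b))
        elif tag_b == tag:
            scored.append((weight, tag_a))
    # Highest weight first; ties are broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]


def is_synonym(weights, tag_a, tag_b, top_n=10):
    """One tag must be number one for the other and be in its top_n in return."""
    rank_a = ranked_neighbours(weights, tag_a)
    rank_b = ranked_neighbours(weights, tag_b)
    if not rank_a or not rank_b:
        return False
    a_to_b = rank_a[0] == tag_b and tag_a in rank_b[:top_n]
    b_to_a = rank_b[0] == tag_a and tag_b in rank_a[:top_n]
    return a_to_b or b_to_a


weights = {
    ("ajax", "javascript"): 234,
    ("ajax", "web"): 105,
    ("ajax", "programming"): 100,
    ("javascript", "programming"): 300,  # invented value
    ("javascript", "web"): 150,          # invented value
}

print(is_synonym(weights, "ajax", "javascript"))
print(is_synonym(weights, "ajax", "web"))
```

<p>Shrinking <code>top_n</code> makes the rule stricter: the smaller it is, the more important the &#8220;weaker&#8221; tag has to be for the &#8220;stronger&#8221; one.</p>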
<h3>Category/Adjective</h3>
<p>Then I compute the &#8220;category&#8221;. Let&#8217;s put the values of the above table into a graph.<br />
<img src="/phred/modules/ajax_dist.png" alt="distribution of tags related to ajax" title="distribution of tags related to ajax"/><br />
On the x-axis you see the tags: The tick 1 stands for &#8220;web&#8221;, 2 for &#8220;programming&#8221;, 3 for &#8220;css&#8221;, 4 for &#8220;design&#8221;, 5 for &#8220;php&#8221; and so on. You see I removed the synonym-connections &#8220;ajax-javascript&#8221; and &#8220;ajax-xmlhttprequest&#8221; as I think they &#8220;disturb&#8221; the distribution.<br />
The y-axis depicts the weight of the connection: ajax-web has weight &#8220;105&#8221;, ajax-programming has weight &#8220;100&#8221; and so on.<br />
The black line is the &#8220;weight&#8221;-column of the table above, the red one is the first <a href="http://en.wikipedia.org/wiki/Derivative">derivative</a>, the blue one the second derivative of the weight function.<br />
This graph makes it clear that &#8220;web&#8221; and &#8220;programming&#8221; are used quite often in combination with &#8220;ajax&#8221;; then there is quite a &#8220;gap&#8221;, followed by the &#8220;adjective tail&#8221;. I categorize the connections in this tail as &#8220;adjective&#8221;. The tags in this tail are used &#8220;out of context&#8221;: They don&#8217;t really belong to the &#8220;ajax-cluster&#8221;. They sometimes occur together with ajax, but only sometimes. Mostly not. Therefore they are considered &#8220;adjectives&#8221;.<br />
Now the task is to find this &#8220;gap&#8221;. In my experiments I tried to find the last gap. To find it, I started at the end of the tail, searched for the first peak of the first derivative (that is where the second derivative goes from positive to negative) and checked whether the peak was high enough. If these two conditions were fulfilled, I split the connections into two parts: the &#8220;pre-gap&#8221; connections (category) and the &#8220;post-gap&#8221; connections (adjective).<br />
The same computation has to be made for the &#8220;vice-versa&#8221; connection. I considered a connection a &#8220;category&#8221; if either of the two computations said it was a &#8220;category&#8221;.</p>
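<p>A rough sketch of the gap search, using the weights from the table with the two synonym connections removed. What counts as &#8220;high enough&#8221; is an assumption here (a drop of at least 30% of the weight just before it), and so is the choice to keep scanning towards the head when a peak is too low; the function name is made up:</p>

```python
def split_at_last_gap(weights, min_relative_drop=0.3):
    """Return the index where the adjective tail starts, or None.

    `weights` must be sorted in descending order. The threshold
    `min_relative_drop` is an assumed definition of "high enough".
    """
    # Discrete first derivative: the drop from one weight to the next.
    drops = [weights[i] - weights[i + 1] for i in range(len(weights) - 1)]
    # Walk from the tail towards the head, looking at interior drops only.
    for i in range(len(drops) - 2, 0, -1):
        is_peak = drops[i] > drops[i - 1] and drops[i] >= drops[i + 1]
        if is_peak and drops[i] >= min_relative_drop * weights[i]:
            return i + 1
    return None


tags = ["web", "programming", "css", "design", "php", "development", "xml",
        "DHTML", "webdev", "webdesign", "google", "HTML", "tutorial"]
tag_weights = [105, 100, 51, 46, 44, 36, 34, 33, 33, 31, 23, 21, 14]

cut = split_at_last_gap(tag_weights)
categories = tags[:cut]
adjectives = tags[cut:]
print("category:", categories)
print("adjective:", adjectives)
```

<p>With these numbers the small bumps in the tail (the drops of 8) are rejected as too low, and the cut lands at the big drop from 100 to 51.</p>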
<p><ins datetime="2005-07-18T15:43:36-02:00"></p>
<h2>Further processing: Ambiguous tags</h2>
<p>To achieve good clustering results, I think we need to check whether a tag is used in different ways. The prominent example is &#8220;apple&#8221;. Now, while delicious is still restricted to the blog world, it is clear that apple means Mac-apple. But in the future this may change. To recognize whether a tag is used in different environments, the algorithm would have to check the &#8220;neighbours of neighbours&#8221; (<a href="http://blog.pietrosperoni.it/2004/09/19/clustering-delicious-tags/">as suggested by Pietro Speroni</a>). For ajax, that means checking whether the neighbours of &#8220;javascript&#8221; are more or less the same as the neighbours of &#8220;web&#8221;. You see that it all lies in the connections between tags. The tag per se is not well-defined, but the tag in connection with another tag defines it quite well. Therefore I&#8217;m proposing to split up ambiguous tags for clustering. That would make the resulting clusters much simpler.</ins></p>
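<p>One possible way to sketch the &#8220;neighbours of neighbours&#8221; check is to compare the neighbour sets pairwise. Using the Jaccard overlap for &#8220;more or less the same&#8221; is my assumption, and the neighbour data for &#8220;apple&#8221; below is invented:</p>

```python
from itertools import combinations


def jaccard(set_a, set_b):
    """Overlap of two sets: size of the intersection over size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def neighbour_overlap(neighbours, tag):
    """Compare the neighbour sets of every pair of neighbours of `tag`."""
    overlap = {}
    for tag_a, tag_b in combinations(sorted(neighbours[tag]), 2):
        # Leave out the tag itself and the two tags being compared.
        set_a = neighbours.get(tag_a, set()) - {tag, tag_b}
        set_b = neighbours.get(tag_b, set()) - {tag, tag_a}
        overlap[(tag_a, tag_b)] = jaccard(set_a, set_b)
    return overlap


# Invented neighbour sets: "apple" used for the computer and for the fruit.
neighbours = {
    "apple": {"mac", "osx", "fruit", "recipes"},
    "mac": {"apple", "osx", "software"},
    "osx": {"apple", "mac", "software"},
    "fruit": {"apple", "recipes", "health"},
    "recipes": {"apple", "fruit", "health"},
}

overlap = neighbour_overlap(neighbours, "apple")
for pair, score in sorted(overlap.items()):
    print(pair, score)
```

<p>Pairs with high overlap belong to the same usage of the tag. If the neighbours fall into groups that barely overlap, as here, the tag is a candidate for splitting.</p>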
<h2>We are onto something</h2>
<p>I&#8217;m pretty sure we are onto something. I think this is the direction it should go. Computations over tag-connection distributions are cool. Users shouldn&#8217;t have to enter this information when posting bookmarks. Posting should stay easy. I&#8217;m not that sure about the &#8220;synonym&#8221; computation, but I think the &#8220;category&#8221; computation turned out pretty well. I tried to build some clusters by hand, considering only the category and synonym connections, and I found a completely detached cluster consisting of the tags &#8220;cooking&#8221;, &#8220;health&#8221;, &#8220;recipes&#8221;, &#8220;diet&#8221; and &#8220;food&#8221;. As I said, I think we are onto something..</p>
<h2>Further reading</h2>
<ul>
<li><a href="http://www.rashmisinha.com/archives/05_02/tag-sorting.html">Building tag clusters by hand</a></li>
<li><a href="http://blog.pietrosperoni.it/2004/09/19/clustering-delicious-tags/">Pietro Speroni&#8217;s different approach to clustering tags (with java-mindmap-visualisation!)</a>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2005/07/analyzing-tag-connections.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
