<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Then each went to his own home &#187; Research</title>
	<atom:link href="http://www.pui.ch/phred/archives/category/research/feed" rel="self" type="application/rss+xml" />
	<link>http://www.pui.ch/phred</link>
	<description>Philipp Keller's weblog</description>
	<lastBuildDate>Wed, 15 Dec 2010 12:37:04 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Automated tag clustering</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated-tag-clustering.html</link>
		<comments>http://www.pui.ch/phred/archives/2006/07/automated-tag-clustering.html#comments</comments>
		<pubDate>Tue, 11 Jul 2006 06:03:37 +0000</pubDate>
		<dc:creator>Philipp Keller</dc:creator>
				<category><![CDATA[Clustering]]></category>
		<category><![CDATA[Del.icio.us]]></category>
		<category><![CDATA[RawSugar]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Tags]]></category>

		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html</guid>
		<description><![CDATA[Grigory Begelman (Technion &#8211; Israel Institute of Technology Computer Science Department), Frank Smadja (RawSugar) and I wrote a paper for www2006 called &#8220;automated tag clustering&#8221;. It explains why clustering the tag space makes sense and how it could be done.
After the presentation at the tagging workshop at www2006 we felt the need to give [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cs.technion.ac.il/%7Egbeg/">Grigory Begelman</a> (<a href="http://www.cs.technion.ac.il/">Technion &#8211; Israel Institute of Technology Computer Science Department</a>), <a href="http://smadja.us/">Frank Smadja</a> (<a href="http://www.rawsugar.com/">RawSugar</a>) and I wrote a paper for <a href="http://www2006.org">www2006</a> called &#8220;automated tag clustering&#8221;. It explains why clustering the tag space makes sense and how it could be done.</p>
<p>After the presentation at the <a href="http://blog.rawsugar.com/wikka/wikka.php?wakka=HomePage">tagging workshop</a> at www2006 we felt the need to give our paper a more www-friendly, I-don&#8217;t-want-to-read-through-those-theoretical-equation-flooded-papers face.</p>
<p>So, here you go: <a href="http://www.pui.ch/phred/automated_tag_clustering/">Automated Tag Clustering: Improving search and exploration in the tag space</a>. To follow this document you should know what tags are about and be familiar with tagging services such as <a href="http://del.icio.us">delicious</a> or <a href="http://www.flickr.com">flickr</a>, so that you can understand the limitations these services currently have. <span id="more-41"></span><a href="http://www.pui.ch/phred/automated_tag_clustering/#cluster"><img title="clustering the tag space" alt="clustering the tag space" id="image42" src="http://www.pui.ch/phred/wp-content/uploads/2006/07/clusters.png" /></a>If you don&#8217;t want to read through the whole paper, the numerous figures give you a good summary. Finally, to whet your appetite, here are a few excerpts from the document:</p>
<blockquote><p>Currently tagging services still provide a relatively marginal value for information discovery and we claim that with the use of clustering techniques this can be greatly improved [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_motivation">introduction</a>]</p></blockquote>
<blockquote><p>The whole promise of collaborative tagging is that by exploring the tag space you can discover a lot of useful information you would not find with traditional search engines.  When your information need is not well defined, the idea that you can explore and see what other people tagged with certain tags is very attractive. We believe that tagging will be able to reach a very wide audience only when exploration techniques will be effective. [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_exploration">limited exploration</a>]</p></blockquote>
<blockquote><p>Although a great visualization paradigm, we believe that with today&#8217;s tagclouds it is hard to find more than one or two tags to click on. Tags are not grouped, there is too much information, so that you find a lot of related tags scattered on the tag cloud.  One or two popular topics and all their related tags tend to dominate the whole cloud.  For example, looking at the del.icio.us tagcloud, one would mostly see tags related to web design and technologies. This is because these topics are overwhelmingly more frequent than anything else. [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_exploration">limited exploration</a>]</p></blockquote>
<blockquote><p>Tag <em>web2.0</em> nowadays is so popular and is combined wildly with anything. In fact this tag is so overused that if you look at <a href="http://del.icio.us/tag/bookmarks">tag <em>bookmarks</em> in the del.icio.us dataset</a>, the most used cotag is <em>web2.0</em>[...]. Basing tag similarity on these numbers often doesn&#8217;t make sense at all. The similarity measure should be chosen so the popularity of a tag doesn&#8217;t affect the set of a tag&#8217;s related tags. Don&#8217;t cut the <a href="http://en.wikipedia.org/wiki/Long_tail">long tail</a>. The success of blogs is driven by the importance of the long tail. We all know that it is crucial to support the niches. Tagging applications should empower the long tail too. If you just sort by popularity, you&#8217;d lose all those niches. [from <a href="http://www.pui.ch/phred/automated_tag_clustering/#p_similarity">choosing a similarity measure</a>]</p></blockquote>
<p>We&#8217;d be happy to get any kind of feedback on the article. Just post a comment to this blog post.</p>
<p><strong>Edit (4 years later!)</strong>: A few guys asked me about the source code: <a href="http://pastie.org/1098455">Source code with syntax highlighting</a>, <a href="http://www.pui.ch/phred/archives/cluster.py">download</a>.<br />
You need <a href="http://people.sc.fsu.edu/~jburkardt/c_src/kmetis/kmetis.html">kmetis</a> to run this; see <code>usage()</code> for how it should be used.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pui.ch/phred/archives/2006/07/automated-tag-clustering.html/feed</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
	</channel>
</rss>

