<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Automated tag clustering</title>
	<atom:link href="http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html</link>
	<description>Philipp Kellers weblog</description>
	<pubDate>Fri, 12 Mar 2010 16:02:33 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Philipp Keller</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-130765</link>
		<dc:creator>Philipp Keller</dc:creator>
		<pubDate>Wed, 29 Jul 2009 20:35:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-130765</guid>
		<description>@Noor

The graph was done with GraphViz. You can label the edges and even influence how long they should be; for example, see http://www.graphviz.org/Gallery/undirected/ER.html

I did the coloring by hand with Inkscape.</description>
		<content:encoded><![CDATA[<p>@Noor</p>
<p>The graph was done with GraphViz. You can label the edges and even influence how long they should be; for example, see <a href="http://www.graphviz.org/Gallery/undirected/ER.html" rel="nofollow">http://www.graphviz.org/Gallery/undirected/ER.html</a></p>
<p>I did the coloring by hand with Inkscape.</p>
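<p>A minimal sketch of labelled edges with a preferred length, assuming the tag similarities are available as (tag, tag, weight) tuples. The sample tags, the <code>to_dot</code> helper and the weight-to-length formula are made up for illustration; <code>label</code> and <code>len</code> are standard Graphviz edge attributes, and <code>len</code> is honoured by the neato and fdp layout engines.</p>
<pre>
```python
def to_dot(weighted_edges):
    """Build an undirected Graphviz graph in the DOT language.

    weighted_edges: iterable of (tag_a, tag_b, weight) tuples (hypothetical data).
    Every edge gets a visible label and a preferred length via `len`.
    """
    lines = ["graph tags {"]
    for a, b, weight in weighted_edges:
        # Assumption: a higher similarity should pull two tags closer together,
        # so the preferred edge length shrinks as the weight grows.
        length = 1.0 + 2.0 * (1.0 - weight)
        lines.append(f'  "{a}" -- "{b}" [label="{weight:.2f}", len={length:.2f}];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("python", "programming", 0.8), ("python", "snake", 0.1)])
print(dot_text)
```
</pre>
<p>Saving the output as tags.dot and running <code>neato -Tsvg tags.dot -o tags.svg</code> should render it.</p>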
]]></content:encoded>
	</item>
	<item>
		<title>By: Noor</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-130758</link>
		<dc:creator>Noor</dc:creator>
		<pubDate>Wed, 15 Jul 2009 11:49:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-130758</guid>
		<description>Hi,

I was exploring the tagging world and really like your work. I have a question about the cluster graph: how did you generate the graph (I mean visually) with weighted edges?

Thanks in advance
---
Noor
Aachen, Germany</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I was exploring the tagging world and really like your work. I have a question about the cluster graph: how did you generate the graph (I mean visually) with weighted edges?</p>
<p>Thanks in advance<br />
&#8212;<br />
Noor<br />
Aachen, Germany</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shrijeet Paliwal</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-130015</link>
		<dc:creator>Shrijeet Paliwal</dc:creator>
		<pubDate>Wed, 05 Nov 2008 15:58:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-130015</guid>
		<description>Hi everyone,

I tried kmetis from the metis-4.0 suite and discovered that it accepts only integer edge weights.

If one opts for Dice similarity, the edge weights are floating-point values.

The authors told me that hmetis2.0 supports floating-point weights. Unfortunately there are no Windows binaries for it yet, so I could not give it a shot.

I plan to switch to igraph now. I will report the results when available.

Meanwhile, if anyone has a suggestion or comment, please respond.

By the way I am trying to use the tag clustering in my image search system. I am quite optimistic about the results.

Best,
Shrijeet Paliwal
CS@Stony Brook University</description>
		<content:encoded><![CDATA[<p>Hi everyone,</p>
<p>I tried kmetis from the metis-4.0 suite and discovered that it accepts only integer edge weights.</p>
<p>If one opts for Dice similarity, the edge weights are floating-point values.</p>
<p>The authors told me that hmetis2.0 supports floating-point weights. Unfortunately there are no Windows binaries for it yet, so I could not give it a shot.</p>
<p>I plan to switch to igraph now. I will report the results when available.</p>
<p>Meanwhile, if anyone has a suggestion or comment, please respond.</p>
<p>By the way I am trying to use the tag clustering in my image search system. I am quite optimistic about the results.</p>
<p>Best,<br />
Shrijeet Paliwal<br />
CS@Stony Brook University</p>
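<p>A small sketch of why Dice similarity yields fractional edge weights, assuming each tag is represented by the set of resources it was applied to. The sample sets and both helper names are hypothetical, and scaling to integers is only one conceivable workaround for integer-only tools, not something the METIS documentation is quoted on here.</p>
<pre>
```python
def dice_similarity(resources_a, resources_b):
    """Dice coefficient of two tags, each given as a collection of resource ids."""
    a, b = set(resources_a), set(resources_b)
    if not a and not b:
        return 0.0  # convention chosen here for two unused tags
    return 2 * len(a.intersection(b)) / (len(a) + len(b))


def to_integer_weight(similarity, scale=1000):
    """Assumed workaround: scale and round so the weight becomes an integer."""
    return round(similarity * scale)


web = {1, 2, 3, 4}      # resources tagged "web" (made-up ids)
design = {3, 4, 5}      # resources tagged "design" (made-up ids)
sim = dice_similarity(web, design)   # 2 * 2 / (4 + 3)
weight = to_integer_weight(sim)
print(sim, weight)
```
</pre>
<p>Rounding loses precision, so very weak similarities can collapse to 0 unless the scale is chosen large enough.</p>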
]]></content:encoded>
	</item>
	<item>
		<title>By: Randolf Rotta</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-123012</link>
		<dc:creator>Randolf Rotta</dc:creator>
		<pubDate>Thu, 17 Apr 2008 14:51:56 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-123012</guid>
		<description>Hi everyone,

As already pointed out, METIS minimizes the edges between different clusters while balancing the size of the clusters. Thus it isn't very useful for community detection.

Most of the community clustering algorithms implemented in igraph optimize the same quality measure, Newman's "modularity". It describes mathematically what we may expect from a good clustering: that the vertices (tags) have stronger connections (edge weight) to the other vertices of their cluster than in a graph where all edges are equally distributed (no structure, called the null model).

In a perfect world all algorithms would produce the same result. But it's an NP-complete problem, so they have to rely on heuristics and differ in how well they achieve the optimization goal. Beware that some implementations in igraph ignore edge weights, which produces additional structure where there is none (e.g. edge weight 0.01 vs. 1.0).

The good news is that it isn't necessary to use eigenvectors to find good modularity clusterings. Some "simpler" algorithms exist that give comparable or better results without manually choosing the number of clusters.


Randolf</description>
		<content:encoded><![CDATA[<p>Hi everyone,</p>
<p>As already pointed out, METIS minimizes the edges between different clusters while balancing the size of the clusters. Thus it isn&#8217;t very useful for community detection.</p>
<p>Most of the community clustering algorithms implemented in igraph optimize the same quality measure, Newman&#8217;s &#8220;modularity&#8221;. It describes mathematically what we may expect from a good clustering: that the vertices (tags) have stronger connections (edge weight) to the other vertices of their cluster than in a graph where all edges are equally distributed (no structure, called the null model).</p>
<p>In a perfect world all algorithms would produce the same result. But it&#8217;s an NP-complete problem, so they have to rely on heuristics and differ in how well they achieve the optimization goal. Beware that some implementations in igraph ignore edge weights, which produces additional structure where there is none (e.g. edge weight 0.01 vs. 1.0).</p>
<p>The good news is that it isn&#8217;t necessary to use eigenvectors to find good modularity clusterings. Some &#8220;simpler&#8221; algorithms exist that give comparable or better results without manually choosing the number of clusters.</p>
<p>Randolf</p>
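<p>The modularity measure described above can be sketched in a few lines, assuming an undirected graph given as (vertex, vertex, weight) tuples with every edge listed once and no self-loops. The function name and the toy graph are illustrative only; this is not igraph code.</p>
<pre>
```python
from collections import defaultdict


def modularity(weighted_edges, cluster_of):
    """Weighted Newman modularity Q of a partition of an undirected graph.

    weighted_edges: iterable of (u, v, weight), each edge once, no self-loops.
    cluster_of: dict mapping every vertex to its cluster label.
    """
    total = 0.0                   # m: sum of all edge weights
    inside = defaultdict(float)   # weight of edges with both ends in a cluster
    degree = defaultdict(float)   # summed weighted vertex degrees per cluster
    for u, v, w in weighted_edges:
        cu, cv = cluster_of[u], cluster_of[v]
        total += w
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            inside[cu] += w
    if total == 0:
        return 0.0
    # Q = sum over clusters of inside/m - (degree/2m)^2; the squared term is
    # the share of intra-cluster weight expected under the null model.
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge, all weights 1.
edges = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
         ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
         ("c", "d", 1)]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {v: 0 for v in two_clusters}
q_split = modularity(edges, two_clusters)
q_single = modularity(edges, one_cluster)
print(q_split, q_single)
```
</pre>
<p>Splitting the toy graph into its two triangles scores higher than putting everything into one cluster, which is the behaviour the measure is meant to reward.</p>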
]]></content:encoded>
	</item>
	<item>
		<title>By: Tamás Nepusz</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-114349</link>
		<dc:creator>Tamás Nepusz</dc:creator>
		<pubDate>Mon, 03 Mar 2008 18:01:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-114349</guid>
		<description>Hi Maria,

igraph implements a different spectral clustering based on the algorithm of Newman (see http://arxiv.org/abs/physics/0605087). It also has implementations for a couple of other clustering algorithms (e.g., the fast greedy modularity optimization of Clauset et al, the walktrap community detection of Latapy &#38; Pons, the spinglass clustering of Reichardt &#38; Bornholdt). If you have similarity measures, then the walktrap community detection or the spectral algorithm of Newman may be of some use to you.</description>
		<content:encoded><![CDATA[<p>Hi Maria,</p>
<p>igraph implements a different spectral clustering based on the algorithm of Newman (see <a href="http://arxiv.org/abs/physics/0605087" rel="nofollow">http://arxiv.org/abs/physics/0605087</a>). It also has implementations for a couple of other clustering algorithms (e.g., the fast greedy modularity optimization of Clauset et al, the walktrap community detection of Latapy &amp; Pons, the spinglass clustering of Reichardt &amp; Bornholdt). If you have similarity measures, then the walktrap community detection or the spectral algorithm of Newman may be of some use to you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maria Grineva</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-112180</link>
		<dc:creator>Maria Grineva</dc:creator>
		<pubDate>Thu, 21 Feb 2008 11:16:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-112180</guid>
		<description>Hi Philipp,

I was looking around for a library/tool that could perform tag clustering. As I understand it, I need something that deals with graphs.

I had a look at METIS (pmetis), but it partitions the graph into k equal-sized parts, and it doesn't seem to perform the task of finding communities in graphs.

I just found that the algorithm you mention (described in the paper "Spectral Clustering Approach to Finding Communities in Graphs") is implemented in the igraph library http://cneurocvs.rmki.kfki.hu/igraph/ (at least they say so in their docs).

So my question is: do you know this library? If so, what would you recommend?</description>
		<content:encoded><![CDATA[<p>Hi Philipp,</p>
<p>I was looking around for a library/tool that could perform tag clustering. As I understand it, I need something that deals with graphs.</p>
<p>I had a look at METIS (pmetis), but it partitions the graph into k equal-sized parts, and it doesn&#8217;t seem to perform the task of finding communities in graphs.</p>
<p>I just found that the algorithm you mention (described in the paper &#8220;Spectral Clustering Approach to Finding Communities in Graphs&#8221;) is implemented in the igraph library <a href="http://cneurocvs.rmki.kfki.hu/igraph/" rel="nofollow">http://cneurocvs.rmki.kfki.hu/igraph/</a> (at least they say so in their docs).</p>
<p>So my question is: do you know this library? If so, what would you recommend?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philipp Keller</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-109679</link>
		<dc:creator>Philipp Keller</dc:creator>
		<pubDate>Sat, 09 Feb 2008 15:40:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-109679</guid>
		<description>Niket: No, this can't be done by any automatic process. It's the same with email spam: You can tune your spam filter but the email spammer will notice and change their algorithms and hence you have to change yours, etc.

Although this way is tedious I guess all public services offering user generated content and/or public APIs have to go down that road.</description>
		<content:encoded><![CDATA[<p>Niket: No, this can&#8217;t be done by any automatic process. It&#8217;s the same with email spam: you can tune your spam filter, but the spammers will notice and change their algorithms, so you have to change yours, and so on.</p>
<p>Although this is tedious, I guess all public services offering user-generated content and/or public APIs have to go down that road.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Niket Tandon</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-106958</link>
		<dc:creator>Niket Tandon</dc:creator>
		<pubDate>Wed, 23 Jan 2008 04:49:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-106958</guid>
		<description>Thanks a lot Philipp, 

I was able to contact Grigory and he's helping me. Sorry for the late response; I was working through eigenvectors and other mathematical concepts and implementing clustering algorithms.

Regarding tag spam, you said: "But the counter attack for spammers is very easy: Don’t use the same tag combinations too often."
So what you essentially mean is that spammers would tag resources with unrelated tags, but not so often that it raises suspicion.

Considering all these factors, do you think there is an opportunity for the semantics of tags to come into the picture? Or can this be handled in some way by an intelligent similarity measure?

Thanks in advance
Niket.</description>
		<content:encoded><![CDATA[<p>Thanks a lot Philipp, </p>
<p>I was able to contact Grigory and he&#8217;s helping me. Sorry for the late response; I was working through eigenvectors and other mathematical concepts and implementing clustering algorithms.</p>
<p>Regarding tag spam, you said: &#8220;But the counter attack for spammers is very easy: Don’t use the same tag combinations too often.&#8221;<br />
So what you essentially mean is that spammers would tag resources with unrelated tags, but not so often that it raises suspicion.</p>
<p>Considering all these factors, do you think there is an opportunity for the semantics of tags to come into the picture? Or can this be handled in some way by an intelligent similarity measure?</p>
<p>Thanks in advance<br />
Niket.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philipp Keller</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-103943</link>
		<dc:creator>Philipp Keller</dc:creator>
		<pubDate>Sat, 22 Dec 2007 19:37:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-103943</guid>
		<description>Hi Niket.

I don't know another paper or article that deals exactly with that problem.

About your questions:
[1] What I meant was: When I looked at bookmark spam, the spammers often used the same tag combination, e.g. health-education. They added hundreds of bookmarks with the tags "health" and "education". This tag spam caused those two tags to have a very high similarity number. This number was so high that it was above the normal range of similarity. That's how I spotted the "tag-connection-spams". But the counter attack for spammers is very easy: Don't use the same tag combinations too often.

[2] No

[3] I don't know any of these programs. I never even tried out METIS. You'd need to find a library yourself. Maybe &lt;a href="http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm" rel="nofollow"&gt;this list of Open Source Clustering Algorithms&lt;/a&gt; can help you?

[4] This part of the paper contains the findings of &lt;a href="http://www.cs.technion.ac.il/~gbeg/index.html" rel="nofollow"&gt;Grigory Begelmann&lt;/a&gt;. You'll have to ask him about these peculiarities.</description>
		<content:encoded><![CDATA[<p>Hi Niket.</p>
<p>I don&#8217;t know another paper or article that deals exactly with that problem.</p>
<p>About your questions:<br />
[1] What I meant was: When I looked at bookmark spam, the spammers often used the same tag combination, e.g. health-education. They added hundreds of bookmarks with the tags &#8220;health&#8221; and &#8220;education&#8221;. This tag spam caused those two tags to have a very high similarity number. This number was so high that it was above the normal range of similarity. That&#8217;s how I spotted the &#8220;tag-connection-spams&#8221;. But the counter attack for spammers is very easy: Don&#8217;t use the same tag combinations too often.</p>
<p>[2] No</p>
<p>[3] I don&#8217;t know any of these programs. I never even tried out METIS. You&#8217;d need to find a library yourself. Maybe <a href="http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm" rel="nofollow">this list of Open Source Clustering Algorithms</a> can help you?</p>
<p>[4] This part of the paper contains the findings of <a href="http://www.cs.technion.ac.il/~gbeg/index.html" rel="nofollow">Grigory Begelmann</a>. You&#8217;ll have to ask him about these peculiarities.</p>
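<p>The observation in [1], a tag pair whose similarity sits far above the normal range, could be checked along these lines. This is only a sketch under assumptions: the cut-off rule (median plus a multiple of the median absolute deviation), the <code>factor</code> value and the sample scores are all made up for illustration and are not the method used in the paper.</p>
<pre>
```python
from statistics import median


def suspicious_pairs(similarities, factor=5.0):
    """Return tag pairs whose similarity lies far above the typical value.

    similarities: dict mapping (tag_a, tag_b) to a similarity score.
    factor: hypothetical cut-off in multiples of the median absolute deviation.
    """
    values = list(similarities.values())
    mid = median(values)
    spread = median([abs(v - mid) for v in values])
    limit = mid + factor * spread
    return sorted(pair for pair, score in similarities.items() if score > limit)


scores = {
    ("web", "design"): 0.30,
    ("python", "programming"): 0.35,
    ("music", "mp3"): 0.25,
    ("css", "html"): 0.40,
    ("health", "education"): 0.95,   # the kind of pair spammers inflated
}
flagged = suspicious_pairs(scores)
print(flagged)
```
</pre>
<p>As noted in [1], spammers can evade any such fixed rule by spreading their tag combinations more thinly.</p>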
]]></content:encoded>
	</item>
	<item>
		<title>By: Niket Tandon</title>
		<link>http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-102658</link>
		<dc:creator>Niket Tandon</dc:creator>
		<pubDate>Mon, 10 Dec 2007 10:56:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2006/07/automated_tag_clustering.html#comment-102658</guid>
		<description>Hi

Is there any paper or article that compares clustering algorithms and says which one is best suited for tag clustering?

Thanks,
Niket</description>
		<content:encoded><![CDATA[<p>Hi</p>
<p>Is there any paper or article that compares clustering algorithms and says which one is best suited for tag clustering?</p>
<p>Thanks,<br />
Niket</p>
]]></content:encoded>
	</item>
</channel>
</rss>
