<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Tagsystems: performance tests</title>
	<atom:link href="http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/feed" rel="self" type="application/rss+xml" />
	<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html</link>
	<description>Philipp Kellers weblog</description>
	<lastBuildDate>Wed, 18 Jan 2012 20:28:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: return</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-2#comment-134393</link>
		<dc:creator>return</dc:creator>
		<pubDate>Fri, 01 Apr 2011 10:54:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-134393</guid>
		<description>Hello,
I was looking at the Toxi method. Is there already a CRUD system, GUI, or idea somewhere for updating the database correctly?
Thanks in advance</description>
		<content:encoded><![CDATA[<p>Hello,<br />
I was looking at the Toxi method. Is there already a CRUD system, GUI, or idea somewhere for updating the database correctly?<br />
Thanks in advance</p>
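<p>One possible answer, as a minimal sketch: the Python code below (standard-library sqlite3) keeps a Toxi-style schema consistent when an item's tags are edited. The table and column names (item, tag, item_tag) and the helpers set_item_tags and get_item_tags are hypothetical, not taken from the original post. The update creates any missing tags, replaces the item's mappings, and then deletes tags that no item uses any more, all inside one transaction.</p>

```python
import sqlite3

# Hypothetical Toxi-style schema: items, tags, and a mapping table.
SCHEMA = """
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_tag (
    item_id INTEGER NOT NULL REFERENCES item (id),
    tag_id  INTEGER NOT NULL REFERENCES tag (id),
    PRIMARY KEY (item_id, tag_id)
);
"""


def set_item_tags(conn, item_id, tag_names):
    """Replace the tags of item_id with tag_names (normalized to lower case)."""
    names = sorted({n.strip().lower() for n in tag_names if n.strip()})
    with conn:  # single transaction: commit on success, roll back on error
        # 1. Create tags that do not exist yet; existing names are ignored.
        conn.executemany("INSERT OR IGNORE INTO tag (name) VALUES (?)",
                         [(n,) for n in names])
        # 2. Drop the item's old mappings and insert the current ones.
        conn.execute("DELETE FROM item_tag WHERE item_id = ?", (item_id,))
        conn.executemany(
            "INSERT INTO item_tag (item_id, tag_id) "
            "SELECT ?, id FROM tag WHERE name = ?",
            [(item_id, n) for n in names])
        # 3. Remove tags that no item uses any more.
        conn.execute("DELETE FROM tag WHERE id NOT IN (SELECT tag_id FROM item_tag)")


def get_item_tags(conn, item_id):
    """Return the item's tag names in alphabetical order."""
    rows = conn.execute(
        "SELECT t.name FROM tag t JOIN item_tag it ON it.tag_id = t.id "
        "WHERE it.item_id = ? ORDER BY t.name", (item_id,))
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO item (id, title) VALUES (1, 'first post')")
    set_item_tags(conn, 1, ["SQL", "tags", "mysql"])
    set_item_tags(conn, 1, ["sql", "postgres"])
    print(get_item_tags(conn, 1))  # ['postgres', 'sql']
```

<p>Deleting and re-inserting the mappings keeps the code short. For items with many tags, you could instead compute the difference between the old and new tag sets, which touches fewer rows.</p>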
]]></content:encoded>
	</item>
	<item>
		<title>By: Mahesh</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-134374</link>
		<dc:creator>Mahesh</dc:creator>
		<pubDate>Fri, 04 Feb 2011 12:52:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-134374</guid>
		<description>I think full-text search is better for small-scale web applications, combined with a query cache (for example, through an ORM such as Doctrine).

After all, most people care more about performance than about data redundancy.

Nice comparison</description>
		<content:encoded><![CDATA[<p>I think full-text search is better for small-scale web applications, combined with a query cache (for example, through an ORM such as Doctrine).</p>
<p>After all, most people care more about performance than about data redundancy.</p>
<p>Nice comparison</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Philipp Keller</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-130806</link>
		<dc:creator>Philipp Keller</dc:creator>
		<pubDate>Sat, 30 Jan 2010 21:59:37 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-130806</guid>
		<description>@c-a: thanks for the hint, I&#039;ve corrected the link</description>
		<content:encoded><![CDATA[<p>@c-a: thanks for the hint, I&#8217;ve corrected the link</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: c-a</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-130804</link>
		<dc:creator>c-a</dc:creator>
		<pubDate>Mon, 18 Jan 2010 23:01:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-130804</guid>
		<description>«tags doesn’t map to sql at all. so use partial indexing.» [Joshua Schachter at Carson Summit]

FYI, the link to Joshua is dead.

Thanks for all the useful information.</description>
		<content:encoded><![CDATA[<p>«tags doesn’t map to sql at all. so use partial indexing.» [Joshua Schachter at Carson Summit]</p>
<p>FYI, the link to Joshua is dead.</p>
<p>Thanks for all the useful information.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pat</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-130720</link>
		<dc:creator>Pat</dc:creator>
		<pubDate>Mon, 08 Jun 2009 22:59:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-130720</guid>
		<description>Only two years since the last comment and four since the post.

Around the time of this post, I was working for LinkedIn.com, and I ran some performance tests comparing &lt;a href=&quot;http://lucene.apache.org&quot; rel=&quot;nofollow&quot;&gt;Lucene&lt;/a&gt;, MySQL full-text search and Oracle full-text search for searching people&#039;s profiles. Lucene won hands down.

Ever wonder why there is no obvious way to break a connection on LinkedIn? It&#039;s because the Lucene index is updated incrementally by additions, and removing a connection from the search results is an expensive operation.

Of course, things change; it would be interesting to see the results of four years&#039; worth of work on all three products.</description>
		<content:encoded><![CDATA[<p>Only two years since the last comment and four since the post.</p>
<p>Around the time of this post, I was working for LinkedIn.com, and I ran some performance tests comparing <a href="http://lucene.apache.org" rel="nofollow">Lucene</a>, MySQL full-text search and Oracle full-text search for searching people&#8217;s profiles. Lucene won hands down.</p>
<p>Ever wonder why there is no obvious way to break a connection on LinkedIn? It&#8217;s because the Lucene index is updated incrementally by additions, and removing a connection from the search results is an expensive operation.</p>
<p>Of course, things change; it would be interesting to see the results of four years&#8217; worth of work on all three products.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peufeu</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-92353</link>
		<dc:creator>Peufeu</dc:creator>
		<pubDate>Mon, 15 Oct 2007 10:45:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-92353</guid>
		<description>Efficiently handling tags is similar to handling attributes on dating sites (e.g. +blonde +tits -fat), except that there are many more tags than profile attributes.

To get acceptable performance out of a SQL database, you will have to forget about LIKE (full table scan) and foreign keys (the scuttle/toxi solution).

Basically, you need a SQL database that supports one of the following:

- efficient star joins (that means Oracle),
- bitmap indexes (coming up in Postgres),
- efficient full-text search (e.g. Postgres),
- vectors (arrays) as column types, with specific index methods for boolean queries (e.g. Postgres) on the values contained in those vectors.

MySQL is not part of the solution; besides, MySQL FULLTEXT is ludicrous.

A better solution may be to use a full-text search engine. I tried Xapian and found that, on large data sets of up to a million forum posts, it massively outperformed PostgreSQL&#039;s full-text search, which itself massively outperformed MySQL&#039;s. This can be used for tags, and of course to search the articles&#039; full text. Lucene is also an option, but it is less user-friendly than Xapian (it uses Java, bleh, and is hard to interface with Python for update scripts, etc.).</description>
		<content:encoded><![CDATA[<p>Efficiently handling tags is similar to handling attributes on dating sites (e.g. +blonde +tits -fat), except that there are many more tags than profile attributes.</p>
<p>To get acceptable performance out of a SQL database, you will have to forget about LIKE (full table scan) and foreign keys (the scuttle/toxi solution).</p>
<p>Basically, you need a SQL database that supports one of the following:</p>
<p>- efficient star joins (that means Oracle),<br />
- bitmap indexes (coming up in Postgres),<br />
- efficient full-text search (e.g. Postgres),<br />
- vectors (arrays) as column types, with specific index methods for boolean queries (e.g. Postgres) on the values contained in those vectors.</p>
<p>MySQL is not part of the solution; besides, MySQL FULLTEXT is ludicrous.</p>
<p>A better solution may be to use a full-text search engine. I tried Xapian and found that, on large data sets of up to a million forum posts, it massively outperformed PostgreSQL&#8217;s full-text search, which itself massively outperformed MySQL&#8217;s. This can be used for tags, and of course to search the articles&#8217; full text. Lucene is also an option, but it is less user-friendly than Xapian (it uses Java, bleh, and is hard to interface with Python for update scripts, etc.).</p>
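<p>The bitmap-index idea mentioned above can be illustrated with a toy in-memory sketch in Python. Each tag maps to an integer used as a bitset over item ids. A query like +a +b -c then becomes a bitwise AND of the required tags' bitsets, followed by AND NOT for each excluded tag. The class name BitmapTagIndex and the query syntax are assumptions made for illustration. Real database bitmap indexes are compressed and stored on disk, which this sketch does not attempt.</p>

```python
class BitmapTagIndex:
    """Toy bitmap index: one Python int per tag, bit i set if item i has it."""

    def __init__(self):
        self.bitmaps = {}  # tag name -> int used as a bitset of item ids

    def add(self, item_id, tags):
        """Record that item_id (a small non-negative int) carries tags."""
        for tag in tags:
            self.bitmaps[tag] = self.bitmaps.get(tag, 0) | (1 << item_id)

    def query(self, expr):
        """Evaluate e.g. '+python +sql -mysql': AND the '+' tags, drop '-' tags."""
        required = [t[1:] if t.startswith("+") else t
                    for t in expr.split() if not t.startswith("-")]
        excluded = [t[1:] for t in expr.split() if t.startswith("-")]
        if not required:
            return []  # a query made only of exclusions is not supported here
        result = self.bitmaps.get(required[0], 0)
        for tag in required[1:]:
            result &= self.bitmaps.get(tag, 0)  # bitwise AND = "has both"
        for tag in excluded:
            result &= ~self.bitmaps.get(tag, 0)  # AND NOT = "lacks this tag"
        # Decode the set bits back into item ids.
        return [i for i in range(result.bit_length()) if result >> i & 1]


if __name__ == "__main__":
    idx = BitmapTagIndex()
    idx.add(0, ["python", "sql"])
    idx.add(1, ["python", "sql", "mysql"])
    idx.add(2, ["sql", "postgres"])
    idx.add(3, ["python", "sql", "postgres"])
    print(idx.query("+python +sql -mysql"))  # [0, 3]
```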
]]></content:encoded>
	</item>
	<item>
		<title>By: orderlord</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-87607</link>
		<dc:creator>orderlord</dc:creator>
		<pubDate>Wed, 26 Sep 2007 05:57:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-87607</guid>
		<description>What about benchmarks for queries with an ORDER BY clause (such as ORDER BY date)?

For example, a user might want to see all items with certain tags, newest first. How does performance hold up then?</description>
		<content:encoded><![CDATA[<p>What about benchmarks for queries with an ORDER BY clause (such as ORDER BY date)?</p>
<p>For example, a user might want to see all items with certain tags, newest first. How does performance hold up then?</p>
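<p>The sketch below shows where the ORDER BY goes in an IN / GROUP BY / HAVING tag query. It uses Python's sqlite3 module and a hypothetical schema: item with a created column, tag, and item_tag. It illustrates the query shape only and measures nothing. With this shape, the database typically has to find all matching items before it can sort them and apply the LIMIT.</p>

```python
import sqlite3

# Hypothetical Toxi-style schema; item.created holds an ISO-8601 date string.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT, created TEXT NOT NULL);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_tag (item_id INTEGER NOT NULL, tag_id INTEGER NOT NULL,
                       PRIMARY KEY (item_id, tag_id));
INSERT INTO item VALUES (1, 'old post', '2007-01-01'),
                        (2, 'newer post', '2007-06-01'),
                        (3, 'newest post', '2007-09-01');
INSERT INTO tag VALUES (1, 'sql'), (2, 'tags'), (3, 'mysql');
INSERT INTO item_tag VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")


def newest_items_with_all_tags(conn, tags, limit=10):
    """Items that carry every tag in `tags`, newest first."""
    tags = list(dict.fromkeys(tags))  # drop duplicates, keep order
    if not tags:
        return []
    placeholders = ",".join("?" for _ in tags)
    sql = f"""
        SELECT i.id, i.title, i.created
        FROM item i
        JOIN item_tag it ON it.item_id = i.id
        JOIN tag t ON t.id = it.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY i.id
        HAVING COUNT(*) = ?          -- item matched every requested tag
        ORDER BY i.created DESC      -- sort applies to the grouped result
        LIMIT ?
    """
    return conn.execute(sql, [*tags, len(tags), limit]).fetchall()


print(newest_items_with_all_tags(conn, ["sql", "tags"]))
# [(3, 'newest post', '2007-09-01'), (1, 'old post', '2007-01-01')]
```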
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-87321</link>
		<dc:creator>Ryan</dc:creator>
		<pubDate>Tue, 25 Sep 2007 03:19:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-87321</guid>
		<description>It looks like the first commenter (&quot;Go back to school dude!
Rework the queries for the toxi schema and use JOINs.&quot;) never pursued his argument, and it looks like you never &quot;got&quot; what he was really saying. 

I think what he meant was that a query of this form:
SELECT ... WHERE name IN (&quot;word1&quot;, &quot;word2&quot;, &quot;word3&quot;)
HAVING ...

should instead be rewritten as:
SELECT ... WHERE name = &quot;word1&quot;
INNER JOIN
SELECT ... WHERE name = &quot;word2&quot;
INNER JOIN
SELECT ... WHERE name = &quot;word3&quot;

Same thing for the UNION query: use actual UNIONs instead of &quot;WHERE name IN ...&quot;.

I think this should give you a good performance boost.
Would you consider redoing your benchmarks on the TOXI schema using these queries above?</description>
		<content:encoded><![CDATA[<p>It looks like the first commenter (&#8220;Go back to school dude!<br />
Rework the queries for the toxi schema and use JOINs.&#8221;) never pursued his argument, and it looks like you never &#8220;got&#8221; what he was really saying. </p>
<p>I think what he meant was that a query of this form:<br />
SELECT &#8230; WHERE name IN ("word1", "word2", "word3")<br />
HAVING &#8230;</p>
<p>should instead be rewritten as:<br />
SELECT &#8230; WHERE name = "word1"<br />
INNER JOIN<br />
SELECT &#8230; WHERE name = "word2"<br />
INNER JOIN<br />
SELECT &#8230; WHERE name = "word3"</p>
<p>Same thing for the UNION query: use actual UNIONs instead of &#8220;WHERE name IN &#8230;&#8221;.</p>
<p>I think this should give you a good performance boost.<br />
Would you consider redoing your benchmarks on the TOXI schema using these queries above?</p>
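<p>One reading of the suggestion above is a single query with one join per requested tag, instead of IN (...) plus GROUP BY/HAVING. The sketch below uses Python's sqlite3 module and borrows table names from the bookmark/tag/bookmark_tag schema mentioned elsewhere in this thread; the id columns and sample rows are assumptions. It checks that both forms return the same bookmarks. Whether the join form is actually faster depends on the database, the indexes, and the data, and this sketch does not benchmark it.</p>

```python
import sqlite3

# Tables named after the bookmark / tag / bookmark_tag schema in this thread;
# the id columns and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, tagname TEXT UNIQUE);
CREATE TABLE bookmark_tag (bookmark INTEGER, tag INTEGER,
                           PRIMARY KEY (bookmark, tag));
INSERT INTO bookmark (id) VALUES (1), (2), (3), (4);
INSERT INTO tag VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
INSERT INTO bookmark_tag VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2),
                                (3, 1), (3, 2), (3, 3), (4, 3);
""")

# Form 1: one pass over all requested tags, then count matches per bookmark.
IN_HAVING = """
SELECT b.id FROM bookmark b
JOIN bookmark_tag bt ON bt.bookmark = b.id
JOIN tag t ON t.id = bt.tag
WHERE t.tagname IN (?, ?, ?)
GROUP BY b.id
HAVING COUNT(*) = 3
ORDER BY b.id
"""

# Form 2: one join per requested tag; a bookmark survives only if all match.
JOIN_PER_TAG = """
SELECT b.id FROM bookmark b
JOIN bookmark_tag bt1 ON bt1.bookmark = b.id
JOIN tag t1 ON t1.id = bt1.tag AND t1.tagname = ?
JOIN bookmark_tag bt2 ON bt2.bookmark = b.id
JOIN tag t2 ON t2.id = bt2.tag AND t2.tagname = ?
JOIN bookmark_tag bt3 ON bt3.bookmark = b.id
JOIN tag t3 ON t3.id = bt3.tag AND t3.tagname = ?
ORDER BY b.id
"""

words = ("word1", "word2", "word3")
for name, sql in [("IN ... HAVING", IN_HAVING), ("join per tag", JOIN_PER_TAG)]:
    ids = [row[0] for row in conn.execute(sql, words)]
    print(name, ids)  # ids are [1, 3] for both forms
```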
]]></content:encoded>
	</item>
	<item>
		<title>By: Philipp Keller</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-73758</link>
		<dc:creator>Philipp Keller</dc:creator>
		<pubDate>Sun, 22 Jul 2007 14:39:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-73758</guid>
		<description>Geoff: Sure, go for the indexes! Have a look at http://www.pui.ch/phred/modules/tag_database_schemas.sql. I added indexes on tag.tagname, bookmark_tag.tag, bookmark_tag.bookmark and bookmark.url. If you can improve performance by changing those indexes, let me know.</description>
		<content:encoded><![CDATA[<p>Geoff: Sure, go for the indexes! Have a look at <a href="http://www.pui.ch/phred/modules/tag_database_schemas.sql" rel="nofollow">http://www.pui.ch/phred/modules/tag_database_schemas.sql</a>. I added indexes on tag.tagname, bookmark_tag.tag, bookmark_tag.bookmark and bookmark.url. If you can improve performance by changing those indexes, let me know.</p>
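<p>The sketch below creates the four indexes named above in an in-memory SQLite database via Python's sqlite3 module, then uses EXPLAIN QUERY PLAN to check whether a tag lookup uses them. The tables are pared down to the columns the comment mentions plus assumed id columns, and the index names are made up. The exact plan wording varies between SQLite versions, and other databases have their own EXPLAIN output.</p>

```python
import sqlite3

# Pared-down tables; only tag.tagname, bookmark_tag.tag, bookmark_tag.bookmark
# and bookmark.url come from the comment, the id columns are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, tagname TEXT);
CREATE TABLE bookmark_tag (bookmark INTEGER, tag INTEGER);

-- One index per column listed in the comment (index names are made up).
CREATE INDEX idx_tag_tagname ON tag (tagname);
CREATE INDEX idx_bookmark_tag_tag ON bookmark_tag (tag);
CREATE INDEX idx_bookmark_tag_bookmark ON bookmark_tag (bookmark);
CREATE INDEX idx_bookmark_url ON bookmark (url);
""")


def query_plan(sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# All bookmark URLs carrying a given tag.
BY_TAG = """
SELECT b.url FROM tag t
JOIN bookmark_tag bt ON bt.tag = t.id
JOIN bookmark b ON b.id = bt.bookmark
WHERE t.tagname = ?
"""
for step in query_plan(BY_TAG, ("sql",)):
    print(step)
```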
]]></content:encoded>
	</item>
	<item>
		<title>By: Geoff</title>
		<link>http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html/comment-page-1#comment-71502</link>
		<dc:creator>Geoff</dc:creator>
		<pubDate>Thu, 12 Jul 2007 19:17:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html#comment-71502</guid>
		<description>This is probably a dumb question, but here goes: if I opt for the toxi approach, is there any performance benefit to indexing any of the columns?

G</description>
		<content:encoded><![CDATA[<p>This is probably a dumb question, but here goes: if I opt for the toxi approach, is there any performance benefit to indexing any of the columns?</p>
<p>G</p>
]]></content:encoded>
	</item>
</channel>
</rss>

