<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mats Lindh</title>
	<atom:link href="http://e-mats.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://e-mats.org</link>
	<description>Where desperate is just another word for a regular day.</description>
	<lastBuildDate>Thu, 25 Jun 2009 09:12:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How To Make Solr Go 45% Faster</title>
		<link>http://e-mats.org/2009/06/how-to-make-solr-go-45-faster/</link>
		<comments>http://e-mats.org/2009/06/how-to-make-solr-go-45-faster/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 09:10:03 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[cache]]></category>
		<category><![CDATA[documentcache]]></category>
		<category><![CDATA[filtercache]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[queryresultcache]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=536</guid>
		<description><![CDATA[
		
		
		
		If you&#8217;re still looking for a good reason to spend a few minutes tuning your SOLR caches (documentCache, filterCache and queryResultCache), I&#8217;ll give you two numbers:

avgTimePerRequest : 126.148822
avgTimePerRequest : 70.026436

The first is with the default cache settings, the latter is with a very small change. Yep. That&#8217;s a 45% speed increase. So, the interesting question [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re still looking for a good reason to spend a few minutes tuning your SOLR caches (documentCache, filterCache and queryResultCache), I&#8217;ll give you two numbers:</p>
<pre>
avgTimePerRequest : 126.148822
avgTimePerRequest : 70.026436
</pre>
<p>The first is with the default cache settings, the latter is with a very small change. Yep. That&#8217;s a 45% reduction in average request time. So, the interesting question is what I actually changed in the cache configuration &#8211; although I should warn you, the answer is very, very, very complicated:</p>
<p>The cache size. The default size (at least for our current 1.3 installation) is to keep 512 elements in the cache. When someone on the solr-user list asked for an introduction to what the different cache statistics meant, I remembered that I hadn&#8217;t actually tweaked the settings at all. The SOLR server has been running for a year now, so we have quite a good idea of how it will perform and what kind of queries we are seeing. The stats indicated that a lot more cached entries got evicted than I was hoping to see, and this gave us a lower cache hit rate (about 50%).</p>
<p>The simple change was to increase the size of the cache (from 512 to 16384), so that we&#8217;re able to keep more documents in memory before evicting them. After running 24 hours with the new setup we&#8217;re now seeing cache hit rates of 99%, 68% and 67%. The relevant sections of the solrconfig.xml file are:</p>
<div class="geshi no xml">
<ol>
<li class="li1">
<div class="de1"><span class="sc3"><span class="re1">&lt;filterCache</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">class</span>=<span class="st0">&quot;solr.LRUCache&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">size</span>=<span class="st0">&quot;16384&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">initialSize</span>=<span class="st0">&quot;4096&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">autowarmCount</span>=<span class="st0">&quot;4096&quot;</span> <span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1"><span class="sc3"><span class="re1">&lt;queryResultCache</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">class</span>=<span class="st0">&quot;solr.LRUCache&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">size</span>=<span class="st0">&quot;16384&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">initialSize</span>=<span class="st0">&quot;4096&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">autowarmCount</span>=<span class="st0">&quot;4096&quot;</span> <span class="re2">/&gt;</span></span></div>
</li>
<li class="li1">
<div class="de1"><span class="sc3"><span class="re1">&lt;documentCache</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">class</span>=<span class="st0">&quot;solr.LRUCache&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">size</span>=<span class="st0">&quot;16384&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">initialSize</span>=<span class="st0">&quot;4096&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; <span class="re0">autowarmCount</span>=<span class="st0">&quot;4096&quot;</span> <span class="re2">/&gt;</span></span></div>
</li>
</ol>
</div>
<p>The document cache fills about 4 times as fast as the filter cache, so we might have to tweak the settings further to suit our load pattern even better.</p>
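<p>To see why the cache size matters so much, here is a small simulation. It is not Solr&#8217;s implementation: it is a toy LRU cache fed a made-up, uniformly random request stream. The class name LRUCache, the simulate() helper and the numbers (50,000 requests over 8,000 distinct keys) are assumptions picked for illustration, and real query traffic is rarely uniform.</p>
<pre>
```python
from collections import OrderedDict
import random


class LRUCache:
    """A tiny least-recently-used cache that counts hits and lookups."""

    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def get(self, key):
        self.lookups += 1
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.entries[key] = key  # stand-in for the real (expensive) lookup
        if len(self.entries) == self.size + 1:
            self.entries.popitem(last=False)  # evict the least recently used
        return key

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def simulate(size, requests=50000, distinct_keys=8000, seed=1):
    """Replay a made-up, uniformly random request stream against the cache."""
    rng = random.Random(seed)
    cache = LRUCache(size)
    for _ in range(requests):
        cache.get(rng.randrange(distinct_keys))
    return cache.hit_ratio()


if __name__ == "__main__":
    for size in (512, 16384):
        print(size, round(simulate(size), 3))
```
</pre>
<p>With these made-up numbers the 512-entry cache misses most lookups, because entries are evicted before they are requested again, while the 16384-entry cache only misses the first time each key is seen.</p>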
<h2>So what now?</h2>
<p>The next step would be to try to change to the <a href="http://wiki.apache.org/solr/SolrCaching">FastLRUCache</a> which is included with Solr 1.4 (currently in SVN and nightlies). If my memory serves me right the changes are mostly related to locking, so I&#8217;m not sure if we&#8217;ll see significantly better performance.</p>
<p>We&#8217;ll also make further adjustments to the size of each of the caches to better match our usage.</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/06/how-to-make-solr-go-45-faster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solr Becoming Slow After a While</title>
		<link>http://e-mats.org/2009/06/solr-becoming-slow-after-a-while/</link>
		<comments>http://e-mats.org/2009/06/solr-becoming-slow-after-a-while/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 12:25:30 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[disk space]]></category>
		<category><![CDATA[indexing]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[slowdown]]></category>

		<guid isPermaLink="false">http://e-mats.org/2009/06/solr-becoming-slow-after-a-while/</guid>
		<description><![CDATA[
		
		
		
		This is perhaps the most obvious and &#8220;not very helpful&#8221; post for quite a few people, but for those who experience this issue, it&#8217;ll save the day. While doing a test index routine of around 6 million documents, things would get really slow at the moment I passed 1 million documents in the index. Weird. [...]]]></description>
			<content:encoded><![CDATA[<p>This is perhaps the most obvious and &#8220;not very helpful&#8221; post for quite a few people, but for those who experience this issue, it&#8217;ll save the day. While doing a test index routine of around 6 million documents, things would get really slow at the moment I passed 1 million documents in the index. Weird. Optimizing didn&#8217;t help, as it died with an exception after a while.</p>
<p>The reason?</p>
<p>Not enough free disk space. Solr was indexing to a different partition than I thought.</p>
<p>Solved everything.</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/06/solr-becoming-slow-after-a-while/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shell Script For Submitting Documents to Solr</title>
		<link>http://e-mats.org/2009/06/shell-script-for-submitting-documents-to-solr/</link>
		<comments>http://e-mats.org/2009/06/shell-script-for-submitting-documents-to-solr/#comments</comments>
		<pubDate>Wed, 03 Jun 2009 13:58:18 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Solr]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[curl]]></category>
		<category><![CDATA[scripts]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=528</guid>
		<description><![CDATA[
		
		
		
		Here&#8217;s a small shell script I&#8217;m using to submit pre-made XML documents to Solr. The documents are usually produced by some other program, before being submitted to the Solr server. This way we submit all the files in an active directory to the server (here all the files in the documents directory (relative to the [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a small shell script I&#8217;m using to submit pre-made XML documents to Solr. The documents are usually produced by some other program before being submitted to the Solr server. The script submits every file in an active directory to the server; here that is the documents directory, relative to the location of the script.</p>
<p>You&#8217;ll have to update the URL and the directory (documents) below. We usually group together 1,000 documents in a single file, so the commit happens for every thousand documents. If you use autocommit in Solr, you can remove that line. This script requires curl to talk to the Solr server.</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="re2">URL=</span>http:<span class="sy0">//</span>localhost:<span class="nu0">8080</span><span class="sy0">/</span>solr<span class="sy0">/</span>update</div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> documents <span class="sy0">||</span> <span class="kw3">exit</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp;</div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">for</span> i <span class="kw1">in</span> <span class="sy0">*</span>; <span class="kw1">do</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; curl -X POST -H <span class="st0">&#39;Content-Type: text/xml&#39;</span> --data-binary <span class="sy0">@</span><span class="st0">&quot;$i&quot;</span> <span class="st0">&quot;$URL&quot;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; curl <span class="re1">$URL</span> -H <span class="st0">&quot;Content-Type: text/xml&quot;</span> &#8211;data-binary <span class="st0">&#39;&lt;commit waitFlush=&quot;false&quot; waitSearcher=&quot;false&quot;/&gt;&#39;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw3">echo</span> item: <span class="re1">$i</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">done</span></div>
</li>
</ol>
</div>
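<p>The same loop can be sketched in Python for machines where curl is not available. This is an illustration rather than a drop-in replacement: the helper names (xml_request, commit_body, submit_all) are made up here, and the URL and the documents directory are simply the ones used in the script above.</p>
<pre>
```python
from pathlib import Path
from urllib.request import Request, urlopen
from xml.etree import ElementTree

SOLR_UPDATE_URL = "http://localhost:8080/solr/update"  # same URL as the shell script


def xml_request(url, body):
    """Build a POST request that carries an XML body (bytes)."""
    return Request(url, data=body, headers={"Content-Type": "text/xml"})


def commit_body():
    """Serialize the same commit element that the shell script sends."""
    commit = ElementTree.Element("commit", waitFlush="false", waitSearcher="false")
    return ElementTree.tostring(commit)


def submit_all(directory="documents", url=SOLR_UPDATE_URL):
    """Post every file in the directory, committing after each one."""
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        for request in (xml_request(url, path.read_bytes()),
                        xml_request(url, commit_body())):
            # urlopen raises an HTTPError if Solr rejects the request
            with urlopen(request) as response:
                response.read()
        print("item:", path.name)
```
</pre>
<p>Calling <code>submit_all()</code> from the directory that holds the script mirrors the shell version. As there, drop the commit request if you rely on autocommit.</p>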
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/06/shell-script-for-submitting-documents-to-solr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Making Solr Requests with urllib2 in Python</title>
		<link>http://e-mats.org/2009/05/making-solr-requests-with-urllib2-in-python/</link>
		<comments>http://e-mats.org/2009/05/making-solr-requests-with-urllib2-in-python/#comments</comments>
		<pubDate>Sat, 30 May 2009 11:12:54 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[urllib]]></category>
		<category><![CDATA[urllib2]]></category>
		<category><![CDATA[xml]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=524</guid>
		<description><![CDATA[
		
		
		
		When making XML requests to Solr (a full-text document search engine) for indexing, committing, updating or deleting documents, the request is submitted as an HTTP POST containing an XML document to the server. urllib2 supports submitting POST data by using the second parameter to the urlopen() call:



f = urllib2.urlopen&#40;&#34;http://example.com/&#34;, &#34;key=value&#34;&#41;



The first attempt involved simply adding [...]]]></description>
			<content:encoded><![CDATA[<p>When making XML requests to Solr (a full-text document search engine) for indexing, committing, updating or deleting documents, the request is submitted as an HTTP POST containing an XML document to the server. urllib2 supports submitting POST data by using the second parameter to the urlopen() call:</p>
<div class="geshi no python">
<ol>
<li class="li1">
<div class="de1">f = <span class="kw3">urllib2</span>.<span class="me1">urlopen</span><span class="br0">&#40;</span><span class="st0">&quot;http://example.com/&quot;</span>, <span class="st0">&quot;key=value&quot;</span><span class="br0">&#41;</span></div>
</li>
</ol>
</div>
<p>The first attempt involved simply adding the XML data as the second parameter, but that made the Solr Webapp return a &#8220;400 &#8211; Bad Request&#8221; error. The reason for Solr barfing is that the urlopen() function sets the <code>Content-Type</code> to <code>application/x-www-form-urlencoded</code>. We can solve this by changing the <code>Content-Type</code> header:</p>
<div class="geshi no python">
<ol>
<li class="li1">
<div class="de1">solrReq = <span class="kw3">urllib2</span>.<span class="me1">Request</span><span class="br0">&#40;</span>updateURL, <span class="st0">&#39;&lt;commit waitFlush=&quot;false&quot; waitSearcher=&quot;false&quot;/&gt;&#39;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">solrReq.<span class="me1">add_header</span><span class="br0">&#40;</span><span class="st0">&quot;Content-Type&quot;</span>, <span class="st0">&quot;text/xml&quot;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">solrPoster = <span class="kw3">urllib2</span>.<span class="me1">urlopen</span><span class="br0">&#40;</span>solrReq<span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">response = solrPoster.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">solrPoster.<span class="me1">close</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
</ol>
</div>
<p>Other XML-based Solr requests, such as adding and removing documents from the index, will also work by changing the <code>Content-Type</code> header.</p>
<p>The same code will also allow you to use urllib2 to submit SOAP or XML-RPC requests, or to speak other protocols that require you to control the complete POST body of the request.</p>
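<p>urllib2 no longer exists under that name in Python 3, where its functionality lives in urllib.request. A minimal sketch of the same idea there, assuming hypothetical helper names build_xml_post and post_xml, could look like this. Note that the POST body has to be bytes in Python 3.</p>
<pre>
```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_xml_post(url, xml_text):
    """Return a POST request whose body is sent as text/xml."""
    request = Request(url, data=xml_text.encode("utf-8"))
    request.add_header("Content-Type", "text/xml; charset=utf-8")
    return request


def post_xml(url, xml_text):
    """Send the request and return (status, body).

    HTTP errors such as the 400 described above are returned to the
    caller as a status code instead of being raised.
    """
    try:
        with urlopen(build_xml_post(url, xml_text)) as response:
            return response.status, response.read()
    except HTTPError as error:
        return error.code, error.read()
```
</pre>
<p>Building the request separately from sending it makes it easy to inspect the headers before anything goes over the wire.</p>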
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/making-solr-requests-with-urllib2-in-python/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Making WPtouch and WP Super Cache Play Together</title>
		<link>http://e-mats.org/2009/05/making-wptouch-and-wp-super-cache-play-together/</link>
		<comments>http://e-mats.org/2009/05/making-wptouch-and-wp-super-cache-play-together/#comments</comments>
		<pubDate>Sat, 30 May 2009 08:51:04 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Wordpress]]></category>
		<category><![CDATA[plugins]]></category>
		<category><![CDATA[wp super cache]]></category>
		<category><![CDATA[wpsupercache]]></category>
		<category><![CDATA[wptouch]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=520</guid>
		<description><![CDATA[
		
		
		
		I installed a plugin for WordPress on the blog yesterday after getting a tip from Morten Røvik about WPtouch. This is a plugin which provides a custom theme for all your visitors that are using mobile devices, such as the iPhone and the BlackBerry line of products. 
The problem is that I&#8217;m already running WP [...]]]></description>
			<content:encoded><![CDATA[<p>I installed a plugin for WordPress on the blog yesterday after getting a tip from <a href="http://blogg.senson.no/">Morten Røvik</a> about <a href="http://www.bravenewcode.com/wptouch/">WPtouch</a>. This is a plugin which provides a custom theme for all your visitors that are using mobile devices, such as the iPhone and the BlackBerry line of products.</p>
<p>The problem is that I&#8217;m already running WP Super Cache, a caching plugin that writes all the pages WordPress renders to static HTML files, and this conflicts with plugins that want to change the theme of the page on the fly. After a bit of searching (the WPtouch page mentions WP Cache, the plugin that WP Super Cache builds on) I decided to see if WP Super Cache supports the same exceptions based on user agent that WP Cache does, and lo and behold: WP Super Cache has a configuration setting just for this!</p>
<ol>
<li>Log in to WordPress</li>
<li>Select &#8220;WP Super Cache&#8221; under &#8220;Settings&#8221;</li>
<li>Select &#8220;Mobile device support. Plugin will enter &#8220;Half-On&#8221; mode.&#8221; in the settings list</li>
<li>Save settings</li>
<li>Delete the contents of the WP Super Cache by clicking &#8220;Delete Cache&#8221; in the WP Super Cache settings page (IMPORTANT if you&#8217;re going to test if things are working!).</li>
</ol>
<p>The &#8220;Half-On&#8221; mode means that WP Super Cache caches the files with a small block of PHP code at the beginning instead of the pure HTML files, so that it can check the user agent of the client before deciding which files to return to the user. This is a performance hit (although much smaller than leaving things uncached), so if you suddenly end up with a very, very large amount of hits in a short time, switch the plugin back to full mode.</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/making-wptouch-and-wp-super-cache-play-together/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding SVN Revision to a Configuration File</title>
		<link>http://e-mats.org/2009/05/adding-svn-revision-to-a-configuration-file/</link>
		<comments>http://e-mats.org/2009/05/adding-svn-revision-to-a-configuration-file/#comments</comments>
		<pubDate>Fri, 29 May 2009 19:19:53 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Hacks]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Scalability]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[configuration]]></category>
		<category><![CDATA[revision]]></category>
		<category><![CDATA[sed]]></category>
		<category><![CDATA[subversion]]></category>
		<category><![CDATA[svn]]></category>
		<category><![CDATA[version number]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=514</guid>
		<description><![CDATA[
		
		
		
		After a while you realize that the best way to serve almost-never-changing content is to give the content an expire date way ahead in the future. This allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again. 
A [...]]]></description>
			<content:encoded><![CDATA[<p>After a while you realize that the best way to serve almost-never-changing content is to give the content an expire date way ahead in the future. This allows your server and your network pipes to do more sensible stuff than delivering the same old versions of files again and again and again and again.</p>
<p>A problem does however surface when you want to update the files and make the visiting user request the new version instead of the old. The trick here is to change the URL for the resource, so that the browser requests the new file. You can do this by appending a version number to the file and either rewriting it behind the scenes to the original file, or by appending a timestamp (or some other item) to the URL as a GET value. The web server ignores this for regular files, but as it identifies a new unique resource, the web browser has to request it again and use the new and improved &#8482; file.</p>
<p>Using the timestamp of the file is a bit cumbersome and requires you to hit the disk one additional time each time you&#8217;re going to show a URL to one of the almost-static resources, but luckily we already have an identifier describing which version the file is in: the SVN revision number (if you use Subversion, that is). You could use the SVN revision of each individual file, but we usually decide that the global revision number for the repository is good enough. This means that every static resource gets a new URL each time you update the live code base through svn up or something like that. (Remember to block access to .svn directories and their files if you run your production directory from an SVN branch. Whether you should can be discussed over and over, but I&#8217;m growing more and more fond of actually doing just that.) To avoid having to call svnversion each time, it&#8217;s useful to be able to insert the current revision number into the configuration file for the application (or a header file / bootstrap file).</p>
<p>Here&#8217;s an example of how you can insert the current SVN revision into a config file for a PHP application.</p>
<ol>
<li>Create a backup of the current configuration file.</li>
<li>Update the current revision through svn up.</li>
<li>Retrieve the current revision number from svnversion.</li>
<li>Insert the revision number using sed into a temporary copy of the configuration file.</li>
<li>Move the new configuration file into place as the current configuration file.</li>
<li>Party like it&#8217;s 1999!</li>
</ol>
<p>This assumes that you use an array named $config in your configuration file. I suggest that you name it something else, but for simplicity I&#8217;m going with that here. First, create a $config['svn'] entry in your config file. If you have some other naming scheme, you&#8217;re going to have to change the relevant parts below. </p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="co0">#!/bin/bash</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">cp</span> .<span class="sy0">/</span>config<span class="sy0">/</span>config.php .<span class="sy0">/</span>config<span class="sy0">/</span>config.backup.php</div>
</li>
<li class="li1">
<div class="de1">svn up</div>
</li>
<li class="li1">
<div class="de1"><span class="re2">VERSION=</span>`svnversion .`</div>
</li>
<li class="li1">
<div class="de1"><span class="kw3">echo</span> <span class="re1">$VERSION</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">sed</span> <span class="st0">&quot;s/config<span class="es0">\[</span>&#39;svn&#39;<span class="es0">\]</span> = &#39;[0-9M]*&#39;;/config<span class="es0">\[</span>&#39;svn&#39;<span class="es0">\]</span> = &#39;$VERSION&#39;;/&quot;</span> <span class="sy0">&lt;</span> .<span class="sy0">/</span>config<span class="sy0">/</span>config.php <span class="sy0">&gt;</span> .<span class="sy0">/</span>config<span class="sy0">/</span>config.fixed.php</div>
</li>
<li class="li1">
<div class="de1"><span class="kw2">mv</span> .<span class="sy0">/</span>config<span class="sy0">/</span>config.fixed.php .<span class="sy0">/</span>config<span class="sy0">/</span>config.php</div>
</li>
</ol>
</div>
<p>Save this into a file named upgrade.sh, make it executable by doing <code>chmod u+x upgrade.sh</code> and run it by typing <code>./upgrade.sh</code>.</p>
<p>And this is where you put your hands above your head and wave them about. When you&#8217;re done with that, you can refer to your current SVN revision using $config['svn'] in your PHP application (preferably in your template or where you build the URLs to your static resources). Simply append ?v=$config['svn'] to your current filenames. When you have a new version available, run ./upgrade.sh (or whatever name you gave the script) again and let your users enjoy the new experience.</p>
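<p>For those who would rather see the substitution spelled out, here is a rough Python rendering of what the sed expression does to the configuration text, plus the URL building step. The function names set_svn_revision and versioned_url are invented for this sketch, and the example path is hypothetical.</p>
<pre>
```python
import re

# Same pattern as the sed expression: config['svn'] = '...'; with digits or M inside.
SVN_ENTRY = re.compile(r"(config\['svn'\] = ')[0-9M]*(';)")


def set_svn_revision(config_text, revision):
    """Return config_text with the value of $config['svn'] replaced."""
    return SVN_ENTRY.sub(lambda m: m.group(1) + revision + m.group(2), config_text)


def versioned_url(path, revision):
    """Append the revision as a GET value so the browser sees a new resource."""
    return path + "?v=" + revision
```
</pre>
<p>Given a config line of <code>$config['svn'] = '1200M';</code>, calling <code>set_svn_revision(text, "1234")</code> yields <code>$config['svn'] = '1234';</code>, and <code>versioned_url("/js/app.js", "1234")</code> gives <code>/js/app.js?v=1234</code>. One caveat that applies to the sed version as well: svnversion can also print forms such as 4123:4168 for a mixed-revision working copy, and once such a value has been written the [0-9M]* pattern will not match it on the next run.</p>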
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/adding-svn-revision-to-a-configuration-file/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Results of Our Recent Python Competition</title>
		<link>http://e-mats.org/2009/05/506/</link>
		<comments>http://e-mats.org/2009/05/506/#comments</comments>
		<pubDate>Fri, 29 May 2009 09:13:26 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[competition]]></category>
		<category><![CDATA[kodehjelp]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=506</guid>
		<description><![CDATA[
		
		
		
		Last week we had yet another competition where the goal is to create the smallest program that solves a particular problem. This time the problem to solve was a simple XML parsing routine with a few extra rules to make the parsing itself easier to implement (The complete rule set). This time python was chosen [...]]]></description>
			<content:encoded><![CDATA[<p>Last week we had yet another competition where the goal is to create the smallest program that solves a particular problem. This time the problem to solve was a simple XML parsing routine with a few extra rules to make the parsing itself easier to implement (<a href="http://e-mats.org/resources/kodehjelp_0x0002.txt">The complete rule set</a>). This time <a href="http://www.python.org/">python</a> was chosen as the required language of the submissions.</p>
<p>The winning contribution from Helge:</p>
<div class="geshi no python">
<ol>
<li class="li1">
<div class="de1"><span class="kw1">from</span> <span class="kw3">sys</span> <span class="kw1">import</span> stdin</div>
</li>
<li class="li1">
<div class="de1">p=<span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">s=<span class="kw2">str</span>.<span class="me1">split</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">for</span> a <span class="kw1">in</span> s<span class="br0">&#40;</span>stdin.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>,<span class="st0">&#39;&lt;&#39;</span><span class="br0">&#41;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp;a=s<span class="br0">&#40;</span>a,<span class="st0">&#39;&gt;&#39;</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="sy0">;</span>b=a.<span class="me1">strip</span><span class="br0">&#40;</span><span class="st0">&#39;/&#39;</span><span class="br0">&#41;</span><span class="sy0">;</span>p-=a<span class="sy0">&lt;</span>b</div>
</li>
<li class="li1">
<div class="de1">&nbsp;<span class="kw1">if</span><span class="st0">&#39;@&#39;</span><span class="sy0">&lt;</span>a:<span class="kw1">print</span><span class="st0">&#39; &#39;</span><span class="sy0">*</span>p<span class="sy0">*</span><span class="nu0">4</span>+s<span class="br0">&#40;</span>b<span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span><span class="sy0">;</span>p+=a==b</div>
</li>
</ol>
</div>
<p>The contribution from Tobias:</p>
<div class="geshi no python">
<ol>
<li class="li1">
<div class="de1"><span class="kw1">from</span> <span class="kw3">sys</span> <span class="kw1">import</span> stdin</div>
</li>
<li class="li1">
<div class="de1">i=stdin.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">s=x=t=<span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">k=i.<span class="me1">find</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">while</span> x<span class="sy0">&lt;</span>len<span class="br0">&#40;</span>i<span class="br0">&#41;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; <span class="kw1">if</span> i<span class="br0">&#91;</span>x<span class="br0">&#93;</span>==<span class="st0">&quot;&lt;&quot;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> i<span class="br0">&#91;</span>x:x<span class="nu0">+4</span><span class="br0">&#93;</span>==<span class="st0">&quot;&lt;!--&quot;</span>:x=k<span class="br0">&#40;</span><span class="st0">&quot;--&gt;&quot;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">elif</span> i<span class="br0">&#91;</span>x<span class="nu0">+1</span><span class="br0">&#93;</span>==<span class="st0">&quot;/&quot;</span>:s-=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; u=<span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">while</span> u<span class="sy0">&lt;</span>s:<span class="kw1">print</span> <span class="st0">&quot; &quot;</span><span class="sy0">*</span><span class="nu0">4</span>,<span class="sy0">;</span>u+=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> i<span class="br0">&#91;</span>k<span class="br0">&#40;</span><span class="st0">&quot;&gt;&quot;</span>,x<span class="br0">&#41;</span><span class="nu0">-1</span><span class="br0">&#93;</span>==<span class="st0">&quot;/&quot;</span>:t=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">else</span>:t=<span class="nu0">0</span><span class="sy0">;</span>s+=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> i<span class="br0">&#91;</span>x<span class="nu0">+1</span>:k<span class="br0">&#40;</span><span class="st0">&quot;&gt;&quot;</span>,x<span class="br0">&#41;</span>-t<span class="br0">&#93;</span>.<span class="me1">strip</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; x+=<span class="nu0">1</span></div>
</li>
</ol>
</div>
<p>The contribution from Harald:</p>
<div class="geshi no python">
<ol>
<li class="li1">
<div class="de1"><span class="kw1">from</span> <span class="kw3">sys</span> <span class="kw1">import</span> stdin</div>
</li>
<li class="li1">
<div class="de1">l=stdin.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
</li>
<li class="li1">
<div class="de1">e,p,c,x=<span class="nu0">0</span>,<span class="nu0">0</span>,<span class="nu0">0</span>,<span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">r=<span class="st0">&quot;&quot;</span></div>
</li>
<li class="li1">
<div class="de1"><span class="kw1">for</span> i <span class="kw1">in</span> l:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> l<span class="br0">&#91;</span>e:e<span class="nu0">+2</span><span class="br0">&#93;</span>==<span class="st0">&#39;]&gt;&#39;</span><span class="kw1">or</span> l<span class="br0">&#91;</span>e:e<span class="nu0">+2</span><span class="br0">&#93;</span>==<span class="st0">&#39;-&gt;&#39;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c=<span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> l<span class="br0">&#91;</span>e:e<span class="nu0">+2</span><span class="br0">&#93;</span>==<span class="st0">&#39;&lt;!&#39;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> p <span class="kw1">and</span> i==<span class="st0">&#39; &#39;</span><span class="kw1">or</span> i==<span class="st0">&#39;/&#39;</span><span class="kw1">or</span> i==<span class="st0">&#39;&gt;&#39;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;p=<span class="nu0">0</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> i==<span class="st0">&#39;/&#39;</span> <span class="kw1">and</span> l<span class="br0">&#91;</span>e<span class="nu0">+1</span><span class="br0">&#93;</span>==<span class="st0">&#39;&gt;&#39;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;x-=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> p <span class="kw1">and</span> <span class="kw1">not</span> c:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;r+=i</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> <span class="kw1">not</span> c <span class="kw1">and</span> i==<span class="st0">&#39;&lt;&#39;</span><span class="kw1">and</span> l<span class="br0">&#91;</span>e<span class="nu0">+1</span><span class="br0">&#93;</span><span class="sy0">!</span>=<span class="st0">&#39;/&#39;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;r+=<span class="st0">&quot;<span class="es0">\n</span>&quot;</span>+<span class="br0">&#40;</span><span class="st0">&#39; &#39;</span><span class="sy0">*</span><span class="nu0">4</span><span class="br0">&#41;</span><span class="sy0">*</span>x</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;x+=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;p=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> i==<span class="st0">&#39;&lt;&#39;</span><span class="kw1">and</span> l<span class="br0">&#91;</span>e<span class="nu0">+1</span><span class="br0">&#93;</span>==<span class="st0">&#39;/&#39;</span>:</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;x-=<span class="nu0">1</span></div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp;e+=<span class="nu0">1</span></div>
</li>
</ol>
</div>
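<p>For readers who do not enjoy reading golfed code: judging from the entries above, each of them reads XML on stdin and prints roughly one element name per line, indented four spaces per nesting level. An ungolfed sketch of that idea could look like the following. It is written for Python 3 (the entries are Python 2), the function name <code>outline</code> is made up for the illustration, and the regular expression is a simplification rather than a real XML parser.</p>

```python
import re

# Matches an opening, closing or self-closing tag and captures:
# the closing slash (if any), the element name, the self-closing slash (if any).
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def outline(xml_text, indent=4):
    """Return the element names, one per line, indented by nesting depth."""
    # Drop comments first so that tags inside them are ignored.
    text = re.sub(r"<!--.*?-->", "", xml_text, flags=re.S)
    lines = []
    depth = 0
    for closing, name, self_closing in TAG.findall(text):
        if closing:
            # A closing tag only moves us one level back up.
            depth -= 1
            continue
        lines.append(" " * (indent * depth) + name)
        if not self_closing:
            # A regular opening tag starts a new nesting level.
            depth += 1
    return "\n".join(lines)


print(outline("<a><b/><c><d></d></c></a>"))
```

<p>Declarations and processing instructions are skipped simply because they do not start with a letter after the opening bracket.</p>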
<p>If any of the contributors want to provide a better description of their solutions, feel free to leave a comment!</p>
<p>Thanks to all the participants!</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/506/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keyboard Not Working in Xorg After Booting Ubuntu</title>
		<link>http://e-mats.org/2009/05/keyboard-not-working-after-booting-ubuntu/</link>
		<comments>http://e-mats.org/2009/05/keyboard-not-working-after-booting-ubuntu/#comments</comments>
		<pubDate>Thu, 28 May 2009 11:28:09 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[dbus]]></category>
		<category><![CDATA[hal]]></category>
		<category><![CDATA[keyboard]]></category>
		<category><![CDATA[mouse]]></category>
		<category><![CDATA[problems]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=503</guid>
		<description><![CDATA[
		
		
		
		I&#8217;ve had a weird issue a couple of times on my work computer, where the keyboard and the mouse do not respond in Xorg after rebooting. As I only reboot my work computer every 80 days or so, I tend to forget the reason why it happens between each boot sequence. 
The reason why the [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; width: 42px; padding-right: 10px; margin: 0 0 0 10px;">
		</div><p>I&#8217;ve had a weird issue a couple of times on my work computer, where the keyboard and the mouse do not respond in Xorg after rebooting. As I only reboot my work computer every 80 days or so, I tend to forget the reason why it happens between reboots.</p>
<p>The reason the mouse and keyboard do not work after rebooting my computer is that HAL or D-Bus failed to start. I&#8217;ve not dug further into this issue, as it doesn&#8217;t happen very often. The solution:</p>
<p>(You can switch to a text console by pressing Ctrl+Alt+F1; your keyboard will work there.)</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="sy0">/</span>etc<span class="sy0">/</span>init.d<span class="sy0">/</span>dbus start</div>
</li>
<li class="li1">
<div class="de1"><span class="sy0">/</span>etc<span class="sy0">/</span>init.d<span class="sy0">/</span>hal start</div>
</li>
</ol>
</div>
<p>Restart X / GDM:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="sy0">/</span>etc<span class="sy0">/</span>init.d<span class="sy0">/</span>gdm restart</div>
</li>
</ol>
</div>
<p>Switch back to the Xorg terminal (Alt+F7), and hopefully your keyboard and mouse will work once again!</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/keyboard-not-working-after-booting-ubuntu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Modifying a Lucene Snowball Stemmer</title>
		<link>http://e-mats.org/2009/05/modifying-a-lucene-snowball-stemmer/</link>
		<comments>http://e-mats.org/2009/05/modifying-a-lucene-snowball-stemmer/#comments</comments>
		<pubDate>Wed, 27 May 2009 19:01:31 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Solr]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[snowball]]></category>
		<category><![CDATA[stemmers]]></category>
		<category><![CDATA[stemming]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=484</guid>
		<description><![CDATA[
		
		
		
		This post is written for advanced users. If you do not know what SVN (Subversion) is or if you&#8217;re not ready to get your hands dirty, there might be something more interesting to read on Wikipedia. As usual. This is an introduction to how to get a Lucene development environment running, a Solr environment and [...]]]></description>
			<content:encoded><![CDATA[<div style="float: right; width: 42px; padding-right: 10px; margin: 0 0 0 10px;">
		</div><p>This post is written for advanced users. If you do not know what SVN (Subversion) is or if you&#8217;re not ready to get your hands dirty, there might be something more interesting to read on <a href="http://en.wikipedia.org/">Wikipedia</a>. As usual. This is an introduction to getting a Lucene development environment running, then a Solr environment, and lastly to creating your own Snowball stemmer. Read on if that seems interesting. The recipe for regenerating the Snowball stemmer (I&#8217;ll get back to that&#8230;) assumes that you&#8217;re running Linux. Please leave a comment if you&#8217;ve generated the stemmer class under another operating system.</p>
<p>When indexing data in Lucene (a fulltext document search library) and Solr (which uses Lucene), you may provide a stemmer to give your users better and more relevant results when they search. A stemmer is a piece of code responsible for &#8220;normalizing&#8221; words to their common form (horses => horse, indexing => index, etc). The default stemmer in Lucene and Solr uses a library named <a href="http://snowball.tartarus.org/">Snowball</a>, which was created to do just this kind of thing. Snowball uses a small definition language of its own to generate stemmer code that other applications can embed to provide proper stemming.</p>
<p>By using Snowball, Lucene is able to provide a nice collection of default stemmers for several languages, and these work as they should in most cases. I did however have an issue with the Norwegian stemmer, as it ignores a complete category of words where the base form ends in the same letters as the plural forms of other words. An example:</p>
<p>one: elektriker<br />
several: elektrikere<br />
those: elektrikerene</p>
<p>The base form is &#8220;elektriker&#8221;, while &#8220;elektrikere&#8221; and &#8220;elektrikerene&#8221; are plural versions of the same word (the word means &#8220;electrician&#8221;, btw).</p>
<p>Let&#8217;s compare this to another word, such as &#8220;bus&#8221;:</p>
<p>one: buss<br />
several: busser<br />
those: bussene</p>
<p>Here the base form is &#8220;buss&#8221;, while the other two are plural. Let&#8217;s apply the same rules to all six words:</p>
<p>buss => buss<br />
busser => buss [strips "er"]<br />
bussene => buss [strips "ene"]</p>
<p>elektrikerene => &#8220;elektriker&#8221; [strips "ene"]<br />
elektrikere => &#8220;elektriker&#8221; [strips "e"]</p>
<p>So far everything has gone as planned. We&#8217;re able to search for &#8216;elektrikerene&#8217; and get hits that say &#8216;elektrikere&#8217;, just as planned. All is not perfect, though. We&#8217;ve forgotten one word, and evil forces will say that I forgot it on purpose:</p>
<p>elektriker => ?</p>
<p>The problem is that &#8220;elektriker&#8221; (which is the singular form of the word) ends in -er. The rule defined for a word in the class of &#8220;buss&#8221; says that -er should be stripped (and this is correct for the majority of words). The result then becomes:</p>
<p>elektriker => &#8220;elektrik&#8221; [strips "er"]<br />
elektrikere => &#8220;elektriker&#8221; [strips "e"]<br />
elektrikerene => &#8220;elektriker&#8221; [strips "ene"]</p>
<p>As you can see, there&#8217;s a mismatch between the form that the plurals get chopped down to and the form that the singular word ends up as.</p>
<p>My solution, while not perfect in any way, simply adds a few more terms so that we&#8217;re able to strip all these words down to the same form:</p>
<p>elektriker => &#8220;elektrik&#8221; [strips "er"]<br />
elektrikere => &#8220;elektrik&#8221; [strips "ere"]<br />
elektrikerene => &#8220;elektrik&#8221; [strips "erene"]</p>
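<p>The difference between the two rule sets can be sketched in a few lines of Python. This is only a toy model and not how Snowball itself is implemented: it strips the longest matching ending from a word and ignores everything else the real Norwegian stemmer does (such as the limit on how far into the word an ending may reach, and the later steps). The names <code>ORIGINAL</code>, <code>EXTENDED</code> and <code>strip_suffix</code> are made up for the illustration; the endings are the ones from the delete group of the Norwegian stemmer definition shown further down.</p>

```python
# Endings that the stock Norwegian definition deletes (toy model only).
ORIGINAL = (
    "a e ede ande ende ane ene hetene en heten ar er heter as es edes "
    "endes enes hetenes ens hetens ers ets et het ast"
).split()

# The same list with the three extra endings added in this post.
EXTENDED = ORIGINAL + ["ere", "erene", "eren"]


def strip_suffix(word, suffixes):
    """Remove the longest ending in `suffixes` that `word` ends with."""
    matches = [s for s in suffixes if word.endswith(s)]
    if not matches:
        return word
    longest = max(matches, key=len)
    return word[: -len(longest)]


for word in ["elektriker", "elektrikere", "elektrikerene"]:
    # Columns: the word, its stem with the stock list, its stem with the extended list.
    print(word, strip_suffix(word, ORIGINAL), strip_suffix(word, EXTENDED))
```

<p>With the stock list the three forms end up as two different stems; with the extended list they all collapse to the same one, while buss, busser and bussene are unaffected.</p>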
<p>I decided to go this route as it&#8217;s a lot easier than building a large selection of words where no stemming should be performed. It might give us a few false positives, but the most important part is that it provides the same results for the singular and plural versions of the same word. When the search results differ for such basic items, the user gets a real &#8220;WTF&#8221; moment, especially when the two plural versions of the word are considered identical.</p>
<p>To solve this problem we&#8217;re going to change the Snowball parser and build a new version of the stemmer that we can use in Lucene and Solr.</p>
<h3>Getting Snowball</h3>
<p>To generate the Java class that Lucene uses when attempting to stem a phrase (such as the NorwegianStemmer, EnglishStemmer, etc), you&#8217;ll need the <a href="http://snowball.tartarus.org/">Snowball</a> distribution. This distribution also includes example stemming algorithms (which have been used to generate the current stemmers in Lucene).</p>
<p>You&#8217;ll need to download the application from <a href="http://snowball.tartarus.org/download.php">the snowball download page</a> &#8211; in particular the &#8220;Snowball, algorithms and libstemmer library&#8221; version [<a href="http://snowball.tartarus.org/dist/snowball_code.tgz">direct link</a>].</p>
<p>After extracting the file you&#8217;ll have a directory named <code>snowball_code</code>, which contains among other files the <code>snowball</code> binary and a directory named <code>algorithms</code>. The algorithms directory keeps all the different default stemmers, and this is where you&#8217;ll find a good starting point for the changes you&#8217;re about to make.</p>
<p>But first, we&#8217;ll make sure we have the development version of Lucene installed and ready to go.</p>
<h3>Getting Lucene</h3>
<p>You can check out the current SVN trunk of Lucene by doing:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1">svn checkout http:<span class="sy0">//</span>svn.apache.org<span class="sy0">/</span>repos<span class="sy0">/</span>asf<span class="sy0">/</span>lucene<span class="sy0">/</span>java<span class="sy0">/</span>trunk lucene<span class="sy0">/</span>java<span class="sy0">/</span>trunk</div>
</li>
</ol>
</div>
<p>This will give you the bleeding edge version of Lucene available for a bit of toying around. If you decide to build Solr 1.4 from SVN (as we&#8217;ll do further down), you do not have to build Lucene 2.9 from SVN &#8211; as it already is included pre-built. </p>
<p>If you need to build the complete version of Lucene (and all contribs), you can do that by moving into the Lucene trunk:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> lucene<span class="sy0">/</span>java<span class="sy0">/</span>trunk<span class="sy0">/</span></div>
</li>
<li class="li1">
<div class="de1">ant dist # this will also create .zip and .tgz distributions</div>
</li>
</ol>
</div>
<p>If you already have Lucene 2.9 (.. or whatever version you&#8217;re on when you&#8217;re reading this), you can get by with just compiling the snowball contrib to Lucene, from lucene/java/trunk/:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> contrib<span class="sy0">/</span>snowball<span class="sy0">/</span></div>
</li>
<li class="li1">
<div class="de1">ant jar</div>
</li>
</ol>
</div>
<p>This will create (if everything works as it should) a file named <code>lucene-snowball-2.9-dev.jar</code> (.. or another version number, depending on your version of Lucene). The file will be located in a subdirectory of the build directory at the root of the Lucene checkout (.. and the path will be shown after you&#8217;ve run ant jar): <code>lucene/java/trunk/build/contrib/snowball/</code>.</p>
<p>If you got the lucene-snowball-2.9-dev.jar file compiled, things are looking good! Let&#8217;s move on to getting the bleeding edge version of Solr up and running. If you have an existing Solr version that you&#8217;re using and do not want to upgrade, skip the following steps, but be sure you know what you&#8217;re doing (which, coincidentally, you also should if you&#8217;re building stuff from SVN as we are). Oh the joy!</p>
<h3>Getting Solr</h3>
<p>Getting and building Solr from SVN is very straightforward. First, check it out from Subversion:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1">svn <span class="kw2">co</span> http:<span class="sy0">//</span>svn.apache.org<span class="sy0">/</span>repos<span class="sy0">/</span>asf<span class="sy0">/</span>lucene<span class="sy0">/</span>solr<span class="sy0">/</span>trunk<span class="sy0">/</span> solr<span class="sy0">/</span>trunk<span class="sy0">/</span></div>
</li>
</ol>
</div>
<p>And then simply build the war file for your favourite container:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> solr<span class="sy0">/</span>trunk<span class="sy0">/</span></div>
</li>
<li class="li1">
<div class="de1">ant dist</div>
</li>
</ol>
</div>
<p>Voilà &#8211; you should now have an apache-solr-1.4-dev.war (or something similar) in the build/ directory. You can test that this works by replacing your regular Solr installation (.. make a backup first..) and restarting your application server.</p>
<h3>Editing the Stemmer Definition</h3>
<p>After extracting the snowball distribution, you&#8217;re left with a <code>snowball_code</code> directory, which contains <code>algorithms</code> and then <code>norwegian</code> (in addition to several other stemmer languages). My example here expands the definition used in the Norwegian stemmer, but the same approach will work with all the included stemmers.</p>
<p>Open up one of the files (I chose the iso-8859-1 version, but I might have to adjust this to work for UTF-8/16 later. I&#8217;ll try to post an update regarding that) and take a look around. The snowball language is interesting, and you can find more information about it at <a href="http://snowball.tartarus.org/">the Snowball site</a>.</p>
<p>I&#8217;ll not include a complete dump of the stemming definition here, but the interesting part (for what we&#8217;re attempting to do) is the main_suffix function:</p>
<div class="geshi no snowball">
<ol>
<li class="li1">
<div class="de1">define main_suffix as (
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; setlimit tomark p1 for ([substring])
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; among(
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#39;a&#39; &#39;e&#39; &#39;ede&#39; &#39;ande&#39; &#39;ende&#39; &#39;ane&#39; &#39;ene&#39; &#39;hetene&#39; &#39;en&#39; &#39;heten&#39; &#39;ar&#39; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#39;er&#39; &#39;heter&#39; &#39;as&#39; &#39;es&#39; &#39;edes&#39; &#39;endes&#39; &#39;enes&#39; &#39;hetenes&#39; &#39;ens&#39;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#39;hetens&#39; &#39;ers&#39; &#39;ets&#39; &#39;et&#39; &#39;het&#39; &#39;ast&#39;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (delete)
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#39;s&#39;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (s_ending or (&#39;k&#39; non-v) delete)
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#39;erte&#39; &#39;ert&#39;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (&lt;-&#39;er&#39;)
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; )
</div>
</li>
<li class="li1">
<div class="de1">)</div>
</li>
</ol>
</div>
<p>This simply means that if a word ends in any of the suffixes in the first three lines, that suffix will be deleted (given by the (delete) command following the definitions). The problem, given our example above, is that none of the lines will capture an &#8220;ere&#8221; or &#8220;erene&#8221; ending &#8211; which we&#8217;ll need to actually solve the problem.</p>
<p>We simply add them to the list of defined endings:</p>
<div class="geshi no snowball">
<ol>
<li class="li1">
<div class="de1">&nbsp; &nbsp; among(
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#8230; &#39;hetene&#39; &#39;en&#39; &#39;heten&#39; &#39;ar&#39; &#39;ere&#39; &#39;erene&#39; &#39;eren&#39;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#8230;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &#8230;
</div>
</li>
<li class="li1">
<div class="de1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; (delete)</div>
</li>
</ol>
</div>
<p>I made sure to add the definitions before the shorter versions (such as &#8216;er&#8217;), but I&#8217;m not sure that is actually required (I don&#8217;t think it is).</p>
<p>Save the file under a new file name so you still have the old stemmers available.</p>
<h3>Compiling a New Version of the Snowball Stemmer</h3>
<p>After editing and saving your stemmer, it&#8217;s now time to generate the Java class that Lucene will use to produce the base forms of the words. After extracting the snowball archive, you should have a binary file named <code>snowball</code> in the <code>snowball_code</code> directory. If you simply run this file with <code>snowball_code</code> as your current working directory:</p>
<pre>
./snowball
</pre>
<p>You&#8217;ll get a list of options that Snowball can accept when generating the stemmer class. We&#8217;re only going to use three of them:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1">-j<span class="br0">&#91;</span>ava<span class="br0">&#93;</span> Tell Snowball that we want to generate a Java class</div>
</li>
<li class="li1">
<div class="de1">-n<span class="br0">&#91;</span>ame<span class="br0">&#93;</span> Tell Snowball the name of the class we want generated</div>
</li>
<li class="li1">
<div class="de1">-o &lt;filename&gt; The filename of the output file. No extension.</div>
</li>
</ol>
</div>
<p>So to compile our NorwegianExStemmer from our modified file, we run:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1">.<span class="sy0">/</span>snowball algorithms<span class="sy0">/</span>norwegian<span class="sy0">/</span>stem2_ISO_8859_1.sbl -j -n NorwegianExStemmer -o NorwegianExStemmer</div>
</li>
</ol>
</div>
<p>(pardon the excellent file name stem2&#8230;). This will give you one new file in the current working directory: <code>NorwegianExStemmer.java</code>! We&#8217;ve actually built a stemming class! Woohoo! (You may do a few dance moves here. I&#8217;ll wait.)</p>
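<p>If you end up regenerating the stemmer many times while tweaking the definition, the command above is easy to wrap in a small script. The following is just a sketch: the function names are made up, it assumes it is run from the <code>snowball_code</code> directory, and it only uses the three options described above.</p>

```python
import subprocess


def snowball_command(source, class_name, binary="./snowball"):
    """Build the argument list for generating a Java stemmer class."""
    # -j: generate Java, -n: class name, -o: output file name (no extension).
    return [binary, source, "-j", "-n", class_name, "-o", class_name]


def generate_stemmer(source, class_name):
    """Run snowball; raises CalledProcessError if the compiler fails."""
    subprocess.run(snowball_command(source, class_name), check=True)


# Only print the command here; call generate_stemmer() to actually run it.
print(" ".join(snowball_command(
    "algorithms/norwegian/stem2_ISO_8859_1.sbl", "NorwegianExStemmer")))
```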
<p>We&#8217;re now going to insert the new class into the Lucene contrib .jar-file.</p>
<h3>Rebuild the Lucene JAR Library</h3>
<p>Copy the new class file into the version of Lucene you checked out from SVN:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw2">cp</span> NorwegianExStemmer.java <span class="sy0">&lt;</span>lucenetrunk<span class="sy0">&gt;/</span>contrib<span class="sy0">/</span>snowball<span class="sy0">/</span>src<span class="sy0">/</span>java<span class="sy0">/</span>org<span class="sy0">/</span>tartarus<span class="sy0">/</span>snowball<span class="sy0">/</span>ext</div>
</li>
</ol>
</div>
<p>Then we simply have to rebuild the .jar file containing all the stemmers:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> <span class="sy0">&lt;</span>lucenetrunk<span class="sy0">&gt;/</span>contrib<span class="sy0">/</span>snowball<span class="sy0">/</span></div>
</li>
<li class="li1">
<div class="de1">ant jar</div>
</li>
</ol>
</div>
<p>This will create <code>lucene-snowball-2.9-dev.jar</code> in <code>&lt;lucenetrunk&gt;/build/contrib/snowball/</code>. You now have a library containing your stemmer (and all the other default stemmers from Lucene)!</p>
<p>The last part is simply getting the updated stemmer library into Solr, and this will be a simple copy and rebuild:</p>
<h3>Inserting the New Lucene Library Into Solr</h3>
<p>From the build/contrib/snowball/ directory in Lucene, copy the jar file into the lib/ directory of Solr:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw2">cp</span> lucene-snowball<span class="nu0">-2.9</span>-dev.jar <span class="sy0">&lt;</span>solrtrunk<span class="sy0">&gt;/</span>lib<span class="sy0">/</span></div>
</li>
</ol>
</div>
<p>Be sure to overwrite any existing files (.. and if you have another version of Lucene in Solr, do a complete rebuild and replace all the Lucene related files in Solr). Rebuild Solr:</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> <span class="sy0">&lt;</span>solrtrunk<span class="sy0">&gt;</span></div>
</li>
<li class="li1">
<div class="de1">ant dist</div>
</li>
</ol>
</div>
<p>Copy the new <code>apache-solr-1.4-dev.war</code> (check the correct name in the directory yourself) from the <code>build/</code> directory in Solr to your application server&#8217;s home as solr.war (.. if you use another name, use that). This is webapps/ if you&#8217;re using Tomcat. Remember to back up the old .war file, just to be sure you can restore everything if you&#8217;ve borked something.</p>
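<p>The copy-with-backup step can be sketched like this. The function name, the <code>.bak</code> suffix and all the paths are assumptions made for the example; the demonstration at the bottom runs in a throwaway directory so that nothing real is overwritten.</p>

```python
import shutil
import tempfile
from pathlib import Path


def deploy_war(built_war, webapps_dir, target_name="solr.war"):
    """Copy built_war into webapps_dir, keeping a backup of any existing file."""
    target = Path(webapps_dir) / target_name
    if target.exists():
        # Keep the previous version next to the new one, e.g. solr.war.bak.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(built_war, target)
    return target


# Demonstration with fake files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    built = tmp / "apache-solr-1.4-dev.war"
    built.write_text("new build")
    webapps = tmp / "webapps"
    webapps.mkdir()
    (webapps / "solr.war").write_text("old build")
    deploy_war(built, webapps)
    print(sorted(p.name for p in webapps.iterdir()))
```

<p>Stop the application server before copying if your container does not like having the .war file replaced while it is running.</p>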
<h3>Add Your New Stemmer In schema.xml</h3>
<p>After compiling and packaging the stemmer, it&#8217;s time to tell Solr that it should use the newly created stemmer. Remember that a stemmer works both when indexing and querying, so we&#8217;re going to need to reindex our collection after implementing a new stemmer.</p>
<p>The usual place to add the stemmer is the definition of your text fields under the &lt;analyzer&gt;-sections for index and query (remember to change it in BOTH places!!):</p>
<div class="geshi no xml">
<ol>
<li class="li1">
<div class="de1"><span class="sc3"><span class="re1">&lt;filter</span> <span class="re0">class</span>=<span class="st0">&quot;solr.SnowballPorterFilterFactory&quot;</span> <span class="re0">language</span>=<span class="st0">&quot;NorwegianEx&quot;</span> <span class="re2">/&gt;</span></span></div>
</li>
</ol>
</div>
<p>Change NorwegianEx to the name of your class (without the Stemmer part; Lucene adds that for you automagically). After changing both locations (or more, if you have custom datatypes and indexing or query steps), you&#8217;re ready to restart.</p>
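<p>Since forgetting one of the two analyzers is an easy mistake to make, here is a small sketch of how the result could be checked. The field type name <code>text_no</code>, the surrounding tokenizer and lowercase filter, and the helper function are assumptions made for the example; your own schema.xml will look different, and the fragment is only there to show the filter line in both places.</p>

```python
import xml.etree.ElementTree as ET

# A hypothetical field type with the stemmer in both analyzers.
SCHEMA_FRAGMENT = """
<fieldType name="text_no" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="NorwegianEx"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="NorwegianEx"/>
  </analyzer>
</fieldType>
"""


def analyzers_with_stemmer(schema_xml, language):
    """Return the analyzer types that use the Snowball filter for `language`."""
    root = ET.fromstring(schema_xml)
    found = []
    for analyzer in root.iter("analyzer"):
        for flt in analyzer.findall("filter"):
            if (flt.get("class") == "solr.SnowballPorterFilterFactory"
                    and flt.get("language") == language):
                found.append(analyzer.get("type"))
    return sorted(found)


# Both "index" and "query" should be listed.
print(analyzers_with_stemmer(SCHEMA_FRAGMENT, "NorwegianEx"))
```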
<h3>Restart Application Server and Reindex!</h3>
<p>If you&#8217;re using Tomcat as your application server this might simply be (depending on your setup and distribution):</p>
<div class="geshi no bash">
<ol>
<li class="li1">
<div class="de1"><span class="kw3">cd</span> <span class="sy0">/</span>path<span class="sy0">/</span>to<span class="sy0">/</span>tomcat<span class="sy0">/</span>bin</div>
</li>
<li class="li1">
<div class="de1">.<span class="sy0">/</span>shutdown.<span class="kw2">sh</span></div>
</li>
<li class="li1">
<div class="de1">.<span class="sy0">/</span>startup.<span class="kw2">sh</span></div>
</li>
</ol>
</div>
<p>Please consult the documentation for your application server for information about how to do a proper restart.</p>
<p>After you&#8217;ve restarted the application server, you&#8217;re going to need to reindex your collection before everything works as planned. You can, however, already check that your stemmer works as intended at this stage. Log into the Solr admin interface, select the extended / advanced query view, enter your query (which should now be stemmed differently than before), check the &#8220;debug&#8221; box and submit your search. The resulting XML document will show you the result of parsing your query in the parsedquery element.</p>
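<p>The same check can be made without the admin interface by requesting the search URL with the <code>debugQuery</code> parameter switched on. A sketch of building such a URL follows; the host, port, path and the field name <code>text</code> are assumptions and will depend on your installation and schema.</p>

```python
from urllib.parse import urlencode


def debug_query_url(query, base="http://localhost:8983/solr/select"):
    """Build a Solr search URL that asks for debug output as well."""
    params = {"q": query, "debugQuery": "on"}
    return base + "?" + urlencode(params)


# Open the printed URL in a browser and look for the parsedquery element.
print(debug_query_url("text:elektrikere"))
```

<p>If the stemmer is active, the singular and plural forms of the word should show up as the same term in parsedquery.</p>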
<h3>Download the Generated Stemmer</h3>
<p>If you&#8217;re just looking for an improved stemmer for Norwegian words, you can simply download <a href="/uploads/NorwegianExStemmer.java">NorwegianExStemmer.java</a>. It contains only the very, very simple changes outlined above and might give problems with UTF-8 (please leave a comment if that&#8217;s the case). Follow the guide above for adding it to your Lucene / Solr installation.</p>
<p>Please leave a comment if something is confusing or if you want free help. Send me an email if you&#8217;re looking for a consultant.</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/modifying-a-lucene-snowball-stemmer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Content License Change</title>
		<link>http://e-mats.org/2009/05/content-license-change/</link>
		<comments>http://e-mats.org/2009/05/content-license-change/#comments</comments>
		<pubDate>Tue, 19 May 2009 12:06:10 +0000</pubDate>
		<dc:creator>Mats</dc:creator>
				<category><![CDATA[The Blog Itself]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Writing]]></category>
		<category><![CDATA[cc-by]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[Creative Commons]]></category>
		<category><![CDATA[license]]></category>

		<guid isPermaLink="false">http://e-mats.org/?p=481</guid>
		<description><![CDATA[
		
		
		
		Just a friendly reminder that I&#8217;ve now changed the license of the content on this blog to a much more friendly Creative Commons-based license, namely the &#8220;Do what the hell you want, but remember to link back and tell people who wrote it&#8221;. I&#8217;ve been using the license for the majority of my photos during [...]]]></description>
			<content:encoded><![CDATA[<p>Just a friendly reminder that I&#8217;ve now changed the license of the content on this blog to a much more friendly Creative Commons-based license, namely the &#8220;Do what the hell you want, but remember to link back and tell people who wrote it&#8221; license. I&#8217;ve been using this license for the majority of my photos during the last few years, so it&#8217;s a natural evolution. Have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://e-mats.org/2009/05/content-license-change/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
