Java printf and Booleans

The available stdout mapping in Java, System.out (PrintStream), supports printf (to format a string when outputting it) in the same manner as other languages such as C. By using System.out.printf instead of System.out.print, you can also provide a formatting string which the method can use to write the output in a custom format.

As Java supports several other native datatypes that weren’t available as regular C-types, you also have a few other options. One of these are the %b / %B identifier, which represents a boolean value.

The format string “%b” expects one argument, a boolean. This will be written as either ‘true’ or ‘false. You would (as of writing) get the same result by using the %s identifier as this would cast the boolean of a string, which in turn would give true or false, but it’s always better to be explicit about what you are doing than assuming that people understand that you understood something (which even could change..).

Informa and getCategories Truncates Title at “/”

I stumbled across a weird issue in Informa and the ItemIF.getCategories method today. The categories we retrieve are separated with / to indicate their full hierarchy, but Informa only gave me the first part of the category (just “Properties” of “Properties / Houses”). The solution to this is to explicitly access the category element of the object itself:

String categoryDomain = item.getAttributeValue("category", "domain");
String categoryTitle = item.getElementValue("category");

This should be extended to support several category-elements, but as we only get one in our feeds, this solved the problem for us.

Informa and Custom XML Namespaces in RSS

While integrating a custom search application into a Java-based web application, I came across the need to access properties in custom namespaces through the Informa RSS library. Or to put it in another way; i needed to access to properties, Informa had been used for RSS parsing in the previous versions of the web application. The people who developed the original version of the application had decided to extend the Informa library into their own version, and had added several methods for .get<NameOfCustomProperty> etc. After thinking about this for approximately 2 seconds, I decided that having to support and modify a custom version of Informa was not the right track for us.

My initial thought was that their decision to customize Informa to support these methods had to come from the idea that Informa did not support custom namespaces out of the box. I did a few searchas over at Google, and found nothing useful. Reading through the documentation for Informa didn’t do me any good either, so I tried to find an alternative library instead. Did a bit of searching here too, and stumbled across a hit for one of the util classes for Informa (.. again). This did support custom namespaces, so the backend support was there at least. Then it struck me while reading the documentation for Informa and ChannelIF again; Informa did support it, as it inherited the methods from further up in the hierarchy. The getElementValue and getElementValues methods of the ChannelIF and ItemIF classes allows you to fetch the contents of elements with custom namespaces in a very easy to like manner.

System.out.println(item.getElementValue("exampleNS:field"));

This simply returns the string contained between <exampleNS:field> and </exampleNS:field>

Hoooray! We now have support for these additional fields, and we do not have to keep Informa manually in sync with the version in our application. Why the original developers decided to fork the Informa library to add their own properties I may never know, but I’ll update this post if they decide to step forward!

PHP Vikinger Notes

Just a few notes from PHP Vikinger which were arranged by Derick Rethans in Norway today. Things went mostly smoothly and people in general seemed to have a very good time. These are just some of the random notes I made during the sessions.

All in all it was a good unconference, with a friendly and laid back tone and hopefully people got what they came for. Next time I’ll try to prepare a simple presentation on some interesting and hopefully not too familiar topic and actually contribute something too. We drove from Halden and Fredrikstad to Skien in the morning and back in the evening, which worked out quite OK, except for .. well, the lack of sleep in the morning. But everyone survived and managed to stay awake, so I conclude that the trip was a great success.

To sum it all up: a banana is a fruit and a tomato is a berry. You probably had to be there for that one.

Thanks for the unconference, and hopefully I’ll be able to attend more events in the future too.

UPDATE: Derick also has a writeup online from PHP Vikinger.

Derick and Sebastian Readying a Presentation

Writing a Solr Analysis Filter Plugin

Update: If you’re writing a plugin for a Solr-version after 1.4.1 or Lucene 3.0+, you should be sure to read Updating a Solr Analysis Plugin to Lucene 4.0 as well. A few of the method calls used below has changed in the new API.

As we’ve been working on getting a better result out of the phonetic search we’re currently doing at derdubor, I started writing a plugin for Solr to be able to return better search results when searching for norwegian names. We’ve been using the standard phonetic filter from Solr 1.2 so far, using the double metaphone encoder for encoding a regular token as a phonetic value. The trouble with this is that a double metaphone value is four simple letters, which means that searchwords such as ‘trafikkontroll’ would get the same meaning as ‘Dyrvik’. The latter being a name and the first being a regular search string which would be better served through an article view. TRAFIKKONTROLL resolves to TRFK in double metaphone, while DYRVIK resolves to DRVK. T and D is considered similiar, as is V and F, and voilá, you’ve got yourself a match in the search result, but not a visual one (or a semantic one, as the words have very different meanings).

To solve this, I decided to write a custom filter plugin which we could tune to names that are in use in Norway. I’ll post about the logic behind my reasoning in regards to wording later and hopefully post the complete filter function we’re applying, but I’ll leave that for another post.

First you need a factory that’s able to produce filters when Solr asks for them:

NorwegianNameFilterFactory.java:

package no.derdubor.solr.analysis;

import java.util.Map;

import org.apache.solr.analysis.BaseTokenFilterFactory;
import org.apache.lucene.analysis.TokenStream;

public class NorwegianNameFilterFactory extends BaseTokenFilterFactory
{
    Map args;

    public Map getArgs()
    {
        return args;
    }

    public void init(Map args)
    {
        this.args = args;
    }

    public NorwegianNameFilter create(TokenStream input)
    {
        return new NorwegianNameFilter(input);
    }
}

To compile this example yourself, put the file in no/derdubor/solr/analysis/ (which matches no.derdubor.solr.analysis; in the package statement), and run

javac -6 no/derdubor/solr/analysis/NorwegianNameFilterFactory.java

(you’ll need apache-solr-core.jar and lucene-core.jar in your classpath to do this)

to compile it. You’ll of course also need the filter itself (which is returned from the create-method above):

package no.derdubor.solr.analysis;

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class NorwegianNameFilter extends TokenFilter
{
    public NorwegianNameFilter(TokenStream input)
    {
        super(input);
    }

    public Token next() throws IOException
    {
        return parseToken(this.input.next());
    }

    public Token next(Token result) throws IOException
    {
        return parseToken(this.input.next());
    }

    protected Token parseToken(Token in)
    {
        /* do magic stuff with in.termBuffer() here (a char[] which can be manipulated) */
        /* set the changed length of the new term with in.setTermLength(); before returning it */
        return in;
    }
}

You should now be able to compile both files:

javac -6 no/derdubor/solr/analysis/*.java

After compiling the plugin, create a jar file which contain your plugin. This will be the “distributable” version of your plugin, and should contain the .class-files of your application.

jar cvf derdubor-solr-norwegiannamefilter.jar no/derdubor/solr/analysis/*.class

Move the file you just created (derdubor-solr-norwegiannamefilter.jar in the example above) into your Solr home directory. This is where you keep your bin/ and conf/ directory (which contains schema.xml, etc). Create a lib directory in the solr home directory. This is where your custom libraries will live, so copy the file into this directory (lib/).

Restart Solr and check that everything still works as it should. If everything still seems normal, it’s time to enable your filter. In one of your <filter>-chains, you can simply append a <filter> element to insert your own filter into the chain:


    
    
    
    

Restart Solr again, and if everything still works as it should, you’re all set! Time to index some new data (remember that you’ll need to reindex the data for things to work as you expect, since no stored data is processed when you edit your configuration files) and commit it! Do a few searches through the admin interface to see that everything works as it should. I’ve used the “debug” option to .. well, debug .. my plugin while developing it. A very neat trick is to see what terms your filter expands to (if you set type=”query” in the analyzer section, it will be applied to all queries against that field), which will be shown in the first debug section when looking at the result (you’ll have to scroll down to the end to see this). If you need to debug things to a greater extend, you can attach a debugger or simply use the Good Old Proven Way of println! (these will end up in catalina.out in logs/ in your tomcat directory). Good luck!

Potential Problems and How To Solve Them

  • If you get an error about incompatible class versions, check that you’re actually running the same (or newer) version of the JVM (java -version) on your Solr search server that you use on your own development machine (use -5 to force 1.5 compatible class files instead of 1.6 when compiling).
  • If you get an error about missing config or something similiar, or that Solr is unable to find the method it’s searching for (generally triggered by an ReflectionException), remember to define your classes public! public class NorwegianNameFilter is your friend! It took at least half an hour until I realized what this simple issue was…

Any comments and followups are of course welcome!

JSTL, Taglibs and Capitalizing

After spending at least an hour of searching for an easy way of just capitalizing a string in a JSP, finding nothing and then starting on my own taglib do to the work, I finally found what I was looking for. The Jakarta String Taglib’s capitalize function! Amazingly enough, the documentation were not a good hit on Google, and it took quite some time to actually get the Google Fu correct. I stumbled across the documentation after searching for jakarta taglib (as JSTL is the Jakarta Standard TagLib) and rediscovered the other taglibs available from Apache Jakarta. Solved my problem in a second or two, and I hopefully never have to further endulge in writing my own taglib without keeping it as a separate JAR-file again.

Orto – A Java VM Implemented in Javascript

Meant to publish this earlier, but it’s just been sitting in a tab in my browser for a day now. Anyways, John Resig (hm, familiar domain name / real name combination there…) has a post about Orto from Japan, a compiler that transformers Java bytecode to JavaScript statements . The end result is Java applications that do not depend on Java, but can be run with just JavaScript available in a browser. Not quite sure what I’d want to use this for, but on a scale for awesome, this ranks pretty high! John’s page has screenshots and links to several more interesting sources about Orto and is well worth the read.

Solving UTF-8 Problems With Solr and Tomcat

Came across an issue with searching for UTF-8 characters in Solr today; the search worked just as it should (probably since we’re using a phonetic field to search), but our facets and limitations didn’t work as they should. This happened as soon as we had a value with an UTF-8 character (> 127 in ascii value), in our case the norwegian letters Æ, Ø or Å.

The solution was presented by Charlie Jackson at the Solr-user mailing list and is quite simply to add URIEncoding="UTF-8" to the appropriate connector in the Tomcat server.xml file. This is also documented on the Solr on Tomcat page in the Solr Wiki .

Using Solrj – A short guide to getting started with Solrj

As Solrj – The Java Interface for Solr – is slated for being released together with Solr 1.3, it’s time to take a closer look! Solrj is the preferred, easiest way of talking to a Solr server from Java (unless you’re using Embedded Solr). This way you get everything in a neat little package, and can avoid parsing and working with XML etc directly. Everything is tucked neatly away under a few classes, and since the web generally lacks a good example of how to use SolrJ, I’m going to share a small class I wrote for testing the data we were indexing at work. As Solr 1.2 is the currently most recent version available at apache.org, you’ll have to take a look at the Apache Solr Nightly Builds website and download the latest version. The documentation is also contained in the archive, so if you’re going to do any serious solrj development, this is the place to do it.

Oh well, enough of that, let’s cut to the chase. We start by creating a CommonsHttpSolrServer instance, which we provide with the URL of our Solr server as the only argument in the constructor. You may also provide your own parsers, but I’ll leave that for those who need it. I don’t. By default your Solr-installation is running on port 8080 and under the solr directory, but you’ll have to accomodate your own setup here. I’ve included the complete source file for download.

class SolrjTest
{
    public void query(String q)
    {
        CommonsHttpSolrServer server = null;

        try
        {
            server = new CommonsHttpSolrServer("http://localhost:8080/solr/");
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }

The next thing we’re going to do is to actually create the query we’re about to ask the Solr server about, and this means building a SolrQuery object. We simply instanciate the object and then start to set the query values to what we’re looking for. The setQueryType call can be dropped to use the default QueryType-handler, but as we currently use dismax, this is what I’ve used here. You can then also turn on Facet-ing (to create navigators/facets) and add the fields you want for those.

        SolrQuery query = new SolrQuery();
        query.setQuery(q);
        query.setQueryType("dismax");
        query.setFacet(true);
        query.addFacetField("firstname");
        query.addFacetField("lastname");
        query.setFacetMinCount(2);
        query.setIncludeScore(true);

Then we simply query the server by calling server.query, which takes our parameters, build the query URL, sends it to the server and parses the response for us.

        try
        {
            QueryResponse qr = server.query(query);

This result can then be fetched by calling .getResults(); on the QueryResponse object; qr.

            SolrDocumentList sdl = qr.getResults();

We then output the information fetched in the query. You can change this to print all fields or other stuff, but as this is a simple application for searching a database of names, we just collect the first and last name of each entry and print them out. Before we do that, we print a small header containing information about the query, such as the number of elements found and which element we started on.

            System.out.println("Found: " + sdl.getNumFound());
            System.out.println("Start: " + sdl.getStart());
            System.out.println("Max Score: " + sdl.getMaxScore());
            System.out.println("--------------------------------");

            ArrayList> hitsOnPage = new ArrayList>();

            for(SolrDocument d : sdl)
            {
                HashMap values = new HashMap();

                for(Iterator> i = d.iterator(); i.hasNext(); )
                    Map.Entry e2 = i.next();
                    values.put(e2.getKey(), e2.getValue());
                }

                hitsOnPage.add(values);
                System.out.println(values.get("displayname") + " (" + values.get("displayphone") + ")");
            }

After this we output the facets and their information, just so you can see how you’d go about fetching this information from Solr too:

            List facets = qr.getFacetFields();

            for(FacetField facet : facets)
            {
                List facetEntries = facet.getValues();

                for(FacetField.Count fcount : facetEntries)
                {
                    System.out.println(fcount.getName() + ": " + fcount.getCount());
                }
            }
        }
        catch (SolrServerException e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        SolrjTest solrj = new SolrjTest();
        solrj.query(args[0]);
    }
}

And there you have it, a very simple application to just test the interface against Solr. You’ll need to add the jar-files from the lib/-directory in the solrj archive (and from the solr library itself) to compile and run the example.

Download: SolrTest.java