Solving UTF-8 Problems With Solr and Tomcat

Came across an issue with searching for UTF-8 characters in Solr today. The search itself worked just as it should (probably since we’re using a phonetic field to search), but our facets and limitations didn’t. This happened as soon as a value contained a non-ASCII character (a code point above 127, which UTF-8 encodes as more than one byte), in our case the Norwegian letters Æ, Ø or Å.

The solution was presented by Charlie Jackson on the solr-user mailing list, and it’s quite simple: add URIEncoding="UTF-8" to the appropriate connector in Tomcat’s server.xml file. This is also documented on the Solr on Tomcat page in the Solr Wiki.
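Why does this help? Clients such as Solrj and web browsers typically percent-encode the UTF-8 bytes of each character in the query string, so Æ travels as %C3%86. Older Tomcat versions (before Tomcat 8) decode the URI as ISO-8859-1 unless URIEncoding is set, which turns those two bytes into two unrelated characters. That would be consistent with exact matches on facet values failing while the phonetic search kept working. The minimal sketch below simulates the round trip with the JDK’s URLEncoder and URLDecoder instead of running a real Tomcat; the class and method names are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UriEncodingDemo {
    // Percent-encode a value as UTF-8 (the client side), then decode it with
    // the charset the servlet container uses for the URI (the server side).
    public static String roundTrip(String value, String serverCharset) {
        try {
            String onTheWire = URLEncoder.encode(value, "UTF-8");
            return URLDecoder.decode(onTheWire, serverCharset);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so only an unknown serverCharset ends up here
            throw new IllegalArgumentException("Unsupported charset: " + serverCharset, e);
        }
    }

    public static void main(String[] args) {
        String ae = "\u00C6"; // the Norwegian letter AE (U+00C6)

        String latin1 = roundTrip(ae, "ISO-8859-1"); // connector decoding as ISO-8859-1
        String utf8 = roundTrip(ae, "UTF-8");        // connector with URIEncoding="UTF-8"

        // Prints "ISO-8859-1: 2 char(s), matches: false"
        System.out.println("ISO-8859-1: " + latin1.length() + " char(s), matches: " + latin1.equals(ae));
        // Prints "UTF-8: 1 char(s), matches: true"
        System.out.println("UTF-8: " + utf8.length() + " char(s), matches: " + utf8.equals(ae));
    }
}
```

With URIEncoding="UTF-8" on the connector, Tomcat takes the second path, and the value Solr receives matches the indexed term again.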

Using Solrj – A short guide to getting started with Solrj

As Solrj, the Java interface for Solr, is slated for release together with Solr 1.3, it’s time to take a closer look! Solrj is the preferred and easiest way of talking to a Solr server from Java (unless you’re using Embedded Solr). You get everything in a neat little package and avoid parsing and working with XML directly. Everything is tucked neatly away under a few classes, and since the web generally lacks a good example of how to use Solrj, I’m going to share a small class I wrote for testing the data we were indexing at work. As Solr 1.2 is the most recent version available at apache.org, you’ll have to visit the Apache Solr Nightly Builds website and download the latest build. The documentation is also included in the archive, so if you’re going to do any serious Solrj development, that’s the place to start.

Oh well, enough of that, let’s cut to the chase. We start by creating a CommonsHttpSolrServer instance, passing the URL of our Solr server as the only constructor argument. You can also supply your own response parser, but I’ll leave that for those who need it. I don’t. A Solr installation under Tomcat usually runs on port 8080 under the /solr path (the Jetty example bundled with Solr uses port 8983), but you’ll have to adjust this to your own setup. I’ve included the complete source file for download.

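import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

class SolrjTest
{
    public void query(String q)
    {
        CommonsHttpSolrServer server = null;

        try
        {
            // Point this at your own Solr installation
            server = new CommonsHttpSolrServer("http://localhost:8080/solr/");
        }
        catch(MalformedURLException e)
        {
            // Without a server there is nothing to query, so give up here
            e.printStackTrace();
            return;
        }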

The next thing we do is create the query we’re going to send to the Solr server, which means building a SolrQuery object. We simply instantiate the object and then set the query values to what we’re looking for. The setQueryType call can be dropped to use the default query handler, but as we currently use dismax, that’s what I’ve used here. You can then also turn on faceting (to create navigators/facets) and add the fields you want facets for. setFacetMinCount(2) leaves out facet values that occur in fewer than two matching documents, and setIncludeScore(true) asks Solr to return the score of each document.

        SolrQuery query = new SolrQuery();
        query.setQuery(q);
        query.setQueryType("dismax");
        query.setFacet(true);
        query.addFacetField("firstname");
        query.addFacetField("lastname");
        query.setFacetMinCount(2);
        query.setIncludeScore(true);

Then we simply query the server by calling server.query, which takes our parameters, builds the query URL, sends it to the server and parses the response for us.

        try
        {
            QueryResponse qr = server.query(query);

The results can then be fetched by calling getResults() on the QueryResponse object, qr:

            SolrDocumentList sdl = qr.getResults();

We then print the information fetched by the query. You can change this to print all fields or anything else, but as this is a simple application for searching a database of names, we copy the fields of each entry into a map and print its display name and phone number (the displayname and displayphone fields in our schema). Before that, we print a small header with information about the result: the number of documents found, the offset we started from and the highest score.

            System.out.println("Found: " + sdl.getNumFound());
            System.out.println("Start: " + sdl.getStart());
            System.out.println("Max Score: " + sdl.getMaxScore());
            System.out.println("--------------------------------");

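            // Keep a copy of each hit's fields, one map per document
            ArrayList<HashMap<String, Object>> hitsOnPage = new ArrayList<HashMap<String, Object>>();

            for(SolrDocument d : sdl)
            {
                HashMap<String, Object> values = new HashMap<String, Object>();

                // A SolrDocument iterates over its (field name, value) pairs
                for(Iterator<Map.Entry<String, Object>> i = d.iterator(); i.hasNext(); )
                {
                    Map.Entry<String, Object> e2 = i.next();
                    values.put(e2.getKey(), e2.getValue());
                }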

                hitsOnPage.add(values);
                System.out.println(values.get("displayname") + " (" + values.get("displayphone") + ")");
            }

After this we output the facets and their values and counts, just so you can see how you’d go about fetching this information from Solr too:

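            List<FacetField> facets = qr.getFacetFields();

            for(FacetField facet : facets)
            {
                List<FacetField.Count> facetEntries = facet.getValues();

                // getValues() may return null when a field has no facet values
                if(facetEntries == null)
                {
                    continue;
                }

                for(FacetField.Count fcount : facetEntries)
                {
                    System.out.println(fcount.getName() + ": " + fcount.getCount());
                }
            }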
        }
        catch (SolrServerException e)
        {
            e.printStackTrace();
        }
    }

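    public static void main(String[] args)
    {
        // The query string is taken from the first command-line argument
        if(args.length < 1)
        {
            System.err.println("Usage: java SolrjTest <query>");
            return;
        }

        SolrjTest solrj = new SolrjTest();
        solrj.query(args[0]);
    }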
}
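Since server.query() hides the HTTP request, it can help to see roughly what ends up on the wire. The stand-alone sketch below builds, by hand, a query string with the same parameters our SolrQuery sets (q, qt, facet, facet.field and facet.mincount). It leaves out the score field and any extra parameters Solrj adds on its own, assumes the default /select handler path, and uses a made-up search term, so treat it as an illustration rather than Solrj’s actual implementation. It uses a list of pairs instead of a Map because parameters like facet.field are repeated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class QueryUrlSketch {
    // Joins name=value pairs into a query string, percent-encoding both parts as UTF-8
    public static String toQueryString(List<String[]> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (String[] p : params) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(p[0], "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(p[1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> params = new ArrayList<String[]>();
        params.add(new String[] {"q", "\u00D8rjan"}); // hypothetical Norwegian name starting with U+00D8
        params.add(new String[] {"qt", "dismax"});
        params.add(new String[] {"facet", "true"});
        params.add(new String[] {"facet.field", "firstname"});
        params.add(new String[] {"facet.field", "lastname"});
        params.add(new String[] {"facet.mincount", "2"});

        // Prints http://localhost:8080/solr/select?q=%C3%98rjan&qt=dismax&facet=true&facet.field=firstname&facet.field=lastname&facet.mincount=2
        System.out.println("http://localhost:8080/solr/select?" + toQueryString(params));
    }
}
```

The %C3%98 in the output is the UTF-8 encoding of Ø, which is exactly the kind of sequence Tomcat needs URIEncoding="UTF-8" to decode correctly.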

And there you have it: a very simple application for testing the interface against Solr. To compile and run the example, you’ll need the jar files from the lib/ directory in the Solrj archive (and from the Solr library itself) on your classpath.
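If facets are new to you, this stand-alone sketch mimics what the facet block above reports for a single field: the number of matching documents per value, with values below the minimum count dropped, as setFacetMinCount(2) asks for. Solr computes this on the server over all matching documents, not just the page of hits you get back; the sketch only counts an in-memory list of hypothetical values. Ordering by count, highest first, matches Solr’s default facet sort; breaking ties alphabetically is my own choice to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FacetCountSketch {
    // Counts documents per field value, keeps values seen at least minCount
    // times (like facet.mincount) and sorts them by count, highest first.
    public static List<Map.Entry<String, Integer>> facet(List<String> values, int minCount) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String v : values) {
            Integer c = counts.get(v);
            counts.put(v, c == null ? 1 : c + 1);
        }

        List<Map.Entry<String, Integer>> result = new ArrayList<Map.Entry<String, Integer>>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e);
            }
        }

        // Count descending; ties broken alphabetically so the output is stable
        Collections.sort(result, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical firstname values of the documents matching a query
        List<String> firstnames = Arrays.asList("Kari", "Ola", "Kari", "Per", "Ola", "Kari");

        // Prints "Kari: 3" and "Ola: 2"; "Per" occurs only once and is dropped
        for (Map.Entry<String, Integer> e : facet(firstnames, 2)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```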

Download: SolrTest.java

Writing a Custom Validator for Zend_Form_Element

My good friend Christer has written a simple tutorial on how to write a custom validator for a Zend_Form_Element. If you’ve ever laid your hands on Zend_Form, you’ll want to have a look at this for a concise introduction to the topic. He shows you how to create a “repeat the password” field by writing a custom validator and hooking it onto the original password field. Neat stuff.