January 7th, 2010
As we’re now getting a bit more touchy and feely about 2010, I’ll give a short update on the blog itself like I did on the last day of 2008 (and even the title has been recycled! Again! The big trend of 2009!).
Key statistics (last year’s numbers in parentheses; 2008 was the first year I blogged, so 2009 is the first complete year of data):
- Posts: 54 (161)
- Comments: 81 (89)
- Hits: 399084 (242781)
- Unique: 30561 (18882) (since the start)
- Spam caught by akismet: 11828 (2596)
- Most spammed month: May, 4123 (November)
- Technorati Authority: 99 (13)
- Technorati Rank: 78.215 (454.532)
- Reactions registered on Technorati: no idea how to find this now! (51)
- Incoming links (to e-mats.org) on Google: 187 (29)
- Incoming links (to e-mats.org) on Google Blogsearch: 30 (23)
Most popular referers:
- google.com
- search.live.com
- google.co.uk
- google.de
- google.co.in
- google.ca
- google.fr
- en.blog.wordpress.com
- google.nl
- google.com.au
- google.pl
- google.se
At least two weren’t Google!
Stats I didn’t include last year, but decided to include this year, from Google Analytics (last year’s numbers in parentheses as above):
- Page Views: 49.403 (21.997) (+125%)
- Visits: 38.537 (16.082) (+139%)
- Search Engine Referers: 85.4% (70.8%) (+20%)
Most popular search terms (and some of the most popular posts):
- solrj
- svn external
- ssh_exchange_identification: connection timed out
- solrj example
- tortoise svn externals
And then we look ahead for the same post in 2011. Keep your pants on!
Tags: 2010, new year, popular posts, statistics, The Blog Itself
Posted in The Blog Itself | No Comments »
April 23rd, 2008
While working with the facets for our search results earlier today, I came across the need to limit the search against Solr to one specific field in addition to our regular search string (which we run against several fields with different weights). The situation was something like this:
- Lastname
- AggregateSearchField
- AggregatePhoneticSearchField
We do the searches against the AggregateSearchField and the AggregatePhoneticSearchField, where we weight exact matches higher than phonetic matches. This ensures that the more specific matches are ranked higher than those that are merely similar. We do this for several different field groupings, but that’s not relevant for this post, so let’s just assume that these are the three relevant fields. We search against two of them, and use Lastname as a facet / navigator field to allow users to narrow down their search.
However, while users should be allowed to narrow their search by selecting one of the facets, doing so should not change their regular search. And since the dismax handler searches through all the configured fields for a given value, you cannot just append Lastname:facetValue to the search string and be done with it (dismax does not support fielded searches in the regular query). After a bit of searching through our friends over at Google, I finally stumbled across the solution (which I of course should have seen on the Solr wiki): use the fq parameter. This lets you submit a “Filter Query” with your request, which restricts the results of your main query to the documents that also match the filter, without affecting how those documents are scored. This fits very neatly with keeping your original query and then appending a filter query for each facet limitation that gets set.
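To make the difference concrete, here is a minimal, stdlib-only sketch of the raw request parameters this approach boils down to. It assumes a request handler registered as dismax (selected through qt) and a hypothetical qf boost of 4 for the exact field; the class and method names are made up for illustration. The user’s text stays untouched in q, and each selected facet becomes its own fq parameter:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FilterQueryParams
{
    // URL-encodes one name=value pair (spaces become '+', ':' becomes %3A, ...).
    static String pair(String name, String value)
    {
        return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // The user's search goes into q untouched; every selected facet becomes a
    // separate fq parameter of the form field:"value".
    static String build(String userQuery, String qf, Map<String, String> facetFilters)
    {
        StringJoiner params = new StringJoiner("&");
        params.add(pair("q", userQuery));
        params.add(pair("qt", "dismax")); // assumes a handler registered as "dismax"
        params.add(pair("qf", qf));       // fields (with optional ^boost) that dismax searches
        for (Map.Entry<String, String> facet : facetFilters.entrySet())
        {
            params.add(pair("fq", facet.getKey() + ":\"" + facet.getValue() + "\""));
        }
        return params.toString();
    }

    public static void main(String[] args)
    {
        Map<String, String> facets = new LinkedHashMap<String, String>();
        facets.put("Lastname", "Smith");
        // Hypothetical boost: exact matches count four times as much as phonetic ones.
        System.out.println(build("john smith", "AggregateSearchField^4 AggregatePhoneticSearchField", facets));
        // q=john+smith&qt=dismax&qf=AggregateSearchField%5E4+AggregatePhoneticSearchField&fq=Lastname%3A%22Smith%22
    }
}
```

Selecting a second facet simply adds a second fq parameter; Solr intersects all filter queries with the main query.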
A small code example for Solrj, where filterQueries is a HashMap<String, String> containing the selected facets: filterQueries.put(“Lastname”, “Smith”) adds a limitation on the field “Lastname” being “Smith” (you might want to escape any double quotes in the facet values):
if (filterQueries != null)
{
    for (Map.Entry<String, String> filter : filterQueries.entrySet())
    {
        // this adds e.g. Lastname:"Smith" as a filter query
        query.addFilterQuery(filter.getKey() + ":\"" + filter.getValue() + "\"");
    }
}
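On escaping: a value such as The "Rock" would otherwise close the quoted phrase early. A minimal sketch of a hypothetical helper that escapes backslashes and double quotes before wrapping the value, assuming the standard Lucene query syntax where a backslash escapes the following character:

```java
class FacetFilters
{
    // Builds field:"value" for a filter query. Backslashes are escaped first,
    // then double quotes, so a quote inside the value cannot end the phrase early.
    static String phraseFilter(String field, String value)
    {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args)
    {
        System.out.println(phraseFilter("Lastname", "Smith"));        // Lastname:"Smith"
        System.out.println(phraseFilter("Nickname", "The \"Rock\"")); // Nickname:"The \"Rock\""
    }
}
```

In the loop above, the hand-built string passed to addFilterQuery could then be replaced by a call to phraseFilter with the field name and its value.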
So now we can just parse the query string for valid facet limitations, and set the fields in the filterQueries HashMap accordingly. As we already have a list of facet fields to include, this is as simple as iterating over that list and checking for the corresponding parameters in the request.
A big thank you to Mike Klaas, whose reply in the dismax and Plone thread indexed by nabble.com sent me in the right direction.
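A small sketch of that parsing step, assuming the request parameters arrive as a Map<String, String[]> (the shape a servlet request’s getParameterMap() returns); the class and method names are hypothetical. Only parameters named after a known facet field end up as filters, so arbitrary request parameters never turn into filter queries:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FacetParameterParser
{
    // Walks the known facet fields and picks up the first value of each matching
    // request parameter, skipping parameters that are missing or empty.
    static Map<String, String> filterQueries(List<String> facetFields, Map<String, String[]> requestParams)
    {
        Map<String, String> filters = new LinkedHashMap<String, String>();
        for (String field : facetFields)
        {
            String[] values = requestParams.get(field);
            if (values != null && values.length > 0 && !values[0].isEmpty())
            {
                filters.put(field, values[0]);
            }
        }
        return filters;
    }

    public static void main(String[] args)
    {
        Map<String, String[]> params = new HashMap<String, String[]>();
        params.put("q", new String[] { "john" });
        params.put("Lastname", new String[] { "Smith" });
        System.out.println(filterQueries(Arrays.asList("Firstname", "Lastname"), params)); // {Lastname=Smith}
    }
}
```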
Posted in Uncategorized | 2 Comments »
April 17th, 2008
As Solrj (the Java interface for Solr) is slated for release together with Solr 1.3, it’s time to take a closer look! Solrj is the preferred and easiest way of talking to a Solr server from Java (unless you’re using Embedded Solr). You get everything in a neat little package and avoid parsing and working with XML directly. Everything is tucked away behind a few classes, and since the web generally lacks a good example of how to use Solrj, I’m going to share a small class I wrote for testing the data we were indexing at work. As Solr 1.2 is currently the most recent version available at apache.org, you’ll have to visit the Apache Solr Nightly Builds website and download the latest version. The documentation is also contained in the archive, so if you’re going to do any serious Solrj development, that’s the place to look.
Oh well, enough of that, let’s cut to the chase. We start by creating a CommonsHttpSolrServer instance, passing the URL of our Solr server as the only constructor argument. You may also provide your own response parsers, but I’ll leave that for those who need it. I don’t. Where Solr listens depends on your setup: the Jetty-based example shipped with Solr uses port 8983, while a default Tomcat installation uses port 8080 (which is what I use below), in both cases under the /solr path, so you’ll have to accommodate your own setup here. I’ve included the complete source file for download.
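import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

class SolrjTest
{
    public void query(String q)
    {
        CommonsHttpSolrServer server = null;
        try
        {
            server = new CommonsHttpSolrServer("http://localhost:8080/solr/");
        }
        catch (Exception e)
        {
            e.printStackTrace();
            return; // nothing to query without a server instance
        }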
The next thing we’re going to do is to create the query we’re about to send to the Solr server, which means building a SolrQuery object. We simply instantiate the object and then set the query values to what we’re looking for. The setQueryType call can be dropped to use the default request handler, but as we currently use dismax, this is what I’ve used here. You can then also turn on faceting (to create navigators/facets) and add the fields you want facets for.
        SolrQuery query = new SolrQuery();
        query.setQuery(q);
        query.setQueryType("dismax");
        query.setFacet(true);
        query.addFacetField("firstname");
        query.addFacetField("lastname");
        query.setFacetMinCount(2);
        query.setIncludeScore(true);
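The SolrQuery setters above do little more than collect request parameters. As a rough, stdlib-only illustration (a sketch, not Solrj’s actual code), the query corresponds approximately to the URL built below. It assumes the standard /select path; Solrj also adds parameters of its own, such as the response format, and setIncludeScore(true) requests the score through the fl parameter, both of which are left out here. The class name is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RawSolrRequest
{
    // URL-encodes one name=value pair.
    private static String pair(String name, String value)
    {
        return URLEncoder.encode(name, StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Roughly the parameters the SolrQuery setters produce. A list is used
    // instead of a map because facet.field legitimately appears more than once.
    static String selectUrl(String baseUrl, String q)
    {
        List<String> params = new ArrayList<String>();
        params.add(pair("q", q));
        params.add(pair("qt", "dismax"));
        params.add(pair("facet", "true"));
        params.add(pair("facet.field", "firstname"));
        params.add(pair("facet.field", "lastname"));
        params.add(pair("facet.mincount", "2"));
        // Avoid "solr//select" when the base URL ends with a slash.
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return base + "/select?" + String.join("&", params);
    }

    public static void main(String[] args)
    {
        System.out.println(selectUrl("http://localhost:8080/solr/", "john smith"));
    }
}
```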
Then we simply query the server by calling server.query, which takes our parameters, builds the query URL, sends it to the server and parses the response for us.
        try
        {
            QueryResponse qr = server.query(query);
The matching documents can then be fetched by calling getResults() on the QueryResponse object, qr:
            SolrDocumentList sdl = qr.getResults();
We then output the information fetched in the query. You can change this to print all fields or other stuff, but as this is a simple application for searching a database of names, we copy each document’s fields into a map and print the display name and phone number of each entry. Before we do that, we print a small header containing information about the query, such as the number of documents found, the offset we started at and the highest score.
            System.out.println("Found: " + sdl.getNumFound());
            System.out.println("Start: " + sdl.getStart());
            System.out.println("Max Score: " + sdl.getMaxScore());
            System.out.println("--------------------------------");

            ArrayList<HashMap<String, Object>> hitsOnPage = new ArrayList<HashMap<String, Object>>();

            for (SolrDocument d : sdl)
            {
                HashMap<String, Object> values = new HashMap<String, Object>();

                // copy every field of the document into a plain map
                for (Iterator<Map.Entry<String, Object>> i = d.iterator(); i.hasNext(); )
                {
                    Map.Entry<String, Object> e2 = i.next();
                    values.put(e2.getKey(), e2.getValue());
                }

                hitsOnPage.add(values);
                System.out.println(values.get("displayname") + " (" + values.get("displayphone") + ")");
            }
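numFound and start from the header are enough to build simple paging, given the number of rows per page (Solr’s rows parameter, which defaults to 10 and can be set with query.setRows(...)). A small stdlib-only sketch with hypothetical helper names:

```java
class PageInfo
{
    // Number of pages needed for numFound hits with `rows` hits per page
    // (integer ceiling division).
    static long totalPages(long numFound, int rows)
    {
        if (rows <= 0)
        {
            throw new IllegalArgumentException("rows must be positive");
        }
        return (numFound + rows - 1) / rows;
    }

    // 1-based number of the page containing the zero-based offset `start`.
    static long currentPage(long start, int rows)
    {
        if (rows <= 0)
        {
            throw new IllegalArgumentException("rows must be positive");
        }
        return start / rows + 1;
    }

    public static void main(String[] args)
    {
        // e.g. numFound = 25, start = 20, 10 rows per page
        System.out.println("Page " + currentPage(20, 10) + " of " + totalPages(25, 10)); // Page 3 of 3
    }
}
```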
Finally we output each facet field with its values and their counts, just so you can see how you’d go about fetching this information from Solr too:
            List<FacetField> facets = qr.getFacetFields();

            for (FacetField facet : facets)
            {
                List<FacetField.Count> facetEntries = facet.getValues();
                if (facetEntries == null)
                {
                    continue; // getValues() may return null, e.g. when no value reached facet.mincount
                }

                for (FacetField.Count fcount : facetEntries)
                {
                    System.out.println(fcount.getName() + ": " + fcount.getCount());
                }
            }
        }
        catch (SolrServerException e)
        {
            e.printStackTrace();
        }
    }

    public static void main(String[] args)
    {
        if (args.length < 1)
        {
            System.err.println("Usage: java SolrjTest <query>");
            return;
        }
        SolrjTest solrj = new SolrjTest();
        solrj.query(args[0]);
    }
}
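To make the facet output a bit more concrete, here is a stdlib-only sketch of what the counts represent: how many matching documents share each value, with values below the minimum count (facet.mincount, 2 in the query above) dropped and the rest ordered by count. It illustrates the idea only and is not how Solr computes facets internally; the class name and the alphabetical tie-break are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FacetCountSketch
{
    // Counts each value among the matching documents, drops values seen fewer
    // than minCount times and orders the rest by count, highest first.
    static Map<String, Integer> facetCounts(List<String> values, int minCount)
    {
        // TreeMap keeps values alphabetical, so equal counts stay alphabetical
        // after the stable sort below.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String v : values)
        {
            counts.merge(v, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : entries)
        {
            if (e.getValue() >= minCount)
            {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        List<String> lastnames = Arrays.asList("Smith", "Hansen", "Smith", "Olsen", "Hansen", "Smith");
        // Olsen occurs only once and is dropped with a minimum count of 2.
        System.out.println(facetCounts(lastnames, 2)); // {Smith=3, Hansen=2}
    }
}
```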
And there you have it, a very simple application for testing the interface against Solr. You’ll need to add the jar files from the lib/ directory in the Solrj archive (and from the Solr library itself) to your classpath to compile and run the example.
Download: SolrTest.java
Tags: Java, lucene, Solr, solrj
Posted in Java, Solr | 19 Comments »
April 17th, 2008
While working with a view of a collection of documents returned from Solr using Solrj earlier today, I was attempting to write out the number of documents found in the search. In pure Java code you’d just call .getNumFound() on the SolrDocumentList containing your documents, which would also suggest that the value should be available through EL in JSTL as ${solrDocumentList.numFound} (which in turn should call getNumFound() on the SolrDocumentList object). The code in question was as simple as:
<c:out value="${solrDocumentList.numFound}"/>
This resulted in the following error message, which came as a bit of a surprise:
java.lang.NumberFormatException: For input string: "numFound"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:447)
at java.lang.Integer.parseInt(Integer.java:497)
After digging around a bit and reading the error message yet again, it suddenly hit me: SolrDocumentList extends ArrayList, so EL treats $solrDocumentList as a List and expects an integer index instead of a property name, which is why it tries to parse "numFound" as a number. I’ve not been working with JSTL for too long, so I thought a bit about how to solve this. One solution would be to make the calls in the Action and then map the results to separate variables in the template, but this wasn’t really as pretty as it could be. Instead I wrote a simple wrapper around the SolrDocumentList, which is not a list in itself, but exposes the documents through its getDocumentList method. That way we can access the metadata directly and the documents in the template through ${solrDocumentList.documentList…}.
I’ve included the very simple wrapper here. It should be expanded with access to facet fields etc., but it shows the idea behind my suggested solution.
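The following stdlib-only sketch is a deliberately simplified model of that lookup rule, not the real EL implementation: a List base turns the property into an index, while anything else is treated as a bean with a getter. ResultList and ResultWrapper are hypothetical stand-ins for SolrDocumentList and a wrapper around it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SolrDocumentList: a List that also carries metadata.
class ResultList extends ArrayList<String>
{
    private final long numFound;

    ResultList(long numFound)
    {
        this.numFound = numFound;
    }

    public long getNumFound()
    {
        return numFound;
    }
}

// Hypothetical wrapper: not a List itself, so the lookup falls through to its getters.
class ResultWrapper
{
    private final ResultList documents;

    ResultWrapper(ResultList documents)
    {
        this.documents = documents;
    }

    public long getNumFound()
    {
        return documents.getNumFound();
    }

    public ResultList getDocumentList()
    {
        return documents;
    }
}

class ElLookupSketch
{
    // Simplified model of resolving ${base.prop}: a List base means prop must be
    // an index; any other object is treated as a bean (numFound -> getNumFound()).
    static Object resolve(Object base, String prop)
    {
        if (base instanceof List)
        {
            // Integer.parseInt("numFound") throws NumberFormatException.
            return ((List<?>) base).get(Integer.parseInt(prop));
        }
        String getter = "get" + Character.toUpperCase(prop.charAt(0)) + prop.substring(1);
        try
        {
            return base.getClass().getMethod(getter).invoke(base);
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalArgumentException("No readable property: " + prop, e);
        }
    }

    public static void main(String[] args)
    {
        ResultList docs = new ResultList(42);
        docs.add("doc1");
        System.out.println(resolve(new ResultWrapper(docs), "numFound")); // 42
        try
        {
            resolve(docs, "numFound");
        }
        catch (NumberFormatException e)
        {
            System.out.println("List lookup failed: " + e.getMessage());
        }
    }
}
```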
import org.apache.solr.common.SolrDocumentList;

public class SolrSearchResult
{
    private SolrDocumentList resultDocuments = null;

    public SolrSearchResult(SolrDocumentList results)
    {
        this.resultDocuments = results;
    }

    public long getNumFound()
    {
        return this.resultDocuments.getNumFound();
    }

    public long getStart()
    {
        return this.resultDocuments.getStart();
    }

    // SolrDocumentList.getMaxScore() returns a Float that is null when scores were
    // not requested, so keep it boxed instead of unboxing it to a float.
    public Float getMaxScore()
    {
        return this.resultDocuments.getMaxScore();
    }

    public SolrDocumentList getDocumentList()
    {
        return this.resultDocuments;
    }

    public void setDocumentList(SolrDocumentList results)
    {
        this.resultDocuments = results;
    }
}
Any comments and updates are of course as always welcome.
Posted in Uncategorized | No Comments »