As SolrJ, the Java interface for Solr, is slated for release together with Solr 1.3, it’s time to take a closer look! SolrJ is the preferred and easiest way of talking to a Solr server from Java (unless you’re using Embedded Solr). You get everything in a neat little package and can avoid parsing and working with XML directly. Everything is tucked neatly away under a few classes, and since the web generally lacks a good example of how to use SolrJ, I’m going to share a small class I wrote for testing the data we were indexing at work. As Solr 1.2 is currently the most recent version available at apache.org, you’ll have to go to the Apache Solr Nightly Builds website and download the latest build. The documentation is also contained in the archive, so if you’re going to do any serious SolrJ development, that is the place to look.
Oh well, enough of that, let’s cut to the chase. We start by creating a CommonsHttpSolrServer instance, passing the URL of our Solr server as the only constructor argument. You may also provide your own parsers, but I’ll leave that for those who need it. I don’t. The URL below assumes Solr is running on port 8080 under the solr directory, so adjust it to match your own setup. I’ve included the complete source file for download.
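import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
class SolrjTest
{
    public void query(String q)
    {
        CommonsHttpSolrServer server = null;
        try
        {
            server = new CommonsHttpSolrServer("http://localhost:8080/solr/");
        }
        catch(Exception e)
        {
            e.printStackTrace();
            return; // no usable server, so there is nothing to query
        }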
Next we create the query we want to send to the Solr server, which means building a SolrQuery object. We simply instantiate the object and set the query values to what we’re looking for. The setQueryType call can be dropped to use the default query handler, but as we currently use dismax, that is what I’ve used here. You can also turn on faceting (to create navigators/facets) and add the fields you want facets for.
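        SolrQuery query = new SolrQuery();
        query.setQuery(q);
        query.setQueryType("dismax");   // drop this line to use the default handler
        query.setFacet(true);           // turn on faceting
        query.addFacetField("firstname");
        query.addFacetField("lastname");
        query.setFacetMinCount(2);      // only list facet values with at least 2 hits
        query.setIncludeScore(true);    // include the relevance score in the results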
Then we simply query the server by calling server.query, which takes our parameters, builds the query URL, sends it to the server and parses the response for us.
        try
        {
            QueryResponse qr = server.query(query);
The results can then be fetched by calling getResults() on the QueryResponse object, qr.
            SolrDocumentList sdl = qr.getResults();
We then output the information fetched in the query. You can change this to print all fields or anything else you need, but as this is a simple application for searching a database of names, we just pick a couple of fields from each entry (here displayname and displayphone) and print them out. Before we do that, we print a small header containing information about the query, such as the number of elements found and which element we started on.
            System.out.println("Found: " + sdl.getNumFound());
            System.out.println("Start: " + sdl.getStart());
            System.out.println("Max Score: " + sdl.getMaxScore());
            System.out.println("--------------------------------");
            ArrayList<HashMap<String, Object>> hitsOnPage = new ArrayList<HashMap<String, Object>>();
            for(SolrDocument d : sdl)
            {
                // copy every field of the document into a plain map
                HashMap<String, Object> values = new HashMap<String, Object>();
                for(Iterator<Map.Entry<String, Object>> i = d.iterator(); i.hasNext(); )
                {
                    Map.Entry<String, Object> e2 = i.next();
                    values.put(e2.getKey(), e2.getValue());
                }
                hitsOnPage.add(values);
                System.out.println(values.get("displayname") + " (" + values.get("displayphone") + ")");
            }
After this we output the facets and their information, just so you can see how you’d go about fetching this information from Solr too:
            List<FacetField> facets = qr.getFacetFields();
            for(FacetField facet : facets)
            {
                List<FacetField.Count> facetEntries = facet.getValues();
                for(FacetField.Count fcount : facetEntries)
                {
                    System.out.println(fcount.getName() + ": " + fcount.getCount());
                }
            }
        }
        catch (SolrServerException e)
        {
            e.printStackTrace();
        }
    }
    public static void main(String[] args)
    {
        SolrjTest solrj = new SolrjTest();
        solrj.query(args[0]);
    }
}
And there you have it, a very simple application to just test the interface against Solr. You’ll need to add the jar-files from the lib/-directory in the solrj archive (and from the solr library itself) to compile and run the example.
Download: SolrTest.java
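If you’re curious what a call like server.query roughly turns into on the wire, here is a small sketch that uses only the JDK to build a request URL carrying the same parameters as the SolrQuery in the example. The parameter names (q, qt, facet, facet.field, facet.mincount, fl) are regular Solr request parameters, but the class name SolrQueryUrl, the /select path and the exact fl value are my assumptions for a default setup. SolrJ itself may order or encode the parameters differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SolrJ.
class SolrQueryUrl {
    // Percent-encode one parameter value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Build a /select URL with roughly the same parameters as the SolrQuery in the article.
    static String build(String baseUrl, String q) {
        List<String> params = new ArrayList<String>();
        params.add("q=" + enc(q));
        params.add("qt=dismax");               // setQueryType("dismax")
        params.add("facet=true");              // setFacet(true)
        params.add("facet.field=firstname");   // addFacetField("firstname")
        params.add("facet.field=lastname");    // addFacetField("lastname")
        params.add("facet.mincount=2");        // setFacetMinCount(2)
        params.add("fl=" + enc("*,score"));    // assumed equivalent of setIncludeScore(true)
        return baseUrl + "/select?" + String.join("&", params);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/solr", "john smith"));
    }
}
```

Pasting the printed URL into a browser is a handy way to compare what Solr returns with what your SolrJ code prints.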
This is great, thanks for posting this… I don’t suppose you know how to add content to the index… I’m digging through the API but there is no documentation (until the release of 1.3 I presume).
Thanks again…
Adding content to the index can be performed by simply POST-ing a suitable XML document to the index by using a regular HTTP POST. You can see this in the regular Solr Tutorial: http://lucene.apache.org/solr/tutorial.html
If you want to use SolrJ for this, there is a very, very simple example in the Solrj Wiki now, check out:
http://wiki.apache.org/solr/Solrj#head-0adf51b414cbf44c692bcadad4b12326df56d298
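For the plain HTTP POST route, a minimal sketch using only the JDK could look like the following. It builds the <add><doc>...</doc></add> message described in the Solr tutorial and posts it to the update handler, followed by a <commit/>. The class name SolrAddXml, the sample field names and values, and the update URL are assumptions for illustration, so replace them with fields from your own schema.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
class SolrAddXml {
    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Build an <add> message containing a single document.
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // POST an XML message to the update handler and return the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(updateUrl).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = con.getOutputStream();
        try {
            out.write(xml.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "1");              // made-up sample values
        doc.put("firstname", "Ola");
        doc.put("lastname", "Nordmann");
        String xml = addMessage(doc);
        System.out.println(xml);
        if (args.length > 0) {
            // args[0] is the update URL, e.g. http://localhost:8080/solr/update (assumed)
            System.out.println("add: HTTP " + post(args[0], xml));
            System.out.println("commit: HTTP " + post(args[0], "<commit/>"));
        }
    }
}
```

Without the commit the document will not show up in searches, so don’t leave out the second POST.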
Nice guide! I have a newbie question though; how do I add the required .jar files when I compile?
Simply provide them together with the -classpath directive to javac and java, or set the CLASSPATH environment variable.
If you’re using an IDE like Netbeans or Eclipse, you can add the libraries by right clicking on your project and selecting add -> library (or something like that, it’s been a while since I added things manually).
Hope that helps!
Mats,
Thanks for the tutorial!
Question: are the CommonsHttpSolrServer and the EmbeddedSolrServer classes thread-safe?
The CommonsHttpSolrServer class represents a client connection to the server, so that should be thread-safe. I have no experience with the EmbeddedSolrServer class, so I’d suggest you post that question to the Solr development list or do a Google search for the issue instead.
If my memory serves me right (which it very well may not do), the EmbeddedSolrServer is considered to be an inferior way of running Solr compared to the full stack.
Hi All,
I want to index the document fields from an XML file using SolrJ. I know how to index the document fields using doc.addField(), but I don’t know how to post the XML document instead of adding each field in SolrJ.
Can I index an XML file using SolrJ? Can anyone help me with how to do this?
Thanks,
Very helpful tutorial!
Trying to follow it, I wrote a small app that uses Solr through SolrJ. Everything works fine except for the fact that I don’t get the results I expect. :) Probably because I can’t find where the indexed data is kept. The Solr documentation says that it should go to the solr/data directory, which is created automatically by Solr. But it’s not there.
Does anybody know the answer?
Thanks a lot.
@Sergey: Remember to commit after adding the documents, otherwise they will not be added to the index.
@aida: To index xml-files directly, just submit the XML documents through a regular POST operation to the /update-handler. This is what solrj does in the background for you.
A very helpful tutorial, thanks!
BTW, I think this line should be tweaked:
List facetEntries = facet.getValues();
To:
List facetEntries = facet.getValues();
At least with my compiler setup I was getting a warning about missing semicolon in the first version.
Hi,
Does anybody know how to query a specific core with SolrJ?
I have a core0 configured, but I didn’t find how to query it with SolrJ.
Thanks
I am trying to run a SolrJ test program. I am having problems with the Tomcat 6.0 configuration for SolrJ. Sorry for posting the Exception Trace. What does it mean:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:96)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:109)
at SolrJQuery.query(SolrJQuery.java:64)
at SolrJQuery.main(SolrJQuery.java:112)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:391)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
… 3 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.sun.net.ssl.internal.ssl.InputRecord.readFully(InputRecord.java:293)
at com.sun.net.ssl.internal.ssl.InputRecord.read(InputRecord.java:331)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:789)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1112)
at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:623)
at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:828)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.flushRequestOutputStream(MultiThreadedHttpConnectionManager.java:1565)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2116)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:335)
… 5 more
The exception indicates that the request timed out while trying to get results from the Solr server. This can be caused by the Solr server not being available, locking up or other issues. Try issuing the same query through the web interface to the solr server or use Wireshark to look at the traffic between your application and the Solr server.
Hi Mats,
My application already has Lucene indexes, and I want to use them with Solr by passing the path where the indexes are stored. How can I do this?
Thanks a lot.
André
Read about how to use an existing Lucene index in Solr at the solr-user mailinglist. Hopefully that helps!
:)
Hi All,
Very helpful tutorial!
Thanks a lot.
Hi all,
I am new to Solr. I am indexing data from a database using the Solr data import handler. My question is: once everything is indexed, how can I query using a keyword? Using *:* gives all the results. However, if I search for a keyword that has been indexed, the search returns nothing. Do I have to use SolrJ to add the fields in the schema.xml in a client application?
Where i have to store this solrTest file???
You store the file anywhere you want, as long as you’re able to find it again and compile it with javac. Usually you can do this with just javac followed by the file name, but it may require adding some libraries to the classpath if they’re not already there (for SolrJ).
Just wanted to say that I stumbled across this post last year, and it is indeed the best example of SolrJ that I have found thus far.
Just wondering, have you ever done any MoreLikeThis examples using SolrJ? I’m currently experimenting with it to see how far I can get…
hey.. I am a newbie to all this.. I was using the class GeoHashFunction and it required ValueSource type objects as parameters.. but ValueSource is an abstract class.. Can you tell me how to use this ValueSource class and add LAT and LNG values to it?
hoping to get a reply soon..
Sorry for the late answer Sumit, but I really don’t have any experience using the GeoHashFunction. Since ValueSource is abstract, you would normally use one of the classes that inherit from ValueSource. This seems to be internal stuff in Solr that you really shouldn’t have to do much work with on your own.
I’ll get back to an article about proper geo searching through Solr later as the standard seems to have stabilised.
Great article,
Don’t suppose the example could include Paging where a cursor was maintained in solr so the second page does not have to go through all the nodes in the index tree again that the first page went through.
ie. kind of like ScrollableResultSet from hibernate vs. Query…Query gets slower and slower the more and more pages where ScrollableResultSet stays linear.
thanks,
Dean
Hi Mats,
I want to read the data from Solr in Json format. Is there any way to directly read the Json string (instead of reading the data as Beans & then converting them to Json)?
Thanks,
Anil.
Hi Anil,
You can use the SolrJSON output writer to get the output from Solr directly as JSON.
Hope that helps!
–mats
Thanks a ton Mats, for the lightning fast reply.
From what I got from that page, it only mentions that if we append “wt=json” to the URL we will get the response as JSON. But my problem is how to get the same JSON in my Java code. Currently I am doing something like this.
QueryResponse rsp = solrServer.query(query, SolrRequest.METHOD.POST);
List<AbstractSolrEntity> results = rsp.getBeans(AbstractSolrEntity.class);
String jsonResponse = convertBeansToJson(results);
Instead of doing all this, is there a way to directly get the jsonResponse from the solrServer/QueryResponse?
Thanks,
Anil.
Ah, sorry.
No, I don’t think there’s a way of directly getting the JSONResponse through SolrJ. As far as I can see there’s no way of getting the stream to read from the query or the raw output from the query before parsing in SolrJ.
When I do rsp.toString() it is giving me the result in javabin format. Following is what I am getting.
{responseHeader={status=0,QTime=2,params={start=0,q=agencyId:1,wt=[javabin, javabin],rows=1000000,version=2}},response={numFound=1,start=0,docs=[SolrDocument[{agencyId=1, agencyName=Agency One, hostId=2, hostName=Host Two, subOrgId=3, subOrgName=SubOrg Three}]]}}
I want the same string in JSON format. Even though I have done query.set("wt", "json"), I am getting it in this format. Am I doing anything wrong here?
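One possible workaround that stays outside SolrJ is to request wt=json over plain HTTP and read the response body as a string, without any parsing in between. The sketch below uses only the JDK. The class name RawJsonQuery, the /select path and the base URL are assumptions, and it does no error handling beyond wrapping I/O exceptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper, not part of SolrJ.
class RawJsonQuery {
    // Build a select URL that asks Solr for the JSON response writer.
    static String jsonUrl(String baseUrl, String q) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8") + "&wt=json";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read a whole stream into a UTF-8 string.
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fetch the raw, unparsed JSON response body.
    static String fetch(String baseUrl, String q) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(jsonUrl(baseUrl, q)).openConnection();
            con.setConnectTimeout(5000);
            con.setReadTimeout(5000);
            InputStream in = con.getInputStream();
            try {
                return readAll(in);
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            // args: base URL and query, e.g. http://localhost:8080/solr agencyId:1
            System.out.println(fetch(args[0], args[1]));
        } else {
            System.out.println(jsonUrl("http://localhost:8080/solr", "agencyId:1"));
        }
    }
}
```

The trade-off is that you give up the SolrQuery and QueryResponse conveniences for that request and have to build the parameters yourself.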
Very nice introduction, I learned a lot :)
I will now be looking on other tutorials/references for more advanced features! Can you please give some pointers of other documentation? I did not find much about SolrJ.
Just a quick correction on a code snippet you show on the tutorial. There is a variable name (facetEntries) out of place in:
”
List facetEntries = facet.getValues();
”
This is just fine in the file for download.
Thank you! Cheers!
Thank you for the update, the example has been corrected now! I don’t have any suggestions of other resources, but if there’s anything particular you want me to dig into, it’d be great to hear what people are missing. I’ve been wanting to do a part 2, but have never found the time.
After playing a little bit more with SolrJ and also checking some other documentation (for example http://www.solrtutorial.com/, which gives a good overview of how to configure schema.xml and solrconfig.xml), I think there are not many other things to present as an introduction to SolrJ.
However, I have only been working with Solr for a week, and for sure I will encounter some other problems… When this happens I will let you know…
Cheers!
Hi, Mats,
I have been using Solr for several months, but only recently I want to use SolrJ to access Solr.
I can understand most parts of your post, and I really want to try the source code you posted, probably changing it somewhat to suit my needs. My problem is that I am not sure how to get the source code running. Say I have downloaded apache-solr-3.5.0-src, SolrJ included: where should I put the source code in the package? Can you give some hints or a description of the steps? Thanks a lot.
Really useful one to use solr in custom java web application.
Thanks for sharing.
Hi,
I’ve used Solr before, tweaking its config files and things like that, and now I want to try using SolrJ.
I did as you said. I made a maven project including the dependencies mentioned on the wiki.
I have a few questions regarding this though:
1 – How do I know what the solr core is?
2 – So, when I instantiate the server, it starts up one core? Where does it store the data? Is it reading from a file? How do I set the solrconfig.xml and schema.xml?
When I run my program as a java application i get this error:
Any help will be really appreciated!!
Error:
INFO: Retrying request
org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:472)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at com.comcast.cvs.solrsearch.ServerSolr.queryServer(ServerSolr.java:38)
at com.comcast.cvs.solrsearch.App.main(App.java:22)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:249)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:416)
SolrJ is a client library, a Solr client. It does not contain any server code, so any configuration etc. is done as in a regular setup of a Solr server (on a server or computer by itself, or in another process on the same machine). The Solr core is usually known beforehand (configured when you set up Solr), and you usually use it when accessing the server / submitting documents: http://server:8080/solr/<corename> or http://server:8080/<corename>.
The connection refused is probably caused by you not running a Solr server on the location you’ve provided to SolrJ.
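A quick way to check this from Java, before involving SolrJ at all, is to try opening a plain TCP connection to the host and port you gave to CommonsHttpSolrServer. This is only a sketch: the class name SolrPortCheck and the localhost:8080 defaults are assumptions, and a successful connection only tells you that something is listening there, not that it is Solr.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of SolrJ.
class SolrPortCheck {
    // Return true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        Socket s = new Socket();
        try {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        } finally {
            try {
                s.close();
            } catch (IOException ignored) {
                // nothing useful to do if closing fails
            }
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";        // assumed default
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080; // assumed default
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

If this prints false, start the Solr server (or fix the URL) before looking any further at the SolrJ code.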
That makes much more sense. I’m sorry I did not read the wiki properly.
Thanks!
I was wondering if there is anything I can do to completely write it in java. I mean i have changed the configurations (schema.xml and solrconfig.xml) and I tweaked velocity to get a good engine on browse UI.
What I’m looking for is a way where I don’t need to tweak these values and I can set them pornographically and also start the server in one custom application.
Any suggestions?
Thanks a ton!
*programmatically.
You do have an option to use the Solrj EmbeddedSolrServer-option, but this will not allow you (as far as I know, I may be in the wrong here) to run everything in one process. You’ll just avoid using the HTTP interface to Solr.
I’d recommend using the HTTP interface if at all possible, as it will keep everything separate. Otherwise you could also use Lucene directly if you’re looking at embedding everything in one application, but this will require a tighter binding between your code and the data.
Here are the steps I followed to write an application to index pdf documents to fresh solr3.6
1 -In Schema.xml:
I added the fields I wanted indexed and changed stored = true.
2 – Started Solr using java -jar start.jar from the /example dir in Solr 3.6
3- In my application I start the server
solrServer = new CommonsHttpSolrServer(url);
solrServer.setParser(new XMLResponseParser());
4 – i index the data like this:
ContentStreamUpdateRequest index = new ContentStreamUpdateRequest("/update/extract");
index.setParam("literal.id", "doc");
index.setParam(CommonParams.STREAM_FILE, "/location/x.pdf");
index.setParam(CommonParams.STREAM_CONTENTTYPE, "application/pdf");
index.setParam(UpdateParams.COMMIT, "true");
5 – I commit using solrServer.commit().
When I run a simple query like(*:*) – don’t see anything. The numDocs that have been indexed is still 0.
What am I doing incorrectly?
I am a newbie to these. I wanted to know, how can I index multiple files of different types(pdf, txt, doc) in Solr? In lucene we can do this by providing a directory location path in IndexFiles (i.e. org.apache.lucene.demo.IndexFiles.java). Is there a similar way here using SolrJ?
You want to have a look at the ExtractingRequestHandler which uses Tika to extract content from rich documents. See http://wiki.apache.org/solr/ExtractingRequestHandler for examples and usage.
Thanks a lot Mats. It is very helpful.
Sir, kindly please elaborate what should I give in the command line argument for running this class?
You might need to set up your classpath so it can find solrj, and the main() function expects the search query as the first argument:
java SolrTest foo
.. should work, if your classpath is correct.
error came. Please give suggestion.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.solr.common.SolrException.<init>(ILjava/lang/String;)V
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at urls.query(urls.java:46)
at urls.main(urls.java:95)
i have one field, name is book_name. I need to search in this field using solrj. if i give a single word it works fine , but when i give multiple words it won’t work.
SolrQuery query = new SolrQuery(); query.setQuery("book_name:live alone");
the result should list the book name that contains both the words any where in the field book_name
i am using solr-5.0.0
@Falakh: Unless you’re providing a field name for each token, Solr searches the default search field for each word by itself. You can change this using the df (default field) and qf (query fields) parameters, or change the configuration in your schema / config if needed. I recommend using the df/qf parameters unless you have a good reason to keep it in the Solr configuration.
I also recommend using the admin query interface of Solr to work with query issues, as that’s probably easier to debug than using SolrJ directly.
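Another option, assuming the standard Lucene query syntax is in use, is to group the words under the field so that every word has to match in book_name. The helper below is a hypothetical sketch (the class and method names are mine); it does not escape special query characters and does not handle empty input.

```java
// Hypothetical helper, not part of SolrJ.
class FieldQuery {
    // Build field:(t1 AND t2 ...) so that every whitespace-separated term must match in that field.
    static String allTerms(String field, String words) {
        String[] terms = words.trim().split("\\s+");
        return field + ":(" + String.join(" AND ", terms) + ")";
    }

    public static void main(String[] args) {
        // The result can be passed to query.setQuery(...) in SolrJ.
        System.out.println(allTerms("book_name", "live alone"));
    }
}
```

With the original query string, only "live" is searched in book_name, while "alone" goes to the default search field, which is why the multi-word search appeared not to work.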