Slashdot: News for Nerds

Book Review: Solr 1.4 Enterprise Search Server

MassDosage (1967508) writes | more than 3 years ago

Books 1

MassDosage writes "[Note to Slashdot editors: my e-mail address is massdosage@gmail.com if you need to contact me. Please do NOT publish this on the site.]

Solr 1.4 Enterprise Search Server, written by David Smiley and Eric Pugh, provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for advanced Solr usage. It is aimed at readers who are already familiar with Solr and related search concepts and who have some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr plenty of hands-on technical advice on how to use and fine-tune many parts of this powerful application.

Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it relates to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search: the first chapter covers only the basics and assumes that readers already know what they are getting into, or that they will read up on search concepts themselves before going further. Solr is free, open-source technology licensed under the Apache license. This book covers the 1.4 version of Solr and was published before that version was actually released, so it is a bit patchy in areas which were still undergoing change, but the authors point this out very clearly in the text where applicable.

The book provides details on downloading and installing Solr, building it from source and the manifold options available for configuring and tweaking it. A freely available data set from MusicBrainz is provided for download, along with various code examples and a bundled version of Solr 1.4, and it is used as the basis for many of the examples referred to throughout the text. In some ways this data set is limited, as it only allows for fairly simple use cases compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.

The basics of schema design, text analysis, indexing and searching are covered over the next three chapters, which include a wide range of essential search concepts such as tokenizers, stemming, stop-words, synonyms, data import handlers, field qualifiers, filters, scoring and sorting. The reader is taken through the process of setting up Solr so that it can index the data to be searched, and is then shown how this data can be imported into Solr from a variety of sources such as XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface against the provided sample data.
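To make the text analysis terms above a little more concrete, here is a minimal sketch of an analysis chain in plain Python. It is not how Solr or Lucene implement their analyzers: the stop-word list, the synonym map and the crude suffix-stripping "stemmer" are all assumptions invented for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical stop-word list and synonym map, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}
SYNONYMS = {"lp": "album"}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Strip a few common English suffixes (far cruder than a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the chain: tokenize, drop stop-words, map synonyms, stem."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    print(analyze("The Smashing Pumpkins: Greatest Hits"))
```

The same chain would normally be applied both when indexing a document and when parsing a query, so that "hits" in a query can match "hit" in the index.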

More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn’t extended very far and the behind-the-scenes algorithms powering search aren’t something I’ve had to work with directly. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search, and of how a knowledge of mathematics and of the data being searched is essential for search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future, and I’m sure that people working with search on a daily basis would find some useful advice here on how to get the best out of Solr.

Solr provides much more than just indexing and search, and the fact that various components are available to perform many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover components which ship with Solr to provide this functionality as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one were adding search capability to one’s own product or web site, as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.
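As a rough illustration of how such a component is switched on per request, the sketch below builds a query URL that asks Solr to highlight matches. The `q`, `rows`, `wt`, `hl` and `hl.fl` request parameters are standard Solr parameters; the base URL (a single-core Solr on localhost port 8983), the field name `a_name` and the helper function are assumptions made for this example rather than anything taken from the book. The code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Assumed base URL: a single-core Solr running locally on its default port.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, highlight_field, rows=10):
    """Return a select URL that asks Solr to highlight matches in one field."""
    params = [
        ("q", query),               # the search query itself
        ("rows", rows),             # maximum number of results to return
        ("wt", "json"),             # response format
        ("hl", "true"),             # enable the highlighting component
        ("hl.fl", highlight_field), # field(s) to generate highlights for
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "a_name" is a hypothetical field name used only for this example.
    print(build_query_url("a_name:smashing", "a_name", rows=5))
```

Pasting a URL like this into a browser pointed at a running Solr instance is one simple way to experiment with a component before wiring it into an application.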

The final three chapters move on to the more practical side of actually using Solr in the “real world” and discuss various deployment options, how it can be monitored using JMX, security, integration and scaling. In addition to Java (which is probably the most powerful and straightforward way of integrating with Solr), support for languages like JavaScript, PHP and Ruby is described. I felt the Ruby section was way too long; maybe one of the authors has a soft spot for the Ruby language? The sections on writing a web crawler and doing autocomplete were far more interesting and probably also more generally applicable. The book wraps up with a thorough discussion of how to scale Solr: scaling high (optimising a single server through techniques like caching, shingling, clever schema design and indexing strategies), scaling wide (using multiple Solr servers and replicating or sharding data between them) and scaling deep (a combination of the former two approaches).
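Two of the scaling terms above, shingling and sharding, can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: Solr builds shingles inside its analysis chain rather than in client code, and routing documents by a hash of their id is just one common way of deciding which server a document lives on, not necessarily the approach the book recommends. Both function names are hypothetical.

```python
import zlib


def shingles(tokens, size=2):
    """Return word n-grams ("shingles") of the given size."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id, so the choice is repeatable."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Indexing shingles lets a phrase be matched as a single term.
    print(shingles(["new", "york", "city"]))
    # The same id always maps to the same shard number.
    print(shard_for("artist:12345", 4))
```

Shingles trade a larger index for cheaper phrase matching, while hash-based routing spreads documents across servers at the cost of having to query every shard and merge the results.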

On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. The book does not cover a lot of theory and assumes a fair amount of prior knowledge; it is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren’t afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn’t have to wade through pages of flowery text before getting to the good bits. If you’re seriously thinking about using Solr, or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.

Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted so the above review is my own honest opinion."

1 comment

uhm ... (1)

patf (2013550) | more than 3 years ago | (#35483200)

[Note to Slashdot editors: my e-mail address is massdosage@gmail.com if you need to contact me. Please do NOT publish this on the site.]

Worked out really well!
