Book Review: Solr 1.4 Enterprise Search Server

samzenpus posted more than 3 years ago | from the read-all-about-it dept.

Books 43

MassDosage writes "Solr 1.4 Enterprise Search Server, written by David Smiley and Eric Pugh, provides in-depth coverage of the open source Solr search server. In some ways this book reads like the missing reference manual for the advanced usage of Solr. It is aimed at readers already familiar with Solr and related search concepts as well as those having some knowledge of programming (specifically Java). The book covers a lot of ground, some of it fairly challenging, and gives those working with Solr a lot of hands-on technical advice on how to use and fine-tune many parts of this powerful application." Keep reading for the rest of MassDosage's review.

Solr 1.4 Enterprise Search Server starts off with a brief description of what Solr is, how it is related to the Lucene libraries (which it is built around) and how it compares to other technologies such as databases. This book is not an introduction to search: this chapter covers only the basics and assumes that readers already know what they are getting into, or that they will read up on search concepts themselves before reading further. Solr is free, open-source technology licensed under the Apache license and is available here. This book covers the 1.4 version of Solr and was published before this version was actually released, so it is a bit patchy in areas which were still undergoing change, but the authors point this out very clearly in the text where applicable.

The book provides details on downloading and installing Solr, building it from source and the many options available for configuring and tweaking it. A freely available data set from MusicBrainz is provided for download, along with various code examples and a bundled version of Solr 1.4 which is used as the basis for many of the examples referred to throughout the text. In some ways this dataset is limited, as it only allows for fairly simple usages compared with the challenges of indexing and searching large bodies of text. Again, the authors clearly mention these limits and briefly describe how certain concepts would be better applied to other data sources.

The basics of schema design, text analysis, indexing and searching are covered over the next three chapters, and these include a wide range of essential search concepts such as tokenizers, stemming, stop-words, synonyms, data import handlers, field qualifiers, filters, scoring and sorting. The reader is taken through the process of setting up Solr so it can be used to index data that is to be searched, and then shown how this data can be imported into Solr from a variety of sources like XML and HTML documents, PDFs, databases, CSV files and many others. Using Solr to build search queries is covered with examples that the reader can run via the Solr web interface and the provided sample data.
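
To give a rough idea of what such a search query looks like over HTTP, here is a minimal sketch in Python that only builds the request URL and does not contact a server. The host, port and path assume a default local example install, and the field name a_name is a hypothetical stand-in for whatever the schema actually defines; q, fl, rows, sort and wt are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance on the default example port.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(query, fields=("id", "score"), rows=10, sort=None):
    """Build a Solr select URL from a query string and a few common options."""
    params = {
        "q": query,               # the search query, e.g. field:term
        "fl": ",".join(fields),   # which stored fields to return
        "rows": rows,             # maximum number of results
        "wt": "json",             # ask for a JSON response
    }
    if sort:
        params["sort"] = sort     # e.g. "score desc"
    return SOLR_SELECT + "?" + urlencode(params)


# "a_name" is a hypothetical field name used only for illustration.
url = build_select_url("a_name:smashing", fields=("id", "a_name", "score"), rows=5)
print(url)
```

Pasting the resulting URL into a browser pointed at a running Solr instance should be roughly equivalent to submitting the same query through the admin interface.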

More advanced search techniques are covered next and at this point I felt a lot of what was being discussed went over my head. Perhaps this was because my own search experience hasn't extended very far and the behind-the-scenes algorithms powering search aren't something I've had to directly work with. There were sections here that definitely felt aimed at people with a much more thorough understanding of the theory underpinning search and how a knowledge of mathematics and the data being searched are essential for search algorithm design. Having said this, these chapters felt like they would be really useful to come back to at some point in the future and I'm sure that people working with search on a daily basis would find some useful advice here for how to get the best out of Solr.

Solr provides much more than just indexing and search, and the fact that various components are available to do many other common search-related functions is one of its main benefits. These components provide things like the highlighting of search terms in returned results, spell-checking, related documents and so on. The authors cover components which ship with Solr to provide this functionality as well as mentioning a few that are currently separate software projects. One can easily see how all of this would be directly applicable if one was adding search capability to one's own product or web site, as there are a lot of wheels that Solr saves you from having to re-invent. The book also mentions the various parts of Solr that can be extended to modify or add new behaviours, which of course is one of the many advantages of its open source nature.

The final three chapters move on to the more practical side of actually using Solr in the "real world" and discuss various deployment options, how it can be monitored using JMX, security, integration and scaling. In addition to Java (which is probably the most powerful and straightforward way of integrating with Solr), support for languages like JavaScript, PHP and Ruby is described. I felt the Ruby section was way too long; maybe one of the authors has a soft spot for the Ruby language? The sections on writing a web crawler and doing autocomplete were far more interesting and probably also more generally applicable. The book wraps up with a thorough discussion of how to scale Solr: scaling high (optimising a single server through techniques like caching, shingling and clever schema design and indexing strategies), scaling wide (using multiple Solr servers and replicating or sharding data between them) and scaling deep (a combination of the former two approaches).
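
As an illustration of the "scaling wide" idea, the sketch below builds a distributed query URL using Solr's shards request parameter, which lists the servers whose indexes should be searched and merged. This is not taken from the book: the host names search1 and search2 and the field name title are hypothetical, and the code only constructs the URL rather than sending it.

```python
from urllib.parse import urlencode


def build_distributed_url(coordinator, shards, query, rows=10):
    """Build a select URL that asks one Solr server to query several shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of host:port/path entries, without a scheme.
        "shards": ",".join(shards),
    }
    # Keep ':', '/' and ',' unescaped so the shard list stays readable.
    return "http://" + coordinator + "/select?" + urlencode(params, safe=":/,")


# Hypothetical hosts; any one of the shards can act as the coordinator.
shard_hosts = ["search1:8983/solr", "search2:8983/solr"]
url = build_distributed_url(shard_hosts[0], shard_hosts, "title:solr")
print(url)
```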

On the whole this is a very thorough, detailed book and it is clear that the authors have a lot of experience with Solr and how it is used in practice. This book does not cover a lot of theory and assumes a fair amount of prior knowledge and is definitely aimed at those who need to get their hands dirty and get up and running with Solr in a production environment. The authors have a straightforward, open and honest writing style and aren't afraid of clearly stating where Solr has limitations or imperfections. While the book may have a somewhat steep learning curve, this is isolated to certain chapters which can be skipped and returned to later if necessary. The fact that the writing is concise and to the point means one doesn't have to wade through pages of flowery text before getting to the good bits. If you're seriously thinking about using Solr or are already using it and want to know more so you can take full advantage of it, I would definitely recommend this book.

Full disclosure: I was given a copy of this book free of charge by the publisher for review purposes. They placed no restrictions on what I could say and left me to be as critical as I wanted so the above review is my own honest opinion.

You can purchase Solr 1.4 Enterprise Search Server from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

43 comments

Solr? (-1)

Anonymous Coward | more than 3 years ago | (#35483204)

never heard of it. what the fuck would you use this for? I'm pretty sure I'll survive without it.

Re:Solr? (0)

Anonymous Coward | more than 3 years ago | (#35483454)

nobody cares.

Re:Solr? (0)

Anonymous Coward | more than 3 years ago | (#35483462)

Exactly. These Packt Publishing book reviews get fewer and fewer comments as time goes on. Packt really needs to space out their slashvertisements a little more.

Re:Solr? (1)

larry bagina (561269) | more than 3 years ago | (#35484530)

You have to buy the book to find out what solr is. I think Nancy Pelosi wrote the review.

Another paid for Packt Publishing slashvertisement (-1)

Anonymous Coward | more than 3 years ago | (#35483332)

Packt Publishing: Check!
8/10 Review: Check!
Recommended purchase: Check!

"I'm Packt Publishing and I approve of this slashvertisement!"

Re:Another paid for Packt Publishing slashvertisem (1)

Jeff DeMaagd (2015) | more than 3 years ago | (#35483810)

I wondered what was up. It's really hard to understand an article summary when I don't know any of the nouns they use. Checking back, Lucene barely had any previous /. coverage, and SOLR gets even less.

Solr is a search server (3, Informative)

Anonymous Coward | more than 3 years ago | (#35483558)

For the people wondering what Solr is for, it runs off of Apache Lucene. You feed it data/text and it processes and indexes it. It has some really neat text processing things. After using it for a while I am constantly critiquing crappy search implementations on websites.

Re:Solr is a search server (1)

MikeDirnt69 (1105185) | more than 3 years ago | (#35484310)

Works fine, but hard to keep data updated. Not worthy.

Re:Solr is a search server (2)

nzadrozny (555073) | more than 3 years ago | (#35486818)

Works fine, but hard to keep data updated.

That depends on how you are interacting with Solr. There are a number of good clients that integrate into popular ORMs and can automatically post updates to Solr as data changes in your application. I'm compiling a list of popular Solr clients over at https://websolr.com/guides/solr/clients [websolr.com] .

For some popular examples, there is RSolr or Sunspot for Ruby applications. Haystack is a good one for Django, and there are Drupal and Django extensions as well.
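
For anyone curious what those clients send under the hood: Solr accepts documents as a small XML message posted to its update handler, followed by a commit. The snippet below is only a rough sketch of building that message in Python and is not how any of the clients above are actually implemented; the field names id and title are made up and would have to match your schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of dicts into Solr's <add><doc>...</doc></add> XML."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical record, e.g. a row that just changed in your application.
message = build_add_message([{"id": "42", "title": "Solr & Lucene"}])
print(message)
```

A client would POST this body to the update handler (typically /solr/update) with an XML content type and then send a commit message so the change becomes searchable.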

Re:Solr is a search server (1)

larry bagina (561269) | more than 3 years ago | (#35484548)

Thanks! Instead of not caring about solr, now I can not care about solr and lucene.

Re:Solr is a search server (1)

Anonymous Coward | more than 3 years ago | (#35485054)

I've found that Lucene is useful for creating custom search/indexing solutions, but as a server, I much prefer Sphinx [sphinxsearch.com] . It's lighter-weight and compatible with pretty much any language. It's also remarkably fast.

Re:Solr is a search server (1)

tixxit (1107127) | more than 3 years ago | (#35487932)

I use it at work. Once you get used to it, it is hard to not use it in projects. Seriously, I couldn't imagine adding search to a site without it.

Re:Solr is a search server (1)

nikkipolya (718326) | more than 3 years ago | (#35488922)

Can't agree more. It's really a very useful tool.

How about a title that says WTF it is? (1)

lwsimon (724555) | more than 3 years ago | (#35483668)

"Solr"? Sounds Web 2.0, I don't think I'd be interested. Web 2.0 shouldn't require a book to explain it - in fact, the summary of the book is a bit too long for a proper Web 2.0 application.

Re:How about a title that says WTF it is? (1)

Anonymous Coward | more than 3 years ago | (#35483756)

Did you RTFT? It's a book review for a book about a search server, Solr, aimed at the enterprise market. That was a lot of information in not a lot of words.

It's a search technology (2)

kwerle (39371) | more than 3 years ago | (#35483768)

You're right, of course. /. editors suck.

SOLR is [related to] a text search technology that is often used in parallel with a database.

http://lucene.apache.org/solr/#intro [apache.org]

Re:It's a search technology (1)

lwsimon (724555) | more than 3 years ago | (#35483818)

Well crap, that sounds useful. Why on Earth did they do the stupid trendy "drop the e" thing with the name?

I'm interested enough now that I'm going to go read about it - but I'm still not going to read the summary, out of spite.

Re:It's a search technology (1)

hobo sapiens (893427) | more than 3 years ago | (#35483984)

because there is no e in solr, dumbass.

Quit spouting off about things you know nothing about.

Solr is actually quite powerful and is a very useful tool for creating awesome searches on your site.

Re:It's a search technology (1)

lwsimon (724555) | more than 3 years ago | (#35484024)

Excuse me. I meant "drop the e" as a surrogate for "remove the last vowel in the word, preceding the letter r, which must end the word."

See also: Flickr

Re:It's a search technology (1)

abigor (540274) | more than 3 years ago | (#35484876)

So you think the original name was "soler" rather than "solar"?

Re:It's a search technology (0)

Anonymous Coward | more than 3 years ago | (#35487296)

So you think the original name was "soler" rather than "solar"?

Yeah, as in that guy is a good shoe soler. Duh. Winning!

Re:It's a search technology (1)

kwerle (39371) | more than 3 years ago | (#35484002)

Well crap, that sounds useful. Why on Earth did they do the stupid trendy "drop the e" thing with the name?

Because the apache foundation is primarily interested in web 2.0-y things, so that's what they want their projects to look/sound like?

Besides, e-solr may be a little over-the-top. :-)

Re:It's a search technology (0)

Anonymous Coward | more than 3 years ago | (#35484044)

Yeah, they should have left the "e" in there! SOLER, that's right! Yeah!

Re:It's a search technology (0)

Anonymous Coward | more than 3 years ago | (#35484618)

They should have called it asolr.
 
;-)

Re:It's a search technology (0)

Anonymous Coward | more than 3 years ago | (#35484158)

The technology was originally called 'SOLAR' when it was developed internally by CNET Networks. The name had to be changed when it was open-sourced because of trademark considerations.

Re:It's a search technology (1)

subk (551165) | more than 3 years ago | (#35484270)

Teh editors do not, in fact, suck. They merely assumed that you--a wiz-kid, tech-mag reader--would be smart enough to perform a simple evaluation before jumping into a topic. 1) Check title for prefixes. This one says "book review". 2) Do I know what Solr (or ) is? If yes, read article. Maybe post a comment. If no, see step 3 if still interested. 3) Google/wiki the technology until you are ready to answer "yes" to step 2.

Re:How about a title that says WTF it is? (1)

david.emery (127135) | more than 3 years ago | (#35483832)

Is this what Web 2.0 means, supporting an attention span where technologies must fit into little soundbites for people unwilling to actually read and understand the underlying complexity? Oh, I guess I'm not "agile enough". sigh....
(And yeah this could be considered flamebait, but I really am pretty disgusted with the whole "I don't want to deal with complexity" notion. I think one thing that increasingly separates the few competent programmers from the great unwashed masses of hackers is the willingness to actually tackle and understand big, hard problems.)

Re:How about a title that says WTF it is? (1)

Slashdot Parent (995749) | more than 3 years ago | (#35494356)

"Solr"? Sounds Web 2.0, I don't think I'd be interested. Web 2.0 shouldn't require a book to explain it - in fact, the summary of the book is a bit too long for a proper Web 2.0 application.

Is your google broken, or do you merely enjoy acting like a douchebag?

Solr rocks! (2)

jnelson4765 (845296) | more than 3 years ago | (#35484960)

Use it at work to replace all the MySQL fulltext indexes we were using for a (rather bad) search interface when we moved to InnoDB. Don't miss the old search at all. I may be grabbing this book, since my boss asked for predictive search in our app soon...

Re:Solr rocks! (2)

nzadrozny (555073) | more than 3 years ago | (#35486850)

For predictive search, you'll want to get friendly with the Solr TermsComponent [apache.org] , which serves up the terms present in your index along with their frequency.

If you want to get really fancy, you can log your popular queries—particularly the ones that have a high correlation with click-throughs.
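
A rough sketch of what a TermsComponent request for suggestions might look like, written in Python and only building the URL. It assumes a request handler is registered at /terms (as in the example configuration) on a local instance, and the field name title is hypothetical; terms.fl, terms.prefix and terms.limit are the component's own parameters.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with a /terms request handler configured.
SOLR_TERMS = "http://localhost:8983/solr/terms"


def build_terms_url(field, prefix, limit=10):
    """Build a TermsComponent URL listing indexed terms that start with prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,        # field whose indexed terms are listed
        "terms.prefix": prefix,   # what the user has typed so far
        "terms.limit": limit,     # how many suggestions to return
        "wt": "json",
    }
    return SOLR_TERMS + "?" + urlencode(params)


# "title" is a hypothetical field name.
url = build_terms_url("title", "sol", limit=5)
print(url)
```

The terms come back from the index as analyzed, so if the field is lowercased at index time the prefix needs to be lowercased too.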

Re:Solr rocks! (0)

Anonymous Coward | more than 3 years ago | (#35493554)

Or you use the popular http://sematext.com/products/autocomplete/index.html which you can see on search-lucene and search-hadoop sites.

SOLR is no Drupal. (1)

/dev/trash (182850) | more than 3 years ago | (#35485766)

In fact all the big shops use Solr searching and not Drupal's built in search. Awesome no?

Re:SOLR is no Drupal. (1)

nzadrozny (555073) | more than 3 years ago | (#35486892)

In fact all the big shops use Solr searching and not Drupal's built in search. Awesome no?

Including, in fact, the White House [oreilly.com] , which is on a LAMP stack of Open Source goodness, including Drupal and Solr. Awesome indeed.

Re:SOLR is no Drupal. (1)

/dev/trash (182850) | more than 3 years ago | (#35499212)

uh yeah. If you think they're actually using Drupal like a normal person would....wellllllll

Solr Vs Sphinx (0)

Anonymous Coward | more than 3 years ago | (#35488902)

Solr is great and all, but there are quite some overheads with its use, since it's using a stack of different technologies: Java, Jetty (or Tomcat), Lucene, usually PHP etc. Don't expect to run it on a low spec machine; it needs fast disks and plenty of RAM, and it can get CPU bound (depending on your requirements). It's also a bit of an artform to configure and get working optimally - this is exacerbated if the Solr stack is not given much breathing space to begin with. However Lucene, upon which it is built, is tried and tested and in use in a lot of high profile sites. Expect to spend a fair bit of time getting it right. Books like this are a definite requirement - but I have to admit I have this book (not sure if it is the same edition) and aside from a few nuggets of information and explanations of some terms, it didn't really help me very much when I inherited a badly configured (read: barely modified example config) setup. I found a lot more information from the official documentation and other sites on the web. I've blogged some of the things I've found - mostly for my own benefit, but you are welcome to have a look http://solrjack.blogspot.com/ [blogspot.com] I welcome any input / corrections (again for my own benefit!).

Make sure you need all the features that Solr offers, because there are alternatives out there such as Sphinx, which is still very full featured but is a bit easier to get to grips with and doesn't have the overheads that Solr/Lucene has.

All I am saying is make sure you look at the alternatives, Sphinx being the main one.

N.

Solr is great (1)

word_virus (838778) | more than 3 years ago | (#35492656)

I've used it at work with the acts_as_solr plugin for Rails. Simply define in your models which fields in the database you'd like solr to index and it just does it, allowing you to build a nice, robust search capability into your website with not a lot of work. And I'm sure I'm only using about 10% of what Solr actually provides. Look forward to checking out this book and seeing what other tricks it's got.