Lucene and SOLR Get Commercial Support

ScuttleMonkey posted more than 5 years ago | from the foss-going-mainstream dept.

ruphus13 writes "Two of the technical leads and core committers of the Lucene Project have launched Lucid Imagination, a venture-backed company now offering commercial versions of Lucene and SOLR in the hopes of making it the de facto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects, installed at over 4,000 global companies. Although OStatic is primarily Drupal-based, our site's search is based on Lucene. According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications, is the fastest-growing Lucene sub-project...Lucid's business model is roughly comparable to Red Hat's very successful model, in that it centers on support and services for free, open source software.'"

47 comments

fp (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#26673055)

fp?

oookay. (4, Insightful)

girlintraining (1395911) | more than 5 years ago | (#26673057)

Nice press release, but... what does it do? O_o Five million dollars and they couldn't even buy a one-sentence description of their product. Standards are slipping.

Re:oookay. (3, Insightful)

Azar (56604) | more than 5 years ago | (#26673153)

"...in the hopes of making it the de facto choice of search technologies used by companies within their products. 'The Lucene search library ranks amongst the top 5 Apache projects... According to Lucid Imagination officials, the Solr search server, which transforms the Lucene search library into a ready-to-use search platform for building applications...'"

I agree, it could have been more explicit in giving a brief description, but was it really that difficult to glean what it does from the summary?

Re:oookay. (-1, Flamebait)

girlintraining (1395911) | more than 5 years ago | (#26673221)

I agree, it could have been more explicit in giving a brief description, but was it really that difficult to glean what it does from the summary?

So they spent $5 million to build a SELECT query? Maybe they have a future in government contract work. More seriously, this is like saying that NASA develops space technologies, space vehicles and is ranked amongst the top five space research agencies. What does that really say? Nothing. The terms are so broad as to offer zero technical insight. I mean, I could interpret my description of NASA to mean they build portable storage pods.

Re:oookay. (1)

The End Of Days (1243248) | more than 5 years ago | (#26673355)

So your reaction to something you don't understand is denigration? I hope you don't expect people to accept you as a girl in training, then. It would be hypocritical.

Re:oookay. (3, Informative)

morgan_greywolf (835522) | more than 5 years ago | (#26673495)

No. It's a search engine for your website. It's not quite as simple as a SELECT query. Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform [apache.org]. That does quite a bit more than a SELECT query could hope to do.
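
To see why this differs from a SELECT ... LIKE '%term%' scan, here is a toy inverted index: each term is mapped once, at indexing time, to the documents that contain it, so answering a query becomes a few set lookups instead of a pass over every row. This is a minimal Python sketch of the general idea, not Lucene's API or index format; the tokenizer and sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-word characters; real analyzers do far more
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Lucene is a full-text search library",
    2: "Solr is a search server built on Lucene",
    3: "MySQL supports SELECT queries",
}
index = build_index(docs)
print(sorted(search_all(index, "lucene search")))  # [1, 2]
```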

Re:oookay. (0)

Anonymous Coward | more than 5 years ago | (#26673639)

Link is currently DoS'd... but you can't blame the knee-jerk posters who treat this like a standard Slashvertisement.

TFS reads like ad copy, not news for nerds.

Re:oookay. (0, Redundant)

morgan_greywolf (835522) | more than 5 years ago | (#26673843)

Google is your friend [google.com]. Yes, I can blame the posters. If you don't know what something is, we have technology for that. It's called a freakin' search engine.

Re:oookay. (0, Troll)

FishWithAHammer (957772) | more than 5 years ago | (#26675129)

That, and at the least anyone peripherally involved with non-MS development should probably know what Lucene is. It's that awesome.

Re:oookay. (5, Informative)

liquidpele (663430) | more than 5 years ago | (#26673763)

Wow, you are an IDIOT. I use SOLR for indexing a few million documents. A full-text search with a regular database engine took about 30 seconds for a detailed search, or would take several minutes for a search that probably hit most of the documents. SOLR does it in less than 1 second no matter what, and actually scales (the DB searches had to be run pretty much one at a time or the server hung). Next time you open your mouth, maybe you shouldn't.

Re:oookay. (0, Flamebait)

morgan_greywolf (835522) | more than 5 years ago | (#26673933)

use SOLR for indexing a few million documents....SOLR does it in less than 1 second no matter what, and actually scales

Nice. Yeah, I'd definitely say that's worth 5 million bucks.

Re:oookay. (2, Insightful)

liquidpele (663430) | more than 5 years ago | (#26676539)

Not sure if you're being sarcastic or not, but yes it certainly is. Full-text search is a very *very* complicated thing, and their products do it very well. Not to mention, only a few million isn't that much for a company to be worth these days due to inflation.

Re:oookay. (1, Insightful)

Chabo (880571) | more than 5 years ago | (#26673257)

I read the summary twice and it just made my head spin.

There's a big presumption in the summary that we've heard of Lucene before. I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?

Re:oookay. (0)

Anonymous Coward | more than 5 years ago | (#26673269)

I read the summary twice and it just made my head spin.

There's a big presumption in the summary that we've heard of Lucene before. I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?

...your mom!

Re:oookay. (1)

macraig (621737) | more than 5 years ago | (#26673539)

That's still illegal, though Bush was working on it behind closed doors.

Re:oookay. (0)

Anonymous Coward | more than 5 years ago | (#26674137)

That's because the average Slashdot user's technology knowledge ends with how to use BitTorrent. Lucene is an indexing engine. A really good, free one. It is a truly MAJOR piece of technology infrastructure, and one of the real gems of Open Source. Moron.

Re:oookay. (1)

sonsonete (473442) | more than 5 years ago | (#26674963)

I don't even know what they do. Do they search... the web? ...your LAN? ...your desktop?

In short: yes.

  1. Lucene can be set up to search just about anything: the web, a network, your desktop, a database, or anything else you can tell it to read.
  2. Solr provides a web interface to Lucene.
  3. Lucid Imagination contributes to the Lucene and Solr projects and provides commercial support for users of the software.

Re:oookay. (2, Informative)

FooBarWidget (556006) | more than 5 years ago | (#26673197)

Lucene is a full-text indexer and search library. Solr is a full-text indexer and search server, based on Lucene.

full-text search (4, Informative)

CarpetShark (865376) | more than 5 years ago | (#26673493)

Nice press release, but... what does it do?

You mentioned SQL SELECTs elsewhere. Full-text search isn't like a SELECT. It's more like what happens when you google something: many documents are searched in a split second, and complex queries can be done, like documents containing a phrase, but not this one, or documents that mention X with Y within a few sentences of that, or documents that mention X and Y, but not Z. Yes, SQL lets you do that, but not for text, except in very inefficient ways.
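
The "phrase, but not this one" and "X and Y, but not Z" queries above can be sketched with a positional inverted index: each term records the positions where it occurs in each document, so a phrase is a run of consecutive positions and boolean operators become set operations. This is an illustrative Python sketch of the general technique, not Lucene's implementation; the documents and query terms are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """term -> {doc_id: [positions of the term in that document]}"""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def docs_with(index, term):
    """IDs of documents containing a single term."""
    return set(index.get(term, {}))


def phrase_match(index, phrase):
    """IDs of documents where the phrase's terms appear at consecutive positions."""
    terms = tokenize(phrase)
    if not terms:
        return set()
    candidates = set.intersection(*(docs_with(index, t) for t in terms))
    hits = set()
    for doc_id in candidates:
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


docs = {
    1: "open source search engine",
    2: "search engine optimization tips",
    3: "open source database engine",
}
idx = build_positional_index(docs)
# The phrase "search engine", but NOT documents mentioning "optimization"
result = phrase_match(idx, "search engine") - docs_with(idx, "optimization")
print(sorted(result))  # [1]
# "open" AND "engine", but NOT "search"
print(sorted((docs_with(idx, "open") & docs_with(idx, "engine")) - docs_with(idx, "search")))  # [3]
```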

From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full-text search also does ranking based on the occurrences of the different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.
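
A rough sketch of field-scoped search with ranking: documents are dictionaries of named fields, one field value can exclude a document (roughly the "and not genre:sci-fi" part), and the rest are ranked by weighted term counts. The field names, the weights in the hypothetical `FIELD_WEIGHTS` table, and the scoring formula are all assumptions for illustration; Lucene's real scoring is considerably more involved than plain weighted term counts, and no query parser is shown here.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


# Hypothetical per-field weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "maincontent": 1.0}


def score(doc, terms):
    """Sum of query-term occurrences per field, multiplied by that field's weight."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(doc.get(field, ""))
        total += weight * sum(tokens.count(t) for t in terms)
    return total


def search(docs, query, exclude_field, exclude_value):
    """Rank docs with a positive score, dropping any whose exclude_field equals exclude_value."""
    terms = tokenize(query)
    ranked = []
    for doc_id, doc in docs.items():
        if doc.get(exclude_field) == exclude_value:
            continue
        s = score(doc, terms)
        if s > 0:
            ranked.append((s, doc_id))
    # Highest score first; ties broken by document ID for a stable order
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ranked]


docs = {
    "a": {"title": "NASA space probes", "maincontent": "A history of space exploration.", "genre": "non-fiction"},
    "b": {"title": "Starship", "maincontent": "Space space space, and NASA too.", "genre": "sci-fi"},
    "c": {"title": "Gardening", "maincontent": "NASA once studied space plants.", "genre": "non-fiction"},
}
# Roughly the query "space nasa and not genre:sci-fi"
print(search(docs, "space nasa", "genre", "sci-fi"))  # ['a', 'c']
```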

Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem. But maybe it is. Personally, I'm holding out for a decent Triple API, which will hopefully make all but the indexer of this obsolete.

Re:full-text search (1)

FishWithAHammer (957772) | more than 5 years ago | (#26675163)

Yeah, I don't get where $5m USD went on that either. I didn't think it was THAT big a problem.

Getting it right, and doing it as well as Lucene does (which is spectacularly well), really is THAT big a problem.

Re:full-text search (2, Informative)

Wokan (14062) | more than 5 years ago | (#26678077)

Nice press release, but... what does it do?

From what I've seen of it (which is very little), Lucene lets you, as a programmer, index data using your own field names. So, say you're indexing Word documents and HTML documents. You can extract most of the text and index it as "maincontent", but separately extract the author, title and subtitle, indexing those individually. This lets you query attributes, like: "space nasa and not genre:sci-fi". Full-text search also does ranking based on the occurrences of the different words you query by, etc. Presumably Lucene would let you specify which fields/attributes are included in a search, and which ones have the highest scores in search results, for instance.

You've certainly hit close to the mark. I work on a site that uses Solr and it does work just as incredibly as others have said. You can tell it what fields you want to search. You can tell it what order you want results sorted in (and you can sort on more than one column in cases of relevancy ties). You can tell it you want matches in one column weighted more than another. You can tell it you want the terms to be within X words of each other. And you can tell it what words should not be in the results.
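
The "terms within X words of each other" condition can be sketched as a check on token positions. In Lucene's query syntax this is written as a proximity query such as "solr lucene"~10, but Lucene's slop is defined somewhat differently from the plain absolute distance used here, so treat this Python function as an approximation of the idea rather than a reimplementation.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def within(text, a, b, max_distance):
    """True if terms a and b occur within max_distance token positions of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


text = "Solr adds faceting, caching and replication on top of the Lucene library"
print(within(text, "solr", "lucene", 5))   # False: the terms are 10 positions apart
print(within(text, "solr", "lucene", 12))  # True
```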

And then there's the other results it can offer. Faceted search is fantastic. If you have products split by department, you can facet by department and your search for widgets can then return not only the results, but a list of the departments the current results were found in with a result count for each. (Very common feature on ecommerce sites, especially those using Endeca.)
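
The department example above boils down to counting the matching results per value of one field. A minimal sketch, using made-up products and a plain substring match in place of real full-text matching (Solr computes facet counts server-side; this only shows what the numbers mean):

```python
from collections import Counter

products = [
    {"name": "Blue widget", "department": "Hardware"},
    {"name": "Widget polish", "department": "Cleaning"},
    {"name": "Red widget", "department": "Hardware"},
    {"name": "Garden hose", "department": "Garden"},
]


def search_with_facets(items, term, facet_field):
    """Return matching items plus a count of matches per value of facet_field."""
    hits = [item for item in items if term.lower() in item["name"].lower()]
    facets = Counter(item[facet_field] for item in hits)
    return hits, facets


hits, facets = search_with_facets(products, "widget", "department")
print(len(hits))             # 3
print(facets.most_common())  # [('Hardware', 2), ('Cleaning', 1)]

# Refining by a facet value is then just a filter on the current hits
refined = [h["name"] for h in hits if h["department"] == "Hardware"]
print(refined)  # ['Blue widget', 'Red widget']
```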

They also have more-like-this results you can use as well as match highlighting. I haven't had the opportunity to try the spelling correction parts yet.
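
Match highlighting, in its simplest form, wraps each occurrence of a query term in markup while escaping everything else for HTML. The sketch below is a stand-alone illustration in Python; the `<em>` tag and whole-word, case-insensitive matching are assumptions, not Solr's highlighter defaults.

```python
import html
import re


def highlight(text, terms, tag="em"):
    """Wrap whole-word, case-insensitive matches of any term in <tag>...</tag>."""
    if not terms:
        return html.escape(text)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    out, last = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[last:m.start()]))  # unmatched text, escaped
        out.append(f"<{tag}>{html.escape(m.group(0))}</{tag}>")
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)


print(highlight("Solr supports faceted search & highlighting", ["search", "solr"]))
# <em>Solr</em> supports faceted <em>search</em> &amp; highlighting
```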

And the indexes can be incredibly small. After indexing over 1 million pages of information, the index data folders were under 500MB. The Lucene indexer can literally hold our entire search set in RAM while it's running.

Re:oookay. (1)

saibot834 (1061528) | more than 5 years ago | (#26675595)

One of the most interesting fields where Lucene is useful (probably also for you) is Wikipedia. Remember how painful it was to search something on Wikipedia some months ago?

Well now, thanks to Lucene, Wikipedia (and its sister projects) don't have to use the built-in MediaWiki search engine (which really is crappy). Probably the best feature Lucene brings is "Did you mean ...". Google is still better, but Lucene was a big step for Wikipedia.
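
A "Did you mean ..." suggestion can be approximated by picking the indexed term most similar to a misspelled query term. This sketch uses Python's `difflib` similarity ratio over a small invented vocabulary; it illustrates the idea only and is not how Wikipedia's Lucene setup or Lucene's spell checker computes suggestions.

```python
import difflib

vocabulary = ["lucene", "solr", "xapian", "sphinx", "wikipedia", "mediawiki"]


def did_you_mean(query_term, vocab, cutoff=0.7):
    """Suggest the closest indexed term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query_term.lower(), vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(did_you_mean("lucine", vocabulary))  # lucene
print(did_you_mean("zzzz", vocabulary))    # None
```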

Re:oookay. (1)

Simetrical (1047518) | more than 5 years ago | (#26688801)

Wikipedia has been using Lucene for a few years now. The recent changes were improvements to how it was used, but it was being used the whole time. Out of the box, MediaWiki uses whatever fulltext search is available from the DBMS being used -- in MySQL's case, that means using MyISAM, which is impossible for a site the size of Wikipedia (all selects, updates, deletes, etc. take out table-level locks).

first!!! (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#26673063)

firsssdt!

Re:first!!! (0, Offtopic)

macraig (621737) | more than 5 years ago | (#26673575)

Ummm, nope. There Can Be Only One.

interesting (1)

larry bagina (561269) | more than 5 years ago | (#26673127)

Talk at the water cooler was that Sun was taking an interest in them to expand their open source catalog. All in all, they're probably a lot better off going it alone in the current market. With companies looking to save money by going open source, it's a great time for open source support.

Re:interesting (1)

Darkness404 (1287218) | more than 5 years ago | (#26673929)

That, and possibly large government projects. With Obama wanting more government projects and more transparency, while also saving money, OSS is a great way to do it, and I believe that Sun has already written to Obama about switching to all OSS. So Sun wanting to acquire more OSS vendors certainly makes sense.

possible alternative: xapian (2, Interesting)

bdqbit (1465479) | more than 5 years ago | (#26673129)

Nice going for Lucene (LGPL?), although I've preferred Xapian (GPL) in the past (with Python bindings).

Good to have choice, I guess.

Re:possible alternative: xapian (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#26673211)

Python is balls slow compared to Java.

And less portable.

Re:possible alternative: xapian (2, Informative)

bdqbit (1465479) | more than 5 years ago | (#26673297)

Xapian is C++ (with plenty of bindings for a lot of languages -- including python)

Re:possible alternative: xapian (0)

Anonymous Coward | more than 5 years ago | (#26673321)

Ah well then:

Java is balls slow compared to C++.

And less portable.

Re:possible alternative: xapian (1)

bdqbit (1465479) | more than 5 years ago | (#26673365)

We can agree on that. Xapian (set up correctly) can be blazingly fast with loads of data. The Python bindings (basically giving access to the Xapian API) don't add much weight in my experience.

Re:possible alternative: xapian (1)

morgan_greywolf (835522) | more than 5 years ago | (#26673885)

They typically wouldn't. Python bindings for C or C++ libraries are usually nothing more than pointers to the correct shared library calls.

SOLR has several advantages (3, Informative)

Krischi (61667) | more than 5 years ago | (#26675349)

I agree, Xapian is nice, and we considered it for a while. However, in the end, the decision was made to use SOLR because of one overriding factor in its favor: it takes care of all the nasty details to enable concurrent access, which makes developing web applications just so much easier. With SOLR you just don't have to worry about who might currently be reading or writing to the index, and the index replication features are very powerful, too.

That, and facet searches are very nice, too (e.g., searching for a keyword and then automatically displaying the # of hits per category, and refining per category).

SOLR has Python bindings, too, by the way. They currently are not in the official repository, but recently maintenance on them has picked up, and they work in a very Pythonic way.

About to move to the Java port of Lucene... (4, Informative)

merreborn (853723) | more than 5 years ago | (#26673347)

We're currently using the Zend PHP port of Lucene. It was nice, because we were able to use all our existing code for loading our PHP objects from the database for indexing. It worked fine, as long as our indexes stayed small.

Now we have several indexes weighing in at around 300+ megabytes, and Zend Lucene has proven to be absolute crap. It takes seconds of CPU time, and hundreds of megs of RAM, to process simple queries against these indexes. When tested in Luke [getopt.org], the same queries against the same indexes finish in milliseconds with minimal memory usage. Either the Zend port or PHP itself is clearly unsuitable for production use on large indexes.

Either way, we're going to switch it out for Solr ASAP, and we anticipate the development overhead should be minimal -- we'll keep using the same code to load our objects, and pass them to Solr via JSON.
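
The "keep loading our objects, then pass them to Solr via JSON" step amounts to flattening each application object into a dictionary of fields and serializing the list. A minimal Python sketch, where the `Article` class and its field names are hypothetical stand-ins for the PHP objects in the comment. Whether a given Solr version accepts a JSON array of documents, and at which update URL, is an assumption to check against that version's documentation, so the HTTP request itself is deliberately not shown.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Article:
    # Hypothetical application object standing in for the PHP objects in the comment
    id: int
    title: str
    body: str


def to_solr_json(objects):
    """Serialize objects as a JSON array of flat documents, one per object."""
    return json.dumps([asdict(o) for o in objects])


payload = to_solr_json([Article(1, "Hello", "First post"), Article(2, "Again", "Second post")])
print(payload)
```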

Re:About to move to the Java port of Lucene... (1)

Sentry21 (8183) | more than 5 years ago | (#26673543)

Either the Zend port, or PHP itself is clearly unsuitable for production use on large indexes.

You phrase this in such a way as to imply an exclusion, when really both are often true. We've ported our PHP application to Rails (which provides a different, but workable, set of problems), and we've rid ourselves of the Zend engine in return for Ferret; I'm a proponent of replacing that with SOLR, but we've yet to go down that path.

Re:About to move to the Java port of Lucene... (3, Insightful)

WoLpH (699064) | more than 5 years ago | (#26673581)

That's because the Zend Lucene library is written in pure PHP, ergo... _really_ slow. Either use a C module or get SOLR to get it fast. In my simple tests the Python lucene libraries were about 100-500 times faster than the Zend PHP version, it's really one of the worst Lucene libraries around (in terms of speed).

Re:About to move to the Java port of Lucene... (1, Informative)

Anonymous Coward | more than 5 years ago | (#26673803)

I found the original Java libraries to be plenty fast as well. We index millions of records, and it's always been plenty fast returning even the most complex queries. Granted, it probably isn't as fast as the C library, but it is the most up-to-date and feature-rich. And many of those later features that the C library lacks make it COMPLETELY worth it.

Re:About to move to the Java port of Lucene... (1)

ionix5891 (1228718) | more than 5 years ago | (#26673881)

Yes the Lucene php version is very very slow (very)

I recently switched to Sphinx (http://www.sphinxsearch.com/). It's written in C, compiles nicely on my Linux servers, indexes documents at crazy speeds, and there are piles of options.

I highly recommend it (we use it for a vertical search engine handling 200,000 queries a day on one of our sites).

Re:About to move to the Java port of Lucene... (1)

tcopeland (32225) | more than 5 years ago | (#26674035)

> I recently switched to Sphinx (http://www.sphinxsearch.com/). It's written in C

Minor nit - it's in C++. But yeah, it's totally awesome - fast when indexing, easy to scale horizontally, powerful query language, custom stop word lists, etc. The APIs (I use the Ruby one, Riddle) make it easy to do nifty excerpt formatting (for example, note the highlighting around the word 'battle' [militarypr...glists.com] ), and there are a couple of different ways to integrate it into a Ruby on Rails app.

Speaking of Sphinx and Rails, here's a code snippet for escaping extended mode Sphinx queries [blogs.com] . This will probably make its way into Riddle at some point, but, until then, there it is.
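
The general approach to escaping user input for Sphinx's extended query mode is to backslash-escape every character that has operator meaning, so it is matched literally. The sketch below is a hypothetical Python helper, not the linked snippet and not Riddle's API, and the set of special characters in it is an assumption based on the extended-syntax operators; check the Sphinx documentation for your version before relying on it.

```python
# Assumed set of characters with special meaning in Sphinx extended query syntax.
# Verify against the Sphinx documentation for your version before relying on it.
SPECIAL = set('\\()|-!@~"&/^$=<')


def escape_sphinx_query(text):
    """Backslash-escape each assumed special character so user input is matched literally."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_sphinx_query('C++ (beta) -- "hello"'))
```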

Re:About to move to the Java port of Lucene... (1)

Ythan (525808) | more than 5 years ago | (#26674455)

As a satisfied user I just wanted to give another shoutout to Sphinx. It really is fantastic, better than Lucene if you want something lightweight and easy to configure, and the speed and relevance of search results are excellent. Commercial support is available and it's being used on Craigslist and The Pirate Bay among other notable sites. Anyone who's struggling with MySQL's anemic fulltext search would do well to give it a look.

Re:About to move to the Java port of Lucene... (1)

Moebius Loop (135536) | more than 5 years ago | (#26675353)

What I ended up doing for various webapps (PHP and Python, although Python's port of Lucene actually loads the Java runtime, and is fairly fast) is create a simple local server that a PHP script can communicate with over sockets and a trivial protocol.

This is fairly straightforward for me since most of the time I just want Lucene to return a list of document IDs. I use those IDs to create a temp table that I can do additional queries against in SQL.
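
The "IDs into a temp table, then more SQL" step might look like the following, shown with Python's built-in `sqlite3` so it runs anywhere. The `products` table, its `price` column, and the `hits` temp table are hypothetical; the commenter's actual database and schema are not known.

```python
import sqlite3


def fetch_by_ids(conn, ids, min_price):
    """Load search-result IDs into a temp table and filter them with ordinary SQL."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS hits (doc_id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM hits")  # clear results left over from a previous query
    cur.executemany("INSERT OR IGNORE INTO hits (doc_id) VALUES (?)", [(i,) for i in ids])
    cur.execute(
        "SELECT p.id, p.name FROM products p JOIN hits h ON p.id = h.doc_id "
        "WHERE p.price >= ? ORDER BY p.id",
        (min_price,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Widget", 5.0), (2, "Gadget", 15.0), (3, "Gizmo", 25.0)])
# Pretend the full-text engine returned these document IDs
print(fetch_by_ids(conn, [2, 3, 99], 20.0))  # [(3, 'Gizmo')]
```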

Running it as a separate server allows me to use the original Java codebase (which makes updating the library easy), and also to avoid the overhead of loading/instantiating Lucene from PHP on every request.
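
A "simple local server with a trivial protocol" could be as small as this: the client sends one query line, and the server replies with one line of comma-separated document IDs. The sketch is in Python, with a hypothetical in-memory `INDEX` dictionary standing in for the long-lived Lucene searcher that the real server would keep open; the line-based protocol is likewise an assumption, since the comment does not describe its exact format.

```python
import socket
import socketserver
import threading

# Hypothetical stand-in for the Lucene searcher that the real server would keep loaded.
INDEX = {"lucene": [1, 2], "solr": [2, 3]}


class SearchHandler(socketserver.StreamRequestHandler):
    """One request per connection: read a query line, reply with comma-separated IDs."""

    def handle(self):
        query = self.rfile.readline().decode("utf-8").strip().lower()
        ids = INDEX.get(query, [])
        self.wfile.write((",".join(map(str, ids)) + "\n").encode("utf-8"))


def query_server(host, port, query):
    """Client side: send the query, then parse the ID list from the reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((query + "\n").encode("utf-8"))
        with sock.makefile("rb") as reply_file:
            reply = reply_file.readline().decode("utf-8").strip()
    return [int(x) for x in reply.split(",") if x]


def start_server():
    """Start the search server on a free local port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), SearchHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, host, port


server, host, port = start_server()
print(query_server(host, port, "solr"))  # [2, 3]
server.shutdown()
server.server_close()
```

Because the searcher lives in the server process, each PHP request only pays for a socket round trip, which is the overhead saving the comment describes.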

Both Spring and Hibernate have Lucene modules (1)

kbrasee (1379057) | more than 5 years ago | (#26673759)

I've heard great things about Lucene (guy at the company I used to work for swears by it, he used it for anything from searching B2B stores to biological indexing). Both Hibernate [hibernate.org] and Spring [java.net] have support for this library.

I'm looking into adding search on my site so I should probably check it out. There's a new "In Action" [amazon.com] book out for using the Hibernate Lucene add-on -- I might have to pick that up.

troolkore (-1, Troll)

Anonymous Coward | more than 5 years ago | (#26673967)

and that 7he floor have the 3nergy

Based on open source? #5? (-1, Offtopic)

Jane Q. Public (1010737) | more than 5 years ago | (#26675727)

"... the Lucene search library ranks amongst the top 5 Apache projects, installed at over 4,000 global companies."

Okay... maybe the ultimate base of their business is open-source, but frankly I try to pay attention and I have never heard of these people before.

Open-source but stealth development? Maybe it's tongue-in-cheek software.

Re:Based on open source? #5? (1, Informative)

Anonymous Coward | more than 5 years ago | (#26676553)

Google shows over 3 million hits for a search on Lucene, and the first 100 summaries all seem relevant.

Xapian vs Lucene (0)

Anonymous Coward | more than 5 years ago | (#26676307)

What's the feature set vs Xapian?

  http://www.xapian.org/
