
What Desktop Search Engine For a Shared Volume?

timothy posted more than 4 years ago | from the which-side-of-the-firewall dept.

Data Storage

kriston writes 'Searching data on a shared volume is tedious. If I try to use a Windows desktop search engine on a volume with hundreds of gigabytes, the indexing process takes days and the search results are slow and unsatisfying. I'm thinking of an agent that runs on the server, regularly indexes, and talks to the desktop machines running the search interface. How do you integrate your desktop search application with your remote file server without forcing each desktop to index the hundred-gigabyte volume on its own?'


232 comments

Call the NSA (5, Funny)

Anonymous Coward | more than 4 years ago | (#29800715)

They already have it indexed for you.

wow! (5, Funny)

tivoKlr (659818) | more than 4 years ago | (#29800733)

It's been an hour since this story was posted.

You've stumped Slashdot. Bravo!

Re:wow! (2, Informative)

beelsebob (529313) | more than 4 years ago | (#29801715)

I was wondering if "spotlight on OS X does that just fine" was too trollish... But, too late, I said it, and it's true. Can't be trollish if it's true, can it?

Re:wow! (1)

cyber-dragon.net (899244) | more than 4 years ago | (#29801759)

That's what we use... server version indexes servers, SAN volumes etc and makes them searchable from each desktop ;)

Re:wow! (1)

v1 (525388) | more than 4 years ago | (#29802875)

and it's faaaaast

It searches by content as well as by filename, for those who didn't know. Incredibly useful.

Re:wow! (0)

Anonymous Coward | more than 4 years ago | (#29802895)

sharepoint does this. (Let's find out how biased slashdot readers are)


Google Enterprise Search (5, Interesting)

HeavyD14 (898751) | more than 4 years ago | (#29800757)

Not that I've ever used it before, but it sounds like it does what you want: http://www.google.com/enterprise/search/gsa.html [google.com]

Everything (2, Informative)

OttoErotic (934909) | more than 4 years ago | (#29801333)

How about Everything [voidtools.com] (assuming the server is Windows & NTFS)? Works well for me (quickest desktop search I've found yet), and can either run locally or connect to an ETP server. The site seems to be down right now, but here's the original Lifehacker article [lifehacker.com] where I found it. Incidentally, I never heard of ETP til I started using it. Anyone know if it's an Everything-specific protocol?

Re:Everything (1)

The MAZZTer (911996) | more than 4 years ago | (#29802413)

Well it sounds like it is. I should note that Everything only indexes filenames, so if you want to index file CONTENTS you're out of luck (that sort of thing is GOING to take a long time anyway, since you have to read every file on disk that the indexer knows how to parse, so "quicker" could well translate to "less complete search index").

But if you don't care about indexing contents then Everything should work fine for you.

Re:Everything (1)

stg (43177) | more than 4 years ago | (#29802985)

Everything only searches file and folder names. While that is useful too, a search engine usually implies searching inside the files as well.

Re:Google Enterprise Search (1)

xoundmind (932373) | more than 4 years ago | (#29802071)

It's exactly what he wants. In a shared Windows environment, it beats the native Outlook search speed handily and covers my organization's shared drive. Actually, the search speed has saved me a few times by letting me reconstruct a problem and react accordingly.

Re:Google Enterprise Search (1)

lorenlal (164133) | more than 4 years ago | (#29802305)

...it beats the native Outlook search speed handily...

Honestly? I beat native Outlook search on a fairly regular basis. I cried tears of happiness and joy when Lookout hit the scene. I choked up a bit when Microsoft bought them out, but I recovered when it looked like they integrated its engine into the one Microsoft uses for desktop search on XP.

solution to hundreds of terabytes of docs (3, Interesting)

Anonymous Coward | more than 4 years ago | (#29800771)

How about using a program like Documentum? We generate several thousand technical documents and drawings a month, and use it for all our document management needs.

A couple of options (4, Informative)

Unhandled (1660063) | more than 4 years ago | (#29801353)

Here are a few options you might want to consider:
1) Use Office SharePoint Server 2007 to index the share.
2) Upgrade to Windows Server 2008 (or above) and Windows Vista (or above) and use the Federated Search feature: http://trycatch.be/blogs/roggenk/archive/2007/11/05/windows-vista-amp-windows-server-2008-federated-search.aspx [trycatch.be]

NO! Try Alfresco (4, Informative)

thule (9041) | more than 4 years ago | (#29801809)

SharePoint is $$$$. Try Alfresco. Alfresco can look like a file share (it supports SMB, DAV, FTP, etc.). The indexing is built in and does not require a separate SQL Server license.

Re:NO! Try Alfresco (4, Informative)

Orion Blastar (457579) | more than 4 years ago | (#29801923)

You mean the Document Management Alfresco [alfresco.com] and not the CMS software. The Community Edition is free but unsupported, and the Enterprise edition has a free 30 day trial. It looks like it won a government award for document management which is rare for open source document management software.

Re:NO! Try Alfresco (0)

Anonymous Coward | more than 4 years ago | (#29802537)

SharePoint is $$$$ if you want search across portals. If you have a single "portal" (consisting of multiple "webs"), SharePoint comes with Server 2003 and above and is included in the price. There is a difference between SharePoint Portal Server and Windows SharePoint Services. Portal Server serves multiple portals and is not free.

Re:A couple of options (2, Insightful)

shutdown -p now (807394) | more than 4 years ago | (#29801861)

Here's a few options you might want to consider: 1) Use Office SharePoint Server 2007 to index the share

First, MOSS isn't free.

Second, have you ever actually tried using the SharePoint 2007 text search feature? I dunno what it indexes, but finding anything with it afterwards is about as convenient as searching for a needle in a haystack.

There have been claims of some huge improvements in the upcoming SP2010, which is not surprising in light of Bing, but that's not released yet.

Everything (Search Engine) (2, Informative)

dxdisaster (1121229) | more than 4 years ago | (#29801371)

I guess it could work, although you can't index the files directly; you have to run a local copy and one on the server as an ETP server. www.voidtools.com, although it seems to be down at the moment, so here's a link to the FAQ in Google's cache: http://74.125.113.132/search?q=cache:fcYHcEJKH3UJ:www.voidtools.com/faq.php [74.125.113.132]

we have a winner! (0)

Anonymous Coward | more than 4 years ago | (#29802131)

Everything is EVERYTHING you could want

1. it is blazingly fast at indexing drives
2. it is truly instant searching
3. it can be run in client/server mode

Federated Search (5, Informative)

Anonymous Coward | more than 4 years ago | (#29801375)

MS does have a solution: it's called Windows Federated Search. Windows 7 with 2008R2 has it... there might be a way to do it with Windows Desktop Search 4.0. Here's some info on it: http://geekswithblogs.net/sdorman/archive/2009/05/14/windows-7-federated-search.aspx

SSH and locate. (1)

dov_0 (1438253) | more than 4 years ago | (#29801419)

I use SSH to access my file server. Because I use it as a music server as well, I use X forwarding. As I'm accessing the actual server instead of just mounting fileshares (which I do also), I do the file searches directly on the server. Usually good old locate. Haven't really found anything that beats it yet, but then again I like the CLI. If you're running windows... Sorry.
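If you want the same thing from a client box without a full login session, a tiny wrapper script does it. A rough, untested sketch; "fileserver" is a placeholder hostname, and patterns containing spaces would need extra quoting:

#!/bin/sh
# remote-locate.sh -- run locate on the file server over ssh,
# so the client never has to index the share itself
exec ssh fileserver locate -i -- "$@"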

Re:SSH and locate. (1)

Jesus_666 (702802) | more than 4 years ago | (#29801563)

In fact, shouldn't it be possible to mirror the locate database to the local file system so that local calls to locate will show the proper results on the share? Granted, you lose the ability to index the local file system but depending on the setup that might not actually be a loss.

Re:SSH and locate. (1, Informative)

Anonymous Coward | more than 4 years ago | (#29802315)

Well, if you are mounting a networked filesystem (e.g. use sshfs), then there's no reason to mirror it locally. But even if you did, GNU locate and slocate understand $LOCATE_PATH as a list of databases, so you can use both..

Of course, there's the whole mount-point issue, but either a hacked updatedb or a mount --bind lets you build the index with a fictitious prefix. e.g.:
mntpt=/ssh-$HOSTNAME
mkdir $mntpt
mount --rbind /home/music $mntpt
#for slocate:
updatedb -U $mntpt -o $mntpt/slocate.db
#not 100% sure on GNU updatedb syntax, but you get the point...
umount $mntpt

Now you can:
scp music@host.net:slocate.db .musicdb
export LOCATE_PATH=$LOCATE_PATH:$HOME/.musicdb
locate mytune

and receive results of the form
. . .
/usr/share/some/path/armytune.wav
/ssh-host.net/myalbum/mytune.ogg
. . .

-- that is, local and remote results together. Or just make an alias for slocate -l0 -d $HOME/.musicdb, if you want searches only on the remote volume.
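In bash that alias might look something like this (the alias name is made up):

alias musicloc='slocate -l0 -d $HOME/.musicdb'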

p.s.

WRT GP's music server -> X forwarding, I've come to the conclusion there's no one right way to deal with music indexing/databasing/serving/streaming/playing, but for my needs, I've always found mpd the right solution. Just thought I'd throw it out there...

Re:SSH and locate. (1)

ls671 (1122017) | more than 4 years ago | (#29802015)

Hmmm... locate doesn't allow you to search within files. What about using rgrep or grep -r ?

find is great too (but slower on the first run, before results get cached by the kernel, if you have enough spare memory) when you need to know which files have been modified in a given period of time, which files take more room on the disk, etc.

I usually disable locate for security reasons, at least use slocate ! ;-)

So I'd say I use find and rgrep ;-)
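A few examples of the kind of thing I mean (GNU find assumed; /mnt/share is a stand-in for your mount point):

# files modified in the last 7 days
find /mnt/share -type f -mtime -7
# the ten biggest files, to see what eats the disk
find /mnt/share -type f -printf '%s %p\n' | sort -rn | head -10
# rgrep-style content search
grep -r 'needle' /mnt/share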

Re:SSH and locate. (1)

hey (83763) | more than 4 years ago | (#29802567)

You could make a web interface to locate.
(Only searches file names.)
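A bare-bones sketch of that idea as a shell CGI (assumes a CGI-enabled web server; it does no URL decoding or escaping, so keep it on a trusted LAN):

#!/bin/sh
# locate-cgi.sh -- trivial web front end for locate (file names only)
echo "Content-Type: text/plain"
echo ""
# QUERY_STRING arrives URL-encoded; this naive version skips decoding
locate -i -l 200 -- "$QUERY_STRING"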

Enterprise Content Management with Alfresco (5, Informative)

RicRoc (41406) | more than 4 years ago | (#29801439)

Yes, Google's Search Appliance (GSA) could be used, I have seen it used with limited success. The main problem was how to respect access control on documents: either you index them or you don't, and if you index them with GSA, sensitive data may show up in search results. Also, we had a lot of trouble "taming" GSA: it would regularly take down servers that were dimensioned for light loads.

I would suggest using Alfresco http://www.alfresco.com/ [alfresco.com] as a CIFS (Common Internet File System) or WebDav store for all those documents. This would give you the simplicity of a shared folder and the opportunity to enrich the documents with searchable metadata such as tags, etc. Each folder (or any item, in fact) could have the correct access control that would be respected by the search engine, Lucene. http://lucene.apache.org/java/docs/ [apache.org]

Alfresco comes in both Enterprise and Community Edition, it's very easy to try out -- even our non-techie project manager could install it on his PC within 10 minutes. Try that with Documentum, FileNet or IBM DB2 Content Manager!

DTSearch (0)

Anonymous Coward | more than 4 years ago | (#29801475)

DTSearch (http://www.dtsearch.com/). Although not free, you can install it on a server, schedule index updates, and have the client use the indexes (provided they are placed on a shared folder of the server).

Mirror it. (4, Funny)

palegray.net (1195047) | more than 4 years ago | (#29801479)

You could just rsync the shared volume to a local drive as frequently as needed and run the search engine on the local copy.
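Something along these lines; the host and paths are placeholders:

# the first run copies everything; later runs only move the deltas
rsync -a --delete fileserver:/export/share/ /srv/share-mirror/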

Re:Mirror it. (1)

codepunk (167897) | more than 4 years ago | (#29801543)

So you would rsync hundreds of gig to a local disk to index it? I think I would rethink that strategy.

Re:Mirror it. (0)

palegray.net (1195047) | more than 4 years ago | (#29801847)

I don't think you understand how rsync works. After the initial mirror, it only transfers changed files, which tends to be a really quick operation. For most organizations, that's going to be a hell of a lot less than the total disk usage on the shared volume.

Re:Mirror it. (1)

Trahloc (842734) | more than 4 years ago | (#29802281)

So I should rsync my 20TB server to my 320GB drive? What was the point of the 20TB file server then?

Re:Mirror it. (-1, Troll)

palegray.net (1195047) | more than 4 years ago | (#29802371)

That's a completely different scenario, and you know it. How about addressing the actual use case in question? He's talking about hundreds of gigabytes, not tens of terabytes. By the way, many organizations would have the "20 TB file server" so everyone can access the data, not necessarily exclusively via the share.

You also didn't specify how often the bulk of your data changes (I'm guessing the bulk of his share doesn't change every day, and yours probably doesn't either). Add in the fact that a 1 TB drive costs about $90 (you can do better if you shop around), and your objection to simply doing incremental updates for a few hundred gigs of data to a local mirror looks pretty ridiculous. It gets even more absurd when you consider how fast you can rsync a crapload of data over even a 100 Mbit connection, let alone increasingly prevalent gigabit networks.

Why slam the server with complicated indexing schemes, coupled with multiple users competing for all the data on a potentially frequent basis? That sounds like a much bigger headache than just taking the simple route, unless business requirements specifically stipulate a minute-by-minute ability to run reports on the data in question. Given that the submitter is dealing with very long time windows for processing reports in the first place, I don't get the impression that this is the case.

Re:Mirror it. (1)

cl0s (1322587) | more than 4 years ago | (#29803189)

No matter how you put it, rsyncing a share to every person's computer that wants to search it is not how you do it.

Re:Mirror it. (5, Insightful)

Makoss (660100) | more than 4 years ago | (#29802441)

Have you ever actually used rsync on a decent-sized file set? Determining the changed file set requires significant disk activity.

It's a certain win when compared to just blindly transferring everything. But if you think that rsyncing 20 changed files in a 100-file working set is the same as rsyncing 20 changed files out of a 2,000,000-file working set, you are very, very wrong.

Completely aside from the absolute insanity of suggesting that you replicate the full contents of the fileserver to every desktop, which has been covered by others.

Re:Mirror it. (1)

quanticle (843097) | more than 4 years ago | (#29801595)

It seems that the parent wants to merge a remote index with his desktop search so that he doesn't have to do this. Also, wouldn't giving each desktop its own copy of the data defeat the purpose of having a shared server?

Re:Mirror it. (1)

palegray.net (1195047) | more than 4 years ago | (#29801831)

Whether it defeats the purpose or not depends entirely on the organization's needs. If querying data every few hours in a local app is the objective, that can be met quite effectively with mirroring. Disk space is cheap.

Re:Mirror it. (1)

quanticle (843097) | more than 4 years ago | (#29801913)

Disk space is cheap when you're outfitting a single server. Outfitting even ten workstations with the same amount of disk can become quite expensive.

Re:Mirror it. (1)

palegray.net (1195047) | more than 4 years ago | (#29802407)

Okay, so how much do you figure it will cost to outfit a server to be capable of supporting an arbitrary number of users running extremely I/O-intensive reports on a shared volume whenever they feel like it, including the network infrastructure required to support this? Oh, and don't forget redundancy for the server.

Trust me, I've learned from experience that the local disk space works out to be much cheaper for this sort of thing.

Why even ask? (0)

h4rr4r (612664) | more than 4 years ago | (#29801521)

slocate

Re:Why even ask? (1)

quanticle (843097) | more than 4 years ago | (#29801621)

Well, that runs into the problem the OP has discussed. If the data is present as a network share, it'd take slocate forever to index the data on the remote server. Basically, he or she wants a way to run slocate once on the server and have that index file be merged with all of the individual desktops. That way, each desktop wouldn't have to go through the effort of duplicating work.

Re:Why even ask? (1)

h4rr4r (612664) | more than 4 years ago | (#29801783)

So use ssh and run slocate on the server, or share out the slocate.db file.

Re:Why even ask? (1)

quanticle (843097) | more than 4 years ago | (#29801895)

Yeah, that could work, but I don't think it'd be as seamless as the OP wants. The user would still have to select which db file to use. Still, it's a solution.

Re:Why even ask? (1)

h4rr4r (612664) | more than 4 years ago | (#29801959)

Or here's a really smart idea:

In each user's path, place a locate_on_server.sh script that just runs "locate -d $PATHTODBONSERVER".

-d can take multiple database filename arguments, so you could have one locate_on_server that searches all your fileservers.
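As a sketch, with made-up database locations:

#!/bin/sh
# locate_on_server.sh -- search the indexes built on the file servers;
# -d takes a :-separated list, so one script can cover several servers
exec locate -d /mnt/server1/slocate.db:/mnt/server2/slocate.db "$@"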

Re:Why even ask? (1)

zippthorne (748122) | more than 4 years ago | (#29803149)

Heck, you could skip the script and just alias locate.

But you still run into the problem that it runs from the command line and the database is byte-order dependent.

Re:Why even ask? (1)

Trahloc (842734) | more than 4 years ago | (#29802321)

Well, a problem with slocate is that it doesn't track changes live. It's basically a prettier version of dumping find into a text file and then grepping it. I want something that tracks files live on the server end and can be searched remotely. Heck, even a web interface or ssh would fit my needs; it doesn't need a pretty popup window thing.
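On a Linux server, the "tracks files live" part could be approximated with inotify. A rough sketch using inotifywait from inotify-tools; the paths and log format here are made up, and file names with spaces will confuse the naive read:

#!/bin/sh
# append every create/delete/move under the share to a greppable log
inotifywait -m -r -e create,delete,moved_to,moved_from --format '%w%f %e' /srv/share |
while read path event; do
    echo "$(date '+%F %T') $event $path" >> /var/tmp/share-changes.log
done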

the god-awful truth (0)

Anonymous Coward | more than 4 years ago | (#29801525)

How do you integrate your desktop search application with your remote file server without forcing each desktop to index the hundred gigabyte volume on its own?'

Really? I ssh to the fileserver, and then do something like

find . -name "*.php*" -print | xargs grep damnfunctionname

Re:the god-awful truth (1, Informative)

Anonymous Coward | more than 4 years ago | (#29801733)

Don't you mean,

grep damnfunctionname -R . --include='*.php*'

I guess if you're skipping perfectly cromulent indexing servers, you might as well needlessly break out the pipes, too.

Re:the god-awful truth (1)

beelsebob (529313) | more than 4 years ago | (#29801741)

And then you sit and wait for ages for find to finish, and then you realise that it only searches the file names, not the contents of the files. Of course, what I do is ssh in and then use mdfind. But yeh, find doesn't cut it on multi-terabyte volumes, and especially not when you want to search on more than just the name.

Re:the god-awful truth (1)

kabloom (755503) | more than 4 years ago | (#29802289)

Wrong! The "grep" part of that command searches the contents of the files.
But if you think you can get away with just grep on large amounts of data, you really ought to learn something about how indexing works and how much faster it can make your searches.

Re:the god-awful truth (0)

Anonymous Coward | more than 4 years ago | (#29802383)

And then you man xargs, and maybe man grep, and realize that find is indeed being used to search the filenames for *.php*, and letting grep search the contents for damnfunctionname.

Anyway, any decent locate implementation should work fine for the file names, modulo a few details about mount-points. For searching in files? Good question, but I very rarely need to search inside more than a hundred files, so locate/grep works well enough for me, with find on the rare occasions I need fs metadata...
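For what it's worth, if you do go the brute-force route, a space-safe version of GP's pipeline looks like:

# -print0/-0 keep file names with spaces intact
find . -name '*.php*' -type f -print0 | xargs -0 grep -n damnfunctionname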

How about Spotlight? That works on shared volumes. (3, Informative)

thedbp (443047) | more than 4 years ago | (#29801527)

*ducks*

Re:How about Spotlight? That works on shared volum (0)

Anonymous Coward | more than 4 years ago | (#29801643)

Final Cut Server, CatDB, Mediabeacon and a number of other asset cataloging and management tools could do the job and offload it in a reasonably sensible way.

Re:How about Spotlight? That works on shared volum (3, Informative)

Henriok (6762) | more than 4 years ago | (#29801775)

Yeah, my thoughts exactly. I wasn't aware that it was a problem to search hundreds of gigabytes on shared volumes. We have a couple of terabytes shared by our Mac servers, and I don't think I've had search times longer than ten seconds over a couple of million files: MS Office files, PDFs, movies, audio, pictures, photographs, text, HTML, source code... all indexed with metadata and contents.

Even in the days before Spotlight, using AppleShare IP servers in the 90s, finding stuff on the servers was never an issue. It has always been so fast that I never even reflected on the fact that it was fast. Maybe I should use some other operating system once in a while to experience what the majority experiences. Or not... I'd rather stay carefree and productive.

Don't call me when you figure this out.

Re:How about Spotlight? That works on shared volum (1, Funny)

Anonymous Coward | more than 4 years ago | (#29801833)

As usual, Apple and Closed Source to the rescue! Is there anything open sores can do that comes even CLOSE?

Re:How about Spotlight? That works on shared volum (0)

Anonymous Coward | more than 4 years ago | (#29802003)

While the parent's response is rather snide, it nonetheless highlights an important truth. Granted there was a many-year gap where Spotlight didn't have server integration like good old Classic Mac OS, but it does now as of Leopard.

To be honest I assumed Windows would have the same thing already, given how obvious it is. Why don't they ever copy the good bits? :P

Re:How about Spotlight? That works on shared volum (0)

Anonymous Coward | more than 4 years ago | (#29802173)

spotlight doesn't work on my ntfs volumes.
or am i doing something wrong?

Re:How about Spotlight? That works on shared volum (0)

Anonymous Coward | more than 4 years ago | (#29802287)

No you're not seeing anything wrong, you were just distracted by him ducking.

Re:How about Spotlight? That works on shared volum (2, Funny)

RancidPickle (160946) | more than 4 years ago | (#29802069)

I've tried ducks, but they tend to nibble the occasional one or zero, and they leave an awful mess on the platters when they poop. Try Spotlight instead -- not as cute, but easier on the data, hardware, and the nose.

Use Microsoft Indexing Service (2)

Icono (238214) | more than 4 years ago | (#29801553)

One way is to set up Microsoft Indexing Service on the server with the shared drive. The MSC console app provides a search capability and one can also use the Indexing Service SDK for client apps.

Hmm (1)

ShooterNeo (555040) | more than 4 years ago | (#29801569)

Basically, you need your desktop search application to look at an index file on the remote file server generated by an instance of the application running on the file server. Technically it's incredibly simple, but I don't know which currently available application is divided into front and back ends like that. Maybe open source...

Locate32 (1)

EvilXenu (706326) | more than 4 years ago | (#29801605)

It really depends on what you are looking for. Are you wanting to index file names or do you want names and contents? For me, I typically know what I'm looking for based on file name, so Locate32 works out great. It's the Windows equivalent to 'slocate'.

Re:Locate32 (0)

Anonymous Coward | more than 4 years ago | (#29802203)

Locate32 works great for me too.

Your answer in song (0)

Anonymous Coward | more than 4 years ago | (#29801629)

dtsearch - google it

Autonomy (0)

Anonymous Coward | more than 4 years ago | (#29801663)

Autonomy search. Check them out. One of the best in the world. Obviously an enterprise solution and not inexpensive.

Cross-Platform (1)

pinkocommie (696223) | more than 4 years ago | (#29801765)

Wondering if there's anything cross-platform. I'm in the process of setting up an OpenSolaris fileserver (primarily to use ZFS/Raid-Z) and have both linux and windows boxes. It would be great to be able to have an index on each that could be read by a client app or a unified index perhaps.

BlackBall (0)

Anonymous Coward | more than 4 years ago | (#29801791)

www.blackball.com

They do federated indexing/searching without having to import data. Scans the data where it resides...

The XP search assistant dog (0)

Anonymous Coward | more than 4 years ago | (#29801797)

The dog. Always helps me sniff out the files.

RDP + CTRL-F (1)

isochroma (762833) | more than 4 years ago | (#29801855)

RDP into the machine, then CTRL-F on the volume, which is now local. Don't bother with indexing service anyway, it just wastes time. Life's faster without it.

Windows 7? (1)

craenor (623901) | more than 4 years ago | (#29801891)

I am probably missing something obvious here or misunderstanding the question, however, I am very happy with the search integrated in Windows 7. I have about a terabyte of data across two different volumes, and when I use the regular Windows 7 search I get instant, detailed results.

You don't (2, Interesting)

saleenS281 (859657) | more than 4 years ago | (#29801965)

You don't allow every client to index. There have been several suggestions already, but most enterprises intentionally DISABLE desktop search. It absolutely slaughters the share. It's not a big deal when one user is doing it... but when 5,000 are, the I/O load becomes unsustainable.

Re:You don't (1)

amRadioHed (463061) | more than 4 years ago | (#29802379)

You seem to be missing the point. He already knows having clients index the shares is a problem, that's why he's asking how to avoid that. Disabling searching altogether is not a solution because obviously he wants to be able to search.

Try Earth (2, Interesting)

theritz (1116521) | more than 4 years ago | (#29801987)

"Earth allows you to find files across a large network of machines and track disk usage in real time. It consists of a daemon that indexes file systems in real time and reports all the changes back to a central database. This can then be queried through a simple, yet powerful, web interface. Think of it like Spotlight or Beagle but operating system independent with a central database for multiple machines with a web application that allows novel ways of exploring your data." http://open.rsp.com.au/projects/earth [rsp.com.au]

Desktop search is not the way to go (1)

vmxeo (173325) | more than 4 years ago | (#29802119)

Seriously. You're probably going to want a separate server (or servers) for this job. You didn't specify what you're indexing, how often, or where; however, I'll make some assumptions and point you towards an enterprise search appliance or product. Many will probably point you to Google Enterprise Search. I've worked with the search functionality within Microsoft SharePoint 2007, and its (ostensibly) free spin-off, Microsoft Search Server. Again, you'll probably need to dedicate some hardware to this. In addition to crawling all the content, the search product will also need to index it and present it to the user. This requires a front-end crawling role, a back-end indexing role, and a database to keep all the data in. Dealing with several hundred gigs means you'll want separate servers for all 3 (again, basing this on my knowledge of MS products; YMMV). The nice part is that your users will work through a webpage, and the workstation won't be tied up doing any crunching of its own.

Try starting here: http://www.google.com/enterprise/ [google.com]

or here: http://www.microsoft.com/enterprisesearch/en/us/search-server-express.aspx [microsoft.com]

www.X1.com (0)

Anonymous Coward | more than 4 years ago | (#29802157)

Haven't used it in a couple of years since they went away from their free model, but X1 (www.x1.com) rocked as a desktop search engine. They have federated search and plugins for a variety of server apps (sharepoint, etc).

Portfolio Server (Digital Asset Management) (2, Informative)

mrnutz (108477) | more than 4 years ago | (#29802439)

(Disclaimer: I work for Extensis)

Portfolio Server can continuously index files on SMB/CIFS (and AFP) volumes using a feature called "AutoSync". Web and Desktop (Windows/Mac) clients then search by folder name, file name, document text, or other metadata. Indexing and thumbnail creation takes place on the server, so clients are relieved of any cataloging workload and metadata is centralized.

http://www.extensis.com/en/products/portfolioserver9/overview.jsp

Look at e-discovery products (0)

Anonymous Coward | more than 4 years ago | (#29802457)

One of the products in this category will probably meet your needs.

Use Windows Indexing Service (1)

a.koepke (688359) | more than 4 years ago | (#29802581)

I am just embarking on a project to do exactly what the OP is asking for. Windows Server 2003 has an indexing service you can set up: http://www.windowsnetworking.com/articles_tutorials/Working-With-Windows-Server-2003-Indexing-Service.html [windowsnetworking.com] It is limited on its own, but it provides the back-end tools you need.

Combine that with the next article from that site and you have a solution: http://www.windowsnetworking.com/articles_tutorials/Making-Windows-Server-2003-Indexing-Service-Useful.html [windowsnetworking.com]

This article shows you how to use the Indexing Service from an ASP script. The solution I am working on will be done in PHP, as it can also link to COM applications. This basically lets you put a file search tool on your intranet that is indexed and returns results very quickly. Best of all, it uses existing software on Windows and doesn't cost anything extra.

No simple options (1)

benjamindees (441808) | more than 4 years ago | (#29802809)

Since this is a task that benefits from some optimization, and there are so many different combinations of file servers and clients out there and so many use cases to choose from, there are lots of different solutions but not many good ones that will do exactly what you want out of the box.

So, in order to narrow it down, you need to decide exactly what you're looking for. What server are you using? What clients do you need to support? Are you wanting to just search file names, or contents, ownership and modification times as well? Do you need the index to be completely up-to-date, or not? How long can you stand to wait for results?

Try Xapian Omega (1)

Sarusa (104047) | more than 4 years ago | (#29802861)

I had this same problem not too long ago: we have a shared documentation tree with tens of thousands of documents that I wanted to index. I tried dozens of search engines in my spare time, most of which were just horrible (Beagle), were a nightmare to install for someone like me who's not a full- or even part-time admin (Apache Solr), wouldn't allow cross-platform access (lots of Windows ones, obviously), stored a complete separate copy of every document (Alfresco, which didn't seem to have an option to avoid that), or had trouble indexing PDFs and MS Office docs (which we have a lot of). I'm not the IT guy, and I have no budget for this, so a Google appliance was right out.

What I eventually ended up with is Omega on top of Xapian - http://xapian.org/ - it's not too hairy to install, indexes pretty fast, points back to the original files (so it doesn't duplicate everything), and can handle multiple repositories. It will also detect dups and not show them twice, though similar files are treated as completely different (which is probably what you want in the absence of something more sophisticated).

Two downsides: it can't do incremental updates (unless that's changed recently), so you have to rebuild the entire index nightly. And the search interface is really sparse and ugly, which turned off some of my users, but you can rewrite the templates if you want to.
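In case anyone wants to try it, the indexing side boils down to a cron'd omindex run. A sketch; the paths are my assumptions, so check the Omega docs for your layout:

# rebuild the index nightly, e.g. from cron;
# --db is where the Xapian index lives, --url is the base URL shown in results
omindex --db /var/lib/omega/data/default --url /docs /srv/shared/docs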

SPLUNK (0)

Anonymous Coward | more than 4 years ago | (#29803027)

You can use Splunk if your files are heterogeneous. It is fast and easy to set up. It is pretty good for doing relatively advanced searches against tons of data.
http://www.splunk.com/

The Google GSA is quite good, too, and better for non-technical users.

There's also an alternative to GSA and Sharepoint (1)

binaryspiral (784263) | more than 4 years ago | (#29803049)

From IBM and Yahoo, called OmniFind. It runs on a desktop or server and can index multiple shares... and the basic version is free but offers a lot of functionality.

Although if your business is booming, a GSA is freakin' sweet.
