How To Manage Hundreds of Thousands of Documents?

timothy posted more than 5 years ago | from the you-have-the-planes-just-fly-files-around dept.

Data Storage 438

ajmcello78 writes "We're a mid-sized aerospace company with over a hundred thousand documents stored out on our Samba servers that also need to be accessed from our satellite offices. We have a VPN set up for the remote sites and use the Samba net use command to map the remote shares. It's becoming quite a mess, sometimes quite slow, and there is really no naming or numbering convention in place for the files and directories. We end up with mixed casing, all uppercase, all lowercase, dashes and ampersands in the file names, and there are literally hundreds of directories to sort through before you can find the document you are looking for. Does anybody know of a good system or method to manage all these documents, and also make them available to our satellite offices?"

Google wave (1, Funny)

Anonymous Coward | more than 5 years ago | (#28285587)

I think it's in beta though.

Re:Google wave (2, Insightful)

Gerzel (240421) | more than 5 years ago | (#28285995)

Or better yet, talk to people who've done it before. I mean, seriously, there have been organizations managing hundreds of thousands of documents since the Roman Era; it's nothing new.

Google to the rescue? (4, Insightful)

Shatrat (855151) | more than 5 years ago | (#28285597)

Isn't this the sort of thing that a google search appliance would be helpful for? Then you don't need to know the exact filename, just some specific information that can identify the file. This certainly solved my problem with having thousands of emails.

Hummingbird Document management (1, Informative)

Anonymous Coward | more than 5 years ago | (#28285613)

http://en.wikipedia.org/wiki/Hummingbird_Ltd

and

http://connectivity.hummingbird.com/home/connectivity.html?cks=y

Re:Hummingbird Document management (3, Insightful)

HikingStick (878216) | more than 5 years ago | (#28285667)

If they're going to consider Hummingbird, they need to be ready to cough up the dollars to get an *EXPERIENCED* Hummingbird administrator. If not, the product will be set up, but basic search functionality will be hosed because of some of the same issues in the original problem description (arising from differences in how the documents' properties sheets are populated). If done well, it can be fantastic. If not, users will hate it and do everything possible to avoid it (including installing their own NAS devices).

Re:Hummingbird Document management (3, Informative)

kiwimate (458274) | more than 5 years ago | (#28285869)

Yes, but it's not that hard to find someone. Buy Hummingbird (now owned by Open Text) or any other Document Management System. You've got a bunch of documents. You need to manage them. Ergo, a document management system.

Parent makes an excellent point, however: the single most critical component of a successful implementation is to get a skilled* consultant who can work with you to properly define the taxonomy. Everything else flows from there.

* If you go with Hummingbird DM, "skilled" means "not one of their overpriced professional services people". They're dreadful.

Re:Hummingbird Document management (1)

pkluss (731808) | more than 5 years ago | (#28285873)

HikingStick is exactly right. We used Hummingbird for a while and it got out of hand, and then it was every bit as bad as what you're experiencing now, except that we had paid an arm and a leg for it (so it felt much worse). It's a decent product, but we ended up migrating to SharePoint since it fit our needs.

Google Appliance (4, Informative)

TornCityVenz (1123185) | more than 5 years ago | (#28285635)

Re:Google Appliance (0)

Anonymous Coward | more than 5 years ago | (#28286013)

some additional enterprise search options -

Autonomy IDOL,
Microsoft Fast ESP,
Microsoft Search Server,
IBM OmniFind

(i believe there is a free version of omnifind called yahoo edition)

Re:Google Appliance (0)

Anonymous Coward | more than 5 years ago | (#28286277)

One thing I don't like about the GSA is that the licensing requires you to pay per year, otherwise the appliance will stop working:

At the end of your license term, the Google Search Appliance expires and no longer searches or serves data.

Source: http://www.google.com/support/gsa/bin/answer.py?hl=en&answer=18282 [google.com]

Maybe they give you the appliance at a discounted price which offsets the cost of the licensing, but it is nice to be able to purchase a piece of hardware without having to worry about yearly maintenance fees for it to keep working. It would be interesting to know whether the appliance can be outright purchased.

Organize the files (0)

Anonymous Coward | more than 5 years ago | (#28285641)

Sometimes you just have to do the work and not look for the magic bullet.

Re:Organize the files (1, Funny)

Anonymous Coward | more than 5 years ago | (#28285661)

I don't think this is one of those times, though.

How not to do it (3, Funny)

Daimanta (1140543) | more than 5 years ago | (#28285655)

Store it on a single FAT32 partition and hope for the best. Only meant for people with guts or really really nice bosses.

Re:How not to do it (4, Funny)

CarpetShark (865376) | more than 5 years ago | (#28285847)

Pfft. This is a serious job. 320k floppies are what you want.

Or... you know... you could try managing those documents with a document management system.

Answered your own question (5, Insightful)

Sir_Lewk (967686) | more than 5 years ago | (#28285657)

and there is really no naming or numbering convention in place for the files and directories.

I think you already know the answer.

Re:Answered your own question (4, Informative)

peektwice (726616) | more than 5 years ago | (#28286037)

Absolutely correct. However, I would take it a step further and say that you need a document management system that manages security, metadata, retention, disposition, etc. Examples are Documentum, IBM FileNet P8, Alfresco, etc. Here's a place to start reading: http://www.cmswire.com/ [cmswire.com] .

Re:Answered your own question (1)

hedwards (940851) | more than 5 years ago | (#28286119)

In all honesty, I tend to agree with what you're implying. A database solution is great, if you put it into place immediately, otherwise you have to spend a lot of time getting all of the items into the database and properly tagged and sorted.

One way or another the work is going to have to be done. The relevant questions are how easily it will be maintained, how it will handle increases in size, and how easily it can be backed up.

I'm doing this sort of thing right now with my digital images. Thankfully, I can fall back on metadata to do most of the heavy lifting, which just leaves the process of creating subjective tags for pulling up random files and figuring out a decent backup system. I've been doing it all this week and haven't found a proper solution. That's really a minimal hassle compared to what the OP is dealing with: finding the files, reading them, and putting them into some reasonable category, when presumably many were created by employees no longer at the company.

To boil it all down a bit, make absolutely sure you've got all the tags you're going to want in, a file hierarchy of some sort for storing the physical files, and the thumb screws for anybody that's not willing to do their part. A system doesn't stay neat and organized on its own; just because it's residing on some sort of database doesn't mean it's automatically easy to find things. Best bet for files is to organize those roughly by date; depending upon how many there are, that may require folders by day, week, month or year to keep them in a reasonable place to find.

Take it relatively slow: demand that any new files be created within the realm of the new system, and make a regular effort at putting the older files into the new system in a consistent manner.
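
The date-based layout described above can be sketched in a few lines of Python. This is only an illustration, not anything from the original post: the function name `dated_path`, the `/archive` root, and the folder naming (`2009/06`, `2009/W24`) are all assumptions.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_path(root: str, filename: str, modified: date,
               granularity: str = "month") -> PurePosixPath:
    """Return the archive path for a file, bucketed by its modification date."""
    if granularity == "year":
        parts = [f"{modified:%Y}"]
    elif granularity == "month":
        parts = [f"{modified:%Y}", f"{modified:%m}"]
    elif granularity == "week":
        # ISO week numbers; the ISO year can differ from the calendar year
        # for dates in the first or last days of a year.
        iso_year, iso_week, _ = modified.isocalendar()
        parts = [str(iso_year), f"W{iso_week:02d}"]
    elif granularity == "day":
        parts = [f"{modified:%Y}", f"{modified:%m}", f"{modified:%d}"]
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    return PurePosixPath(root, *parts, filename)
```

Picking the granularity per share (busy shares by day, quiet ones by year) keeps any one folder from growing too large to browse.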

Re:Answered your own question (2, Insightful)

nine-times (778537) | more than 5 years ago | (#28286193)

Yeah, some people mentioned Google appliances, which I suppose is a sort-of solution. I've never used one of those internally, but I wouldn't trust that to be the end-all solution to your organizational problems. What if there's a file that Google can't read or gather good metadata for? What if you're searching for common terms, and the file you're looking for is on the 75th page? What if you're not remembering the correct search parameters and so your file just isn't turning up in your searches?

There's really no substitute yet for real organization and discipline. The first thing you should do is define your needs/parameters. Does everyone from every site need read access to all files? Do they all need write access? Most likely, the answer to both of these questions is "no", so narrow it down to specifically "who needs access to what". That will help you figure out the rest of these things. Also ask, who needs to be able to find which documents under which circumstances? What information will they have? You're going to want to use those pieces of information in your organization so that people can intuitively find the files that they need, without necessarily needing to see everyone else's files.

Come up with a hierarchical organization for your files, requesting user input if appropriate. Then create a directory structure that matches it. Make sure you've communicated the organization clearly to your users, and try to get them to use it.

If necessary, use directory permissions to try to restrict writing files to appropriate places. For example, if you break down the file structure by particular engineering groups or departments, then only provide write access to members of that group or department. Designate the head of that department as the person responsible for organization within that folder. If need be, restrict write access in a particular folder to only one person, and make that person responsible for checking files in and maintaining the organization for the group or department. Do the same sort of control with individual satellite sites, if appropriate.

Be a little tiny bit of a control freak, but you might want to give people a particular folder share where they can transfer files in a more freeform manner in a pinch. Someone might want to share one particular file, back something up for a minute, or whatever, but make it clear that this share is completely insecure and temporary. Let people know that everyone has access to that share, anyone can delete any file, you won't be backing it up, and in fact you might be clearing it out (deleting it) on a regular basis. Make a habit of deleting it all on a regular basis, or people will start dumping everything there to sidestep the organization. To be careful, you might want to actually move everything into a non-shared folder for a week, and then delete it later, so if someone shows up and says, "Oh crap! You deleted business-critical information!" you can sigh, and say, "I'll see what I can do, but you really shouldn't store business-critical data there."

So, to go back and summarize: Come up with an organization, stick to it, enforce it, and retrain your users to use it properly.
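
The "move it to a non-shared folder for a week, then delete it" routine for the freeform share could look roughly like the following Python sketch. The function names, the one-week `HOLD_SECONDS` value, and the idea of naming each holding batch after its timestamp are assumptions made for illustration; a real job would be run from a scheduler and would need error handling for files that are still open.

```python
import shutil
from pathlib import Path

HOLD_SECONDS = 7 * 24 * 3600  # assumed one-week grace period


def quarantine(scratch: Path, holding: Path, now: float) -> Path:
    """Move everything out of the scratch share into a timestamped holding batch."""
    batch = holding / str(int(now))
    batch.mkdir(parents=True, exist_ok=True)
    for entry in list(scratch.iterdir()):
        shutil.move(str(entry), str(batch / entry.name))
    return batch


def purge(holding: Path, now: float, hold_seconds: int = HOLD_SECONDS) -> list:
    """Delete holding batches older than the grace period; return their names."""
    removed = []
    for batch in list(holding.iterdir()):
        if (batch.is_dir() and batch.name.isdigit()
                and now - int(batch.name) > hold_seconds):
            shutil.rmtree(batch)
            removed.append(batch.name)
    return sorted(removed)
```

Passing `now` in explicitly (for example `time.time()`) rather than reading the clock inside the functions makes the grace period easy to test.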

it's all about the index (2, Informative)

Hognoxious (631665) | more than 5 years ago | (#28285663)

The lack of a naming convention for the filenames and directories is neither here nor there. What matters is how well it's indexed.

Now I use naming conventions for my files (photos, mp3s, etc.). Am I contradicting myself? No, it's because I don't have enough of them that I need a separate index.
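
To illustrate why an index makes the file names nearly irrelevant, here is a minimal inverted-index sketch in Python. It assumes the document text has already been extracted into a dictionary of path to text (that extraction step is the hard part and is not shown), and the names `build_index` and `search` are made up for this example.

```python
import re
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map each lowercase word to the set of document paths containing it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index


def search(index: dict, query: str) -> set:
    """Return the paths that contain every word in the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

A query such as `search(index, "fatigue test")` finds a document by its contents even if the file is called something unhelpful like `FINAL-v2 (copy).doc`.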

OpenDocMan (1)

loVolt (664437) | more than 5 years ago | (#28285677)

OpenDocMan has helped a lot with our Graphics and Engineering department issues, which are similar to yours. LDAP access to storage helped sort out who could put what where. The implementation took a bit of time to get the original files into the right locations, but it's easier to manage now.

 

Tiered storage (0)

Anonymous Coward | more than 5 years ago | (#28285687)

You have a massive project on your hands! You need a tiered storage solution and a document management system whose back end is stored on SAN storage. How big of a budget do you have to solve this problem? Double it.

Tiered storage requires the business to prioritize data by levels (1...n), where 1 is the highest priority and each higher number is a lower priority. Generally 3 levels are employed, sometimes more.
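
As a rough sketch of what a three-level policy might look like in code, the Python below assigns a tier from how recently a document was accessed. Using access age as the priority signal, and the 90-day and 365-day thresholds, are assumptions for illustration only; the real rules have to come from the business.

```python
from datetime import date

# Hypothetical thresholds: (tier, maximum days since last access).
TIER_LIMITS = [(1, 90), (2, 365)]
LOWEST_TIER = 3


def assign_tier(last_accessed: date, today: date) -> int:
    """Return the storage tier (1 = highest priority) for a document."""
    age_days = (today - last_accessed).days
    for tier, max_age in TIER_LIMITS:
        if age_days <= max_age:
            return tier
    return LOWEST_TIER
```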

Does the mgmt understand the complexity of the issue? Do they support the project? You have a lot of data gathering to do before you can even determine what you need.

Godspeed!

Alfresco or SharePoint (3, Insightful)

flydpnkrtn (114575) | more than 5 years ago | (#28285697)

Or some other corporate content management system

Re:Alfresco or SharePoint (2, Informative)

atrizu (1434023) | more than 5 years ago | (#28286089)

I'd recommend a Content Management Solution called ArchivalWare [archivalware.net] .

Re:Alfresco or SharePoint (3, Interesting)

flydpnkrtn (114575) | more than 5 years ago | (#28286161)

...and I found an article [networkworld.com] backing up Alfresco pretty well:

"You can now stand up an Alfresco Labs server next to a SharePoint Server, and Office will not be able to tell the difference between the two," said John Newton, CTO of Alfresco. "But we are offering considerably more scale than SharePoint can deliver," he said.

Start with.... (1)

s0litaire (1205168) | more than 5 years ago | (#28285699)

...Setting up a standard naming convention and make sure bosses and managers enforce it. It won't help older files but will stop it getting worse!

Then if you can be bothered, you can start going through older files and updating the naming conventions or entering them into the Document management system of your choice...
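
For the older files, a dry-run rename plan is one way to start. The Python sketch below applies one possible convention (lowercase, "&" spelled out as "and", runs of spaces, dashes and other punctuation collapsed to a single hyphen); that convention and the function names are assumptions, not something the thread prescribes. It only computes the new names and reports collisions, leaving the actual renaming to a later step.

```python
import re
from pathlib import PurePosixPath


def normalize_name(filename: str) -> str:
    """Return the file name rewritten to the assumed convention."""
    path = PurePosixPath(filename)
    suffix = path.suffix.lower()
    stem = path.stem.lower().replace("&", " and ")
    stem = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    return f"{stem}{suffix}"


def rename_plan(filenames):
    """Return (plan, collisions) without touching any files.

    plan maps each old name to its new name; collisions lists pairs of
    old names that would end up with the same new name.
    """
    plan, seen, collisions = {}, {}, []
    for name in filenames:
        new = normalize_name(name)
        if new in seen:
            collisions.append((seen[new], name))
        else:
            seen[new] = name
            plan[name] = new
    return plan, collisions
```

Collisions matter here because mixed-case duplicates such as `Report.doc` and `REPORT.DOC` would otherwise silently overwrite each other.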

Re:Start with.... (1)

LandDolphin (1202876) | more than 5 years ago | (#28285943)

Hire a temp employee for $10/hr to go through and rename everything, or do any other clerical grunt work.

Re:Start with.... (1)

hedwards (940851) | more than 5 years ago | (#28286143)

Only problem is that this is an aerospace company. They might get lucky finding somebody that's capable and willing to work for peanuts, but I wouldn't count on it. Realistically they may require somebody with technical know-how of what the files actually are in order to properly categorize them. A temp might be able to handle reformatting the file names based upon information in the name, but probably not much more than that.

Use a cataloging system (4, Interesting)

vondo (303621) | more than 5 years ago | (#28285707)

I happen to have written one:

http://sourceforge.net/projects/docdb-v/ [sourceforge.net]

could be what you are looking for. Of course, it'll take effort to catalog the documents.

SharePoint? (4, Informative)

tekiegreg (674773) | more than 5 years ago | (#28285711)

I know I'm gonna get hit for blurting out the Microsoft Solution but...give SharePoint a shot...

Re:SharePoint? (4, Insightful)

goffster (1104287) | more than 5 years ago | (#28285777)

Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.

Re:SharePoint? (4, Interesting)

moosesocks (264553) | more than 5 years ago | (#28286305)

Why should you give sharepoint a chance? Even if it works well, it is proprietary and you are locked in.

No less proprietary than other similar systems. Getting files in/out of Sharepoint is a fairly trivial process, and the API is open enough to craft your own migration plan if you ever decide to move away from it, given that everything else is equally (or even more) proprietary than Sharepoint.

MS Office might be proprietary, but is so widespread that it's a 'standard' in its own right -- Sharepoint integrates excellently with Office, and keeps your users happy.

I'm typically not one to advocate the use of Microsoft products. However, Sharepoint worked just fine when I was using it, and is definitely a huge step up from any of the competing products at the same price-level.

Re:SharePoint? (1)

EnhancedPanda (1574099) | more than 5 years ago | (#28285997)

I am going to have to second the Sharepoint suggestion, we have been using it for 2 years now to do exactly what you need. But I would recommend investing in SANS, no more vpn.

Re:SharePoint? (0)

Anonymous Coward | more than 5 years ago | (#28286303)

Posting anon because I am shilling the company I work for, but DocuWare (www.docuware.com) is exactly what this person needs. It's document management software: scan, archive emails, index, OCR, all in one. It integrates with all (by all I mean most) ODBC-compliant databases and can use MySQL, Oracle or MSSQL as the main foundation (it comes bundled with MySQL). Short answer: Google Desktop, and start indexing documents.

Re:SharePoint? (2, Informative)

moosesocks (264553) | more than 5 years ago | (#28286227)

Mod parent up. I helped create a tag-based document retrieval system for my former employer using SharePoint. It actually worked quite well.

Use the right tool for the job. It's got a nice interface (that's also very familiar-looking to most users), scales well, and integrates well with MS Office, which (like it or not) is used by 99.99% of the corporate world. It also handles non-office files just fine.

That's not to say that Unix-based solutions don't have their place. During the migration, I actually employed a series of shell/python scripts to assist with several of the more mundane aspects of the process. These probably saved us a couple thousand man-hours that would have otherwise been spent categorizing the files.

SharePoint (3, Informative)

PIPBoy3000 (619296) | more than 5 years ago | (#28286265)

NASA is a big user of SharePoint, strangely enough. My coworkers run into their folks at conferences from time to time.

I personally am ambivalent about SharePoint. Its roots are in document management, so it seems to do that relatively well. The publishing features are fairly nice as well. I don't think it's the best system for making web sites, but it may some day get there. Currently it feels like a 2.0 product (the magic rule is to never buy anything from Microsoft before 3.0).

There are gotchas. SharePoint is tightly coupled with your clients. If everyone accessing the documents are using the latest version of Office, you'll be okay. If not, you'll run into problems. You may also need to throw a lot of hardware into SharePoint, as storing files inside of SQL has some built-in inefficiencies.

Still, some of our users seem to love SharePoint, so it might be a good option for you.

Re:SharePoint? (1)

jockeys (753885) | more than 5 years ago | (#28286283)

+1.

I'm no MS fanboy, but Sharepoint is great. I work for a large engineering company and we use it to organize blueprints, as well as pretty much all of our non-code documents. Even the most clueless HR-types can use it, and it's really not hard to set up.

Sharepoint (0)

Anonymous Coward | more than 5 years ago | (#28285723)

invest in sans and get a sharepoint server. you dont need sans for sharepoint though.

Google Search Appliance (1)

yakatz (1176317) | more than 5 years ago | (#28285727)

Google Search Appliance

Document Locator (0)

Anonymous Coward | more than 5 years ago | (#28285731)

I'm not affiliated with them, but I do use their product, and it's a steal for the cost.

www.documentlocator.com

You get version control, auditing control, web access, and a bunch more stuff.

Cygnet (1)

Rob Kaper (5960) | more than 5 years ago | (#28285745)

Cygnet ECM might work for you.

Documentum (2, Interesting)

trondwn (1574085) | more than 5 years ago | (#28285749)

Use EMC's document solution, where you have all documents in a central database with metadata that can describe the content. It can be accessed through a cached server from different sites.

ECM (0)

Anonymous Coward | more than 5 years ago | (#28285755)

Look into Enterprise Content Management solutions, there are many. Many of them are very expensive but depending on your needs it may be worth it. Several examples are EMC Documentum, Alfresco, and even Sharepoint to an extent. Alfresco is open source so that may be a good place to start.

Just the doc, or collaboration? (1)

geekoid (135745) | more than 5 years ago | (#28285757)

If you need to use just plain documents, store them in one big directory and update the meta information.
Let people move links onto their system and organize the links how they like, but don't let them move the documents.

Think iTunes for documents. I loathe that example, since I had set up this sort of thing long before iTunes came around.

If you need collaborative use of your documents, get something like this:
Jive.com

Document Management to the Rescue (1, Informative)

Anonymous Coward | more than 5 years ago | (#28285759)

Sounds like you need a real document management system.

Depending on your requirements, you could go with something open source like Alfresco or one of the big boys like EMC Documentum or IBM/FileNet P8. Either way, you will end up with an indexed repository of documents that makes it easy to find old documents, add new ones, etc. (assuming you and/or your integrator do the project correctly). It will also provide a web front-end so you don't have as much killer WAN traffic as you do now.

With a good document management system in-place, you are also on your way to having a workflow and other benefits as well. e.g. When Bob submits a document with XYZ as an index value, automatically tell Joe that it is in and ask Joe to approve it. When Joe approves it, tag it "Approved", and let Jim know.
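
The Bob, Joe and Jim example can be modeled as a tiny routing table plus an approval step. The Python below is a toy sketch and not the API of any product named in this thread; the class names, the `outbox` list standing in for email notifications, and the routing dictionaries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    index_value: str
    submitter: str
    tags: set = field(default_factory=set)


class ApprovalWorkflow:
    def __init__(self, approvers: dict, watchers: dict):
        self.approvers = approvers  # index value -> approver
        self.watchers = watchers    # index value -> people told on approval
        self.outbox = []            # (recipient, message) pairs

    def submit(self, doc: Document):
        """Notify the approver for this index value; return their name or None."""
        approver = self.approvers.get(doc.index_value)
        if approver is not None:
            self.outbox.append(
                (approver, f"{doc.name} submitted by {doc.submitter}; please approve"))
        return approver

    def approve(self, doc: Document, approver: str) -> None:
        """Tag the document as approved and notify the watchers."""
        if self.approvers.get(doc.index_value) != approver:
            raise PermissionError(f"{approver} cannot approve {doc.name}")
        doc.tags.add("Approved")
        for watcher in self.watchers.get(doc.index_value, []):
            self.outbox.append((watcher, f"{doc.name} approved by {approver}"))
```

With `approvers={"XYZ": "Joe"}` and `watchers={"XYZ": ["Jim"]}`, submitting Bob's document queues a message for Joe, and Joe's approval tags it and queues one for Jim.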

Depending on your requirements for document retention, archiving, e-discovery, etc. the document management system can help you fulfill all of those automatically.
 

Simple answer... (1, Interesting)

Anonymous Coward | more than 5 years ago | (#28285763)

Hire human beings to sift through it and label each file with a numbering/labeling system devised by your engineers. The human mind is a relatively inexpensive and already well designed piece of machinery. A few dozen of them given enough time can work through those hundreds of thousands of documents and get them sorted correctly. The problem you have is that you have unsorted, improperly labeled material. It is cheaper to hire sufficiently (or even insufficiently) evolved groups of people than to invent a machine capable of doing so. And, with the economy the way it is, you'll be doing everyone a favor by giving them years of employment. When the Manhattan project needed to create a large excess of fissile material for the war with Japan, and with all the men away at war, they hired dozens of women to sit at machines; turning knobs, checking meter levels, verifying output. The scientists themselves did not even need to be there; they designed a process and the women were trained in it and followed it.

Document management software (4, Insightful)

Wrexs0ul (515885) | more than 5 years ago | (#28285771)

Most print companies like Xerox have their own proprietary Document management [wikipedia.org] tools you can buy, and a bunch of CRM and ERP solutions (like OpenERP - it's free AND Open Source) provide some good simple document searching and indexing tools.

Really it comes down to how complex you want searching to be? Are there specific keys in the document you could index by? Do you require the full-text search capabilities of a Google search appliance?

A really good solution I've come across for some clients in Edmonton is called MetalTrace [traceapps.com] by Trace Applications. Don't let the name fool you about the specificity; software like this can scan, index, and even read barcodes on all sorts of documents, then let people search for them via the web. Their "killer-app" has multiple user-defined document types with multiple search fields, which, combined with some back-filing (digital and scanning), really saved the day.

Do your research though on "Document management" and see what product best fits your needs. It's a really well established field so reinventing the wheel is a little masochistic... not that there's anything wrong with that. ;)

-Matt

Re:Document management software (0)

Anonymous Coward | more than 5 years ago | (#28286273)

If you are an Aerospace company, you must not be ISO standard, which means you are under imminent risk of the mighty audit hammer.

You're supposed to have a CHANGE CONTROL SYSTEM!!! for your documents.

I like Agile, but you can implement it wrong. Parent post also has some good material.

Just so you know, what you are about to install, will help or hinder the entire company.

Knowledge Tree (3, Informative)

crackervoodoo (663384) | more than 5 years ago | (#28285773)

http://www.knowledgetree.com/ [knowledgetree.com] If you're looking for a no-cost (read as no license fee) option then Knowledge Tree Community Edition is a decent Document Management tool. We've been using it for a couple of years.

Enterprise solution: (0)

Anonymous Coward | more than 5 years ago | (#28285785)

Don't know enough about your company, budget, policies, real requirements. But throwing Documentum at it is probably good. Either that or something simple like Sharepoint. Both provide rich web based access and documentum can support long term archiving and version control. I have no idea how google appliances would do jack for access. However, if you need search there are google or cheaper/better commercial solutions from companies that actually do it right.

try wiki (1)

bitsmith (841565) | more than 5 years ago | (#28285793)

JamWiki.org, for instance, has search capabilities built in. Security is built in too and easily manageable. You can upload the documents and even migrate them to wiki format later. Keeping the documents in a near-text open format will help you re-migrate them sometime in the future.

Worldox (0)

Anonymous Coward | more than 5 years ago | (#28285797)

http://www.worldox.com

Document management is generally very good. Forces people to fill out required fields. I've seen it implemented in law offices.

Anonymous Jonas (0)

Anonymous Coward | more than 5 years ago | (#28285801)

http://cdsware.cern.ch/invenio/index.html

ask google (0)

Anonymous Coward | more than 5 years ago | (#28285805)

http://www.google.com/search?hl=en&safe=off&q=document+management+system

I worked on (0)

Anonymous Coward | more than 5 years ago | (#28285815)

a web application 4 years ago at Konica-Minolta. It is called DocuBreeze. I am not sure whether you need all the functionality it provides, but you may want to take a look. Google Docubreeze and you will find it.

I am no way related to this company any more and I have nothing to gain from recommending this to you.

Your Website (0, Redundant)

Anonymous Coward | more than 5 years ago | (#28285819)

You forgot the link to your website: www.nasa.gov

Knowledge Tree? (1)

gilesjuk (604902) | more than 5 years ago | (#28285825)

I used an old version a while ago and it was pretty good then. Does versioning and other things.

http://www.knowledgetree.com/ [knowledgetree.com]

Get yourself a good management system. (2, Informative)

Anonymous Coward | more than 5 years ago | (#28285829)

While this may be an odd suggestion, here's two things:
1) Get yourself a damn good document or content management system. Get it set up on the baddest machines you can afford. Overshoot the capability you need, so that you have room to grow.
2) Get a librarian to look at the kinds of documents you create, and develop a system to catalog documents while maintaining reasonable standards for file names. As the super simplest system, maybe document names that indicate (at a minimum) what project or what overhead department they belong to, a broad category of subject matter, and if it's versioned, a version number.
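
One way to make a "project, category, version" convention enforceable is to give it a pattern that software can both build and check. The layout below (`PROJECT_category_title_vNN.ext`), the field names, and the rule that every file carries a version number are assumptions chosen for this Python sketch; a librarian would likely design something better suited to the actual documents.

```python
import re
from typing import NamedTuple, Optional


class DocName(NamedTuple):
    project: str
    category: str
    title: str
    version: int
    extension: str


# Assumed layout: PROJECT_category_title_vNN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z0-9]+)_(?P<category>[a-z]+)_(?P<title>[a-z0-9-]+)"
    r"_v(?P<version>\d{2,})\.(?P<extension>[a-z0-9]+)$"
)


def build_name(doc: DocName) -> str:
    """Render a DocName as a file name in the assumed layout."""
    return (f"{doc.project}_{doc.category}_{doc.title}"
            f"_v{doc.version:02d}.{doc.extension}")


def parse_name(filename: str) -> Optional[DocName]:
    """Return the parsed fields, or None if the name breaks the convention."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    f = match.groupdict()
    return DocName(f["project"], f["category"], f["title"],
                   int(f["version"]), f["extension"])
```

Running `parse_name` over a share gives a quick list of files that do not follow the convention, since those come back as `None`.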

I tried to bludgeon a small company I worked for (around 40 engineers, one overworked Q&A person, and one system administrator) into moving towards a storage system for word documents that was not "Create a new folder for each version of the document set, place them all in the right folder, and if you don't Ray will eat your head." We wound up using (of all things) Perforce SCM to house fifty thousand word documents, and were starting on putting actual code revisions for automated test sets into the system when our avionics testing focus became a serious liability, and overhead workers were drastically cut. (Why have one Q&A guy and one system admin guy? We can get an intern to do BOTH!)

Get a Document Management System (1)

bsy-1 (169906) | more than 5 years ago | (#28285833)

Any of the many document management systems would do. They allow the extraction of metadata, which is in turn used to 'find' the document you are looking for. Nearly all contain some security settings and a viewer for many types of files. One thing to note: this magic doesn't happen by itself. If you get stuck doing this, be prepared for:
a. No one really knows how they want to do this; they all want to wonder if one of the many docs has their answer and have the correct doc located and opened for them.
b. You are about to become a stranger to all those who know you outside of work.

Re:Get a Document Management System (0)

Anonymous Coward | more than 5 years ago | (#28286149)

We manage hundreds of thousands of documents with Laserfiche. It works pretty well and is not as expensive as some of the other systems out there (still not cheap...). The key is to do your homework and find the one that is right for your needs.

Indexing and Cataloguing (1)

Zerocool3001 (664976) | more than 5 years ago | (#28285843)

If you don't like the idea of sending your information to Google to have it indexed, you can look into some server-side applications (with associated client apps) that do the indexing and searching for you. I'm not familiar with Windows ones (although I'm sure there are some), but there are quite a few for Linux, and primarily Spotlight for the Mac. The option to have the actual indexing done server-side would save on your bandwidth tremendously. You may also want to consider using a different filesystem, one that has indexing capabilities built in.

I got 3 letters for you (-1, Flamebait)

Capt.DrumkenBum (1173011) | more than 5 years ago | (#28285859)

S V N

Do it right, or just don't freaking do it.

Re:I got 3 letters for you (0)

Anonymous Coward | more than 5 years ago | (#28286221)

S V N

Do it right, or just don't freaking do it.

What SVN repository manages hundreds of thousands of documents between users that do not know how to deal with SVN?

Lots of ECM solutions out there... (2, Informative)

jwilkins13 (661548) | more than 5 years ago | (#28285861)

Sure, with any number of ECM solutions. At the simplest end many of them simply enforce naming conventions; at the more robust end, they support many different file types for viewing, indexing, etc. and can also provide rich metadata on a document-by-document basis. Some of them have been named in the comments, including but certainly not limited to SharePoint 2007, Cygnet, Documentum, Open Text, FileNet, etc. Any system worth looking at has a web-based interface, at least for searching, and many of them offer far more meaningful interaction as well. Alfresco, Hyland, and SpringCM all have web-based ECM solutions, and more comprehensive web-based offerings become available all the time.

Oh, and if you're aerospace there are a number of regulatory requirements for information management you'll need to comply with, which does complicate the situation, but spending the ducats for software and/or consulting help is probably cheaper than whatever your litigation and regulatory audit support processes cost today.

Hope this helps,
Jesse Wilkins
ECM and other stuff consultant
jwilkins13 at gmail dot com

Re:Lots of ECM solutions out there... (1)

NeoSkandranon (515696) | more than 5 years ago | (#28286109)

I don't think electronic countermeasures are gonna help in this case.

Wow (0)

Locke2005 (849178) | more than 5 years ago | (#28285901)

"We're a mid-sized aerospace company with... satellite offices. Wow... apparently the state-of-the-art in aerospace is a lot more advanced than I thought! What kind of rocket do you use for commuting to those satellite offices?

Shameless plug (0)

Anonymous Coward | more than 5 years ago | (#28285903)

I work on a product whose focus is to address this very problem. Check us out at http://www.kalexo.com/

It's integrated file/document/project management. It's targeted at industries that are geographically spread far and wide but need collaborative, secure access to common files to work on stuff.

where is the slowdown? (1)

the_denman (800425) | more than 5 years ago | (#28285909)

I think step one is to pick a storage/naming convention and stick with it. Also, depending on your needs, a document management system could help. The other thing I would do is figure out where the bottleneck is for your speed issue: is it the VPN connection, the network not being able to keep up, or the computer running Samba? Once you know more about where the slowdown is, work on that spot.

Switch to Apple... (3, Informative)

Tibor the Hun (143056) | more than 5 years ago | (#28285911)

I only partly jest. I know such a thing is damn near impossible to actually do, but in our Mac shop, such things are trivial. With one click of the mouse we enable Spotlight searching on our Leopard AFP server and bam... all the clients have almost instantaneous search access to their docs.

nothing beats a folder structure and naming (2, Insightful)

fxdgear (1574093) | more than 5 years ago | (#28285923)

I'm gonna say nothing beats a proper folder structure and naming convention. I'd also recommend using svn. Also spend some time to develop some macros to assist in the creation/saving/retrieval of said documents from the repository. Maybe create some standard templates too... just my 2cents!

Who tagged this delete? (0)

Anonymous Coward | more than 5 years ago | (#28285927)

That's such a silly solution to the problem.

Shift+Delete

works so much better!

OpenAFS (0)

Anonymous Coward | more than 5 years ago | (#28285935)

OpenAFS will speed up local access, and also provide an automatic backup of important files at all the satellite offices. (could be a full backup if you mirror everything).

As for the lack of any naming convention or other organization - first, the fact that you somehow manage to continue operating with a hundred thousand documents indicates that you actually DO have some form of organization in place.

If it isn't structured - get on it.

WebDav (4, Informative)

SplashMyBandit (1543257) | more than 5 years ago | (#28285949)

There are a few options:
  • For relatively unstructured data without versioning you could serve them over HTTP with WebDAV (Apache) and use your existing HTTP security mechanisms. You wouldn't believe how relieved I've often been when I can get my (secured) resources from home-base while located at a client's site.
  • My outfit uses KnowledgeTree for versioned stuff (http://www.knowledgetree.com/)
  • Or you could embrace your dark-side and use Microsoft SharePoint (plus, with all the Microsoft bugs you'd have a job for life until your employer goes bust). If you are a friend to your company you won't do this, plus your outfit has engineers and the good ones can spot trash solutions.

If your users are naming their files with strange characters in them (assuming it's not due to Samba) then they will just have to live with it; you won't have time to sort out all the weird names that (mostly MS-Word) users give to their files. The primary objective should be to give your users access to the files. Making the directory listing pretty ought to be a secondary concern.

Most big companies seem to use.. (1)

fluffernutter (1411889) | more than 5 years ago | (#28285961)

..something like Filenet or SAP. Sounds like you have big corporation needs; get a big corporation solution.

Sounds easy enough... (0)

Anonymous Coward | more than 5 years ago | (#28285969)

If you need an easy way to find things, you're looking at a good searching algorithm. In order to use a good searching algorithm I'd have to recommend the bubblesort first. That way you don't need to worry about the data for a good millennium or two!

Mindoka Technology Corp. (1)

Alethes (533985) | more than 5 years ago | (#28285975)

Mindoka (http://www.mindoka.com) has a document management product that is designed to solve the problem that you have.

Riverbed Steelhead mobiles (1)

DecepticonEazyE (1165265) | more than 5 years ago | (#28285983)

Put Steelhead mobile on all the clients. Document transfer over the VPN will GREATLY improve. Since it's mostly text/pictures, there will be so much duplicate data that doesn't need to be transferred over the wire multiple times, the round trip time will decrease so much they'll forget they're on a VPN.
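To make the duplicate-data point concrete, here is a minimal sketch of chunk-level deduplication in Python. It is not how Steelhead actually works internally (that is not described here); the fixed `CHUNK_SIZE`, the `bytes_to_send` helper, and the idea of a shared `seen` set standing in for the cache on both ends of the link are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size slices of data."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def bytes_to_send(data: bytes, seen: set) -> int:
    """Estimate bytes that cross the wire when the far side caches chunks.

    A chunk the far side has not seen costs its full length; a chunk it
    already holds costs only the size of its SHA-256 digest (a reference).
    """
    sent = 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            sent += len(digest)   # send a 32-byte reference only
        else:
            seen.add(digest)
            sent += len(chunk)    # first sighting: send the chunk itself
    return sent
```

Sending the same document a second time with the same `seen` set costs only the references, which is the effect described above: repeated content stops crossing the VPN.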

FileNet (4, Interesting)

Ohio Calvinist (895750) | more than 5 years ago | (#28285985)

I worked at a place that used FileNet [filenet.com], which is now an IBM product, to do this sort of thing. We had millions of scanned documents in the system. I wasn't personally very impressed with it, in that whenever anything "bad" happened, you had to call IBM because finding support online was impossible, and even then the support wasn't very good. It was also a very picky system, though it seemed to handle the load well. If you go with it, I strongly encourage doing it for UNIX/Oracle because it screamed "poorly ported" when we used it for Windows/MSSQL. It has an API for integration, but it is also poorly documented and would take some time to integrate into your existing business systems.

This is more of a rant at this point, but it is a stop-gap solution that allows people to continue to use outdated business processes storing important data in image formats or in documents scattered about with minimal indexing/search capabilities, rather than analyzable "data" that can lead to "information." I always take the position that if the goal is something on paper, or the goal is to store something that "was" on paper, it is time to rethink the business process to see if we can automate it, or store/present the data electronically in the first place. The old school fights against it, but no one has ever been able to say it wasn't more efficient in the end and enabled IT to say "yes we can" when the next great idea came along versus "here is a stack of papers, figure out $trend."

Technical issues aside (3, Insightful)

Vroom_Vroom (29347) | more than 5 years ago | (#28285987)

Hire a document manager / clerk person who will create order. Your engineers won't.

SQL... nuff said... (1)

Youngbull (1569599) | more than 5 years ago | (#28285989)

I think the right option for you would have to be ordering the documents in a database and serving them up through a website. I think that would be helpful for your satellite offices, since mapping shares through Samba over VPN is sometimes unstable and always nontrivial. Besides, the system doesn't seem to be working for you. You really don't have to be that proficient with functional webpages to make something like this, especially if you use Ruby on Rails. A Ruby on Rails guy would probably need only a couple of hours to make such an application. Then you could have functionality like searching and sorting by author, department, type and so on.
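As a rough sketch of the database half of that idea, here it is in Python with SQLite rather than Rails, just to keep it self-contained. The `documents` table, its columns, and the helper names are made up for illustration; a real catalogue would need whatever metadata your documents actually carry.

```python
import sqlite3

# Columns a caller may sort by; whitelisted so the ORDER BY is never
# built from arbitrary input.
SORTABLE = {"title", "author", "department", "doc_type"}


def build_catalogue(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) documents table if it is missing."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL UNIQUE,
               title TEXT NOT NULL,
               author TEXT,
               department TEXT,
               doc_type TEXT)"""
    )


def add_document(conn, path, title, author, department, doc_type) -> None:
    """Register one file on the share together with its metadata."""
    conn.execute(
        "INSERT INTO documents (path, title, author, department, doc_type) "
        "VALUES (?, ?, ?, ?, ?)",
        (path, title, author, department, doc_type),
    )


def search(conn, term: str, sort_by: str = "title") -> list:
    """Return rows whose title contains term, sorted by a whitelisted column."""
    if sort_by not in SORTABLE:
        raise ValueError(f"cannot sort by {sort_by!r}")
    cursor = conn.execute(
        "SELECT path, title, author, department FROM documents "
        f"WHERE title LIKE ? ORDER BY {sort_by}",
        (f"%{term}%",),
    )
    return cursor.fetchall()
```

A web front end would then only need to call `search` and render the rows as links; the files themselves can stay where they are on the server.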

Just Don't Use Livelink (1)

Myrv (305480) | more than 5 years ago | (#28286041)

Can't really suggest a good document management program but I can tell you one to avoid. We use Livelink at my place of work and its indexing and search capabilities are horrible (some would say non-existent). For example every document added to Livelink gets a document number assigned to it. One would expect to be able to retrieve that document by using the same document number but if you enter it into the search bar Livelink returns no results found. Huh? Not to mention some odd UI behaviours like when you add a folder to the favourites box the original folder disappears from the standard file listing (meaning there is no single canonical listing of files and directories, you need to always look in 2 places).

Institutional repository? (1)

sidb (530400) | more than 5 years ago | (#28286043)

What kind of documents are they? If they're mostly text and you want versioning, the only drawback to subversion is getting people to learn the tools, but that might be too much.

If they're archival/static documents, an institutional repository could work. Something like DSpace isn't that hard to deploy and will provide basic archival and search features.

The middle ground between those two solutions is probably what you want, though. Everyone I work with uses SharePoint for that, and I hate recommending proprietary lock-in.

Laserfiche (2, Informative)

wguy00 (985922) | more than 5 years ago | (#28286045)

Laserfiche (or LF) is just what this is for. It is DOD, DOJ certified and crap, and is used by all branches of the military and several other areas of the government as their document management system. With several different software offerings, just about any situation can be taken care of. Its features include the ability to search based on document name, template information, or OCR'd text (which the software also takes care of). With add-on features such as Quick Fields, it may be able to automatically sort, add template information, OCR, name and then store the documents. It really is a nice way to go. Satellite offices can access and be either full or read-only users. It has the ability and modules to connect to just about any other type of data/information system (GIS, financial software, etc) and is very scalable.

I was a tech for 5 years with a LF VAR. I'm not there anymore. We were constantly cleaning up messes left by other document management systems. Take your time with this thing and really plan your naming convention, folder hierarchy and user setup. It's easier to get it right (or as close to it as possible) than going back and having to fix it later. A good LF VAR should help you with this. Definitely check references of competing companies. Some VARs are A LOT better than others.

I work for a part 121 air carrier (1)

maric (770402) | more than 5 years ago | (#28286061)

We have extensive documentation and tracking needs. We use two sets of software for records and also keep a hard copy for long term storage. For tracking parts on/off and hours in service, TSO, TSI etc., we use TRAX Evo2. We scan all written paperwork into a database which is interfaced with via Alchemy. This allows us to view the current status of all of our aircraft and their parts and track the paperwork for each action taken. Alchemy has a browser interface and we use IE to access it. This allows a person to access the documentation from any of our stations and/or offices internally on the network. Both Alchemy and TRAX are acceptable to our local FSDO. The hardware setup for this is not something I can shed light on, as I do not get to play with computers that are ground bound. Hope that helps, maric

Organize.... (1)

Fallen Kell (165468) | more than 5 years ago | (#28286065)

As may have been pointed out, organizing the files is really the best way. Develop a strict schema for naming conventions as well as a hierarchical directory structure for maintaining and organizing. Something like:

/projectname/projectpart/data (contains the final draft of any document)
/projectname/projectpart/working (contains files that people are modifying so that they can be merged/checked in to the data dir)
/projectname/projectpart/misc (contains misc. notes or files that need to be filed with the project)

The "projectpart" dirs are really just logical groupings of data/files for the project. Say you are designing a plane, well, break it up into relevant systems, like electronics, power plant, structure, etc., and each of those are the "projectpart" directories. The "projectname" is simply the overall project itself, be it the name of the plane, maybe the name of the contract, etc.
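If you want the layout enforced rather than just documented, a small helper can build the paths for people. This is only a sketch: the three stage names come from the layout above, but the lowercase-letters-digits-underscores rule in `_NAME` and the `document_path` function itself are assumptions for illustration, not part of any existing tool.

```python
import re
from pathlib import PurePosixPath

STAGES = ("data", "working", "misc")  # the three sub-directories in the layout

# Hypothetical naming rule: lowercase letters, digits and underscores only,
# which rules out the mixed case, dashes and ampersands from the question.
_NAME = re.compile(r"^[a-z0-9_]+$")


def document_path(project: str, part: str, stage: str, filename: str) -> PurePosixPath:
    """Build /project/part/stage/filename, rejecting names that break the rule."""
    if stage not in STAGES:
        raise ValueError(f"stage must be one of {STAGES}, got {stage!r}")
    for label in (project, part):
        if not _NAME.match(label):
            raise ValueError(f"bad directory name: {label!r}")
    return PurePosixPath("/", project, part, stage, filename)
```

For example, `document_path("skyhawk", "electronics", "data", "wiring.pdf")` gives `/skyhawk/electronics/data/wiring.pdf` (the project name is invented), while a part called `Power Plant & Fuel` is refused.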

windows Terminal Server (1)

smalltimecrime (1496551) | more than 5 years ago | (#28286077)

The OP did not mention exactly how many remote branches or computers need to access the documents at once, however, windows Terminal Server licenses aren't too expensive and the remote desktop experience is silky smooth. Also the documents would all reside on a central server raid array or NAS device and never need to travel over the internet to remote sites. This would also free up massive amounts of bandwidth over the VPN, considering TS just needs an internet connection and uses SSL encryption. (although I don't know what you would even need a VPN for after making this conversion)

Re:windows Terminal Server (1)

JustNiz (692889) | more than 5 years ago | (#28286211)

>> windows Terminal Server licenses aren't too expensive and the remote desktop experience is silky smooth.

BWAHHAHAHAHAAHAHAHAHAHAHAHAHAHAHA

Thanks for that. I needed a laugh. Silky smooth? Having to do anything remotely technical via Terminal Server is the biggest pain in the butt I've ever experienced.
BTW if you're really not a paid shill for Microsoft then WTF are you smoking?

Who else read this and thought... (1, Interesting)

tlambert (566799) | more than 5 years ago | (#28286091)

Who else read this and thought... working in a satellite office for an aerospace company would involve a lot of cool travel perks?

-- Terry

Odd that the next story... (2, Informative)

ak_hepcat (468765) | more than 5 years ago | (#28286115)

Odd that the next story has a great idea for document management right in the summary...

Hadoop!

Sharepoint (1)

jayhawk88 (160512) | more than 5 years ago | (#28286125)

...seems like a natural solution for your connectivity issues, or perhaps whatever the open source variety of Sharepoint is. You really do need to tackle the naming convention question though. You can have all the file indexing you want, but sometimes a nice, logical, clean file name will get you what you're after much faster than any kind of searching.

It's going to be horrible, painful, thankless work that will put you on the shit list of just about every department manager and administrative assistant ("You want me to rename how many files?"), but it has to be done.

Re:Sharepoint (1)

cfryback (870729) | more than 5 years ago | (#28286295)

We run an EDMS for our local council here - the filename doesn't matter, what matters is how it is all indexed. Too many people here are thinking that you need to re-name EVERY document. I don't have any experience with Hummingbird, but what about HP's TRIM software? Yes $$$$, but it also has a web GUI interface. Just a thought.

try this software (0)

Anonymous Coward | more than 5 years ago | (#28286165)

www.Mindwrap.com

Aerospace QMS (1)

dwarf75 (1437849) | more than 5 years ago | (#28286177)

What worries me more than anything else is that you claim to be a mid-sized aerospace company. If you are having problems finding documents, what happened to your traceability processes necessary for your QMS and how do you guarantee that employees use up-to-date documents? How did you handle the process in the past??? And, what does your QMS stipulate for records and traceability?

Not quite what you want, but maybe similar enough (0)

Anonymous Coward | more than 5 years ago | (#28286199)

In a previous job we dealt with the same problem but on a smaller scale: One main office with ~ 60 people with a branch office at quite some distance with ~ 6 people working there. In our case the problem wasn't documents but a combination of large profiles which had to be pumped through a VPN link over a rather narrow ADSL line at the branch office.

In that case we placed an offsite login server which contains all the information that was also present on the main server, with nightly delta synchronisation. Users still use the main server for work that requires write access, but we were able to offer ~ 300 GB of data locally, instead of over the network.
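For anyone unfamiliar with the term, a delta sync only moves what changed since the last run. A bare-bones sketch in Python, assuming whole-file SHA-256 hashes are good enough to detect changes (tools like rsync are smarter and work at block level); the `manifest` and `delta` names are hypothetical and this is not what we actually ran:

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            result[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def delta(main: dict, replica: dict):
    """Compare two manifests.

    Returns (to_copy, to_delete): files that are new or changed on the main
    server, and files that exist only on the replica.
    """
    to_copy = sorted(name for name, digest in main.items()
                     if replica.get(name) != digest)
    to_delete = sorted(name for name in replica if name not in main)
    return to_copy, to_delete
```

Run `manifest` on both sides at night, ship the small manifest across the link, and only the files listed in `to_copy` need to travel.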

We also placed a so-called WAFS device in both offices. This is basically a network optimizer which intercepts inefficient network traffic and wraps this data with compression in its own network protocol. Next to that, it also caches network traffic, which means that to some extent, often-referenced data / network traffic is also available locally. So far I've been positively surprised with the increased throughput we've seen (about a five-fold increase compared to the old situation).

Lastly, we've been trying to push a version tracker system as a basis for documents, but hit a lot of walls with users who preferred their 'known' Samba environments over a versioning system. It does allow you to re-design your data structure for documents and string together old/related documents in an interesting way.

Regardless, you'll have to rethink and restructure how you want to store documents, if only by using better directories and creating a 'method' which users will have to adhere to. And in the end you'll need some poor cheap students who will have the pleasure of migrating all this data to your new system.

Just my 2 cents.

IBM OmniFind - a simple easy solution (1)

sfalc (822450) | more than 5 years ago | (#28286259)

IBM OmniFind should do the trick. It indexes your files and then you can search the index very quickly. It also does caching of documents and other nifty stuff. It is based on Apache Lucene and there is a free (as in beer) version, IBM OmniFind Yahoo Edition. The free version will work with up to 500 000 documents. I used it for searching a number of networked drives with circa 50 000 files on them, which it did very well.
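The reason searching an index is so much quicker than crawling the shares is that the work is done once, up front. A toy inverted index in Python shows the idea; this is not OmniFind's or Lucene's API, and the `build_index` and `search` names and the simple tokenizer are made up for the example.

```python
import re
from collections import defaultdict

_WORD = re.compile(r"[a-z0-9]+")


def build_index(docs: dict) -> dict:
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in _WORD.findall(text.lower()):
            index[word].add(name)
    return index


def search(index: dict, query: str) -> set:
    """Return the documents that contain every word in the query."""
    words = _WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Each lookup is a handful of dictionary hits and set intersections, no matter how many files sit on the drives; the real products add ranking, stemming, file-format parsing and so on.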

Good luck (1)

kilodelta (843627) | more than 5 years ago | (#28286271)

When I worked for the state Attorney General's office as I.T. Director a request came into I.T. that immediately gave me an upset stomach. The request was for all documents on the server that contained the word "lead" as in the chemical element Pb. The issue was that the word lead and the element share the same spelling.

I kicked in and wrote an app that generated a web list on the fly and had clickable links so the documents could be examined and then marked as part of discovery.

I also brought in three Xerox 490's. Those were the hardware part of the document management system. I don't know if they ever got the servers for it but at least they had the gear. In the meantime I suggested using meta-data in filenames.

New Hire. (1)

deimtee (762122) | more than 5 years ago | (#28286279)

Hire a real librarian, it's what they do.
On the plus side, you also get to hire a librarian. nudge, nudge, wink, wink, say no more.

Alfresco of course! (2, Interesting)

thule (9041) | more than 5 years ago | (#28286297)

It can scale extremely well. It is the backend to Adobe's acrobat.com website! So you know it can handle millions of documents if you need it to. Sharepoint requires MS SQL Server for searching documents. With Alfresco, that feature is built in.

Sharepoint is teaming software and not really designed for large document repositories. Alfresco has a teaming interface (Alfresco Share) and a more generic document repository interface.

Alfresco can expose the repository via FTP, SMB, WebDAV, and a web client interface.
