Obama Orders Federal Agencies To Digitize All Records
Lucas123 writes "President Obama this week issued a directive to all federal agencies to upgrade records management processes from paper-based systems that have been around since President Truman's administration to electronic records systems with Web 2.0 capabilities. Agencies have four months to come up with plans to improve their records keeping. Part of the directive is to have the National Archives and Records Administration store all long-term records and oversee electronic records management efforts in other agencies. Unfortunately, NARA doesn't have a stellar record itself (PDF) in rolling out electronic records projects. Earlier this year, due to cost overruns and project mismanagement, NARA announced it was ending a 10-year effort to create an electronic records archive."
Unit Of Measurement (Score:5, Funny)
So, how many Library of Congress equivalents worth of material are they intending to scan??
Re:Unit Of Measurement (Score:5, Funny)
At least 1.
Re: (Score:2)
That brings up a great point: why not have the Library of Congress manage it?
The LoC is very good at managing digital content, and at making it searchable/available through partnerships with open source projects like the University of Michigan's HathiTrust project:
http://www.implu.com/federal_contracts/listing/LC-HathiTrust [implu.com]
Could the National Archives and Records Administration outsource this project to the LoC?
Re: (Score:3)
Re: (Score:2)
Does the LoC have adequate security? I may be wrong here, but I believe part of NARA's job is to keep classified records (espionage and wikileaks fodder) so they can be released at a later date. I'm not sure if the LoC does anything with classified material.
How Can The US Construct a Big Brother Database (Score:4, Insightful)
When all the records are locked in 8.5x11 filing cabinets, sealed in manila envelopes?
And the FOIA headache!
Destroying those records is hard, and some turn up - years after they were declared not to exist!
Re: (Score:2)
They don't even want the garbagemen to know how much they're shredding.
I'd say once Lockheed can actually implement this, you'll start seeing "now that we don't have to worry about paper records anymore" retention laws flowing through with YEAs.
Re: (Score:2)
On one hand you've got images of the Stasi shredding everything at headquarters as fast as they can... On the other, they never got hacked.
I feel I should elaborate but I can't put it in words right now, so hopefully you get my drift.
Congressional Record (Score:3)
And he will receive (Score:2)
Distributed object stores (Score:3)
This is actually the perfect place to incubate distributed object stores (e.g. Hadoop on one end, something like Zimbra on the other). One namespace, .gov, with sub-namespaces, and a CMIS interface. Has anyone seen VMware Project Octopus yet? Well, take that times 10,000 and you have a pretty nice, platform-independent records management system. There's also Alfresco [alfresco.com], which uses the JCR spec, which I believe can be moved to some type of distributed backend. But it implements CMIS and has a DoD-spec records management system. So the general spec would be a CMIS framework: each department/branch/whatever makes available a service for document retrieval, with a central .gov listing of the services, basically what Amazon does for literally everything it does. Do not compromise; executive-order it Jeff Bezos style: everything is a service with a public interface. I think it is possible, but it would take a lot of plain buy-in, and our government (the bureaucratic, non-political side) has gotten really, really good at dragging its feet and doing nothing. The cuts are coming, though, and they will have to improve efficiency just like we all have in the private sector. Of course Defense is the worst, but Education can use some work as well.
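A minimal sketch of the "central .gov listing of services" idea, assuming each agency exposes a plain function that maps a document ID to bytes. This is not CMIS or any real federal API; the `GovRegistry` class, the `records.nara.gov` namespace and the `<namespace>/<doc_id>` URI shape are all made up for illustration.

```python
from typing import Callable, Dict, Optional

# A retrieval service maps a document ID to its content, or None if not held.
RetrievalService = Callable[[str], Optional[bytes]]


class GovRegistry:
    """Hypothetical central .gov listing of per-agency document services."""

    def __init__(self) -> None:
        self._services: Dict[str, RetrievalService] = {}

    def register(self, namespace: str, service: RetrievalService) -> None:
        # Sub-namespaces hang off .gov, e.g. "records.nara.gov" (made-up name).
        if not namespace.endswith(".gov"):
            raise ValueError(f"{namespace!r} is not under .gov")
        self._services[namespace] = service

    def fetch(self, uri: str) -> Optional[bytes]:
        # Hypothetical URI shape: "<namespace>/<doc_id>".
        namespace, _, doc_id = uri.partition("/")
        if namespace not in self._services:
            raise KeyError(f"no service registered for {namespace!r}")
        return self._services[namespace](doc_id)


# Usage: an in-memory dict stands in for a real agency service.
registry = GovRegistry()
nara_docs = {"1973-0042": b"Personnel file"}
registry.register("records.nara.gov", nara_docs.get)
print(registry.fetch("records.nara.gov/1973-0042"))  # b'Personnel file'
```

The registry only routes requests; each agency stays free to store its documents however it likes behind the function it registers.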
Re: (Score:2)
You mean like OODT [apache.org]? Or something more like iRODS [irods.org]? Both are used by various 'big data' groups (NASA, NIH, NOAA, NOAO, supercomputing centers) to share data across multiple sites.
As for the indexes... well, if science.gov [science.gov] and data.gov [data.gov] are any example, they could use some work. Although, hopefully in this case, you're describing bibliographic records, so the necessary metadata is a little more standardized.
In some cases, it'd be better to just put the records out there under standardized open APIs,
Lockheed Martin (Score:5, Insightful)
Re: (Score:2)
Looking at the NARA article, as soon as I saw that some big IT contract was given to Lockheed Martin I saw all I needed to know about this initiative.
How much money did LM, or someone closely associated with LM, give to the present administration or someone closely associated with the present administration? Like "they" always say, follow the money.
New Republican candidate talking point: (Score:4, Funny)
We must save our children's heritage. President Obama obviously hates America and its legacy; otherwise, why would he be trying to destroy all the paper records? Undoubtedly, he'll claim that his long form birth certificate was destroyed during the digitization effort. It's obviously an Islamic socialist fascist communist ACORN black panther George Soros funded plot of some sort. Also.
Re: (Score:2)
Since this helps Lockheed Martin, I'm pretty sure the GOP will let it slide. They may make casual references to "Obama increasing scrutiny of US citizens", trying to portray it as an attempt to implement Orwellian Telescreens [wikipedia.org], but that will die out pretty quickly.
Archaeology (Score:4, Insightful)
Re: (Score:2)
In a thousand years, if archaeologists cannot gather sufficient data from other observations besides paper records, then it really wasn't that important anyway.
Re: (Score:2)
Re: (Score:3)
In 1000 years or more, they'll have no idea what we were up to at all. At least some paper records have a chance of surviving.
I wouldn't worry that much... think of it... USofA has some pretty extraordinary archaeologists: Indiana Jones, Lara Croft, Rick O'Connell, Benjamin Gates [wikipedia.org]... should I continue?
Re: (Score:2)
That's Lady Lara Croft to you, peasant.
Re: (Score:2)
oh really?
What we are living through is the _start_ of ridiculously accurate history, with video and images and schoolbooks all in neat packages for the people of the future to examine. A hundred years on, the contrast from one year to the next isn't going to be huge, but the contrast between now and just a few decades ago is huge.
Never before has so much information been printed and recorded; never before have so many people lived who are doing their best to record information so that it's available later. Never befo
Re: (Score:2)
It's just as easy to destroy a written document as an electronic one. The only way this information will be lost is if the powers-that-be intentionally destroy it, or if something so catastrophic occurs that the internet becomes a historic fad. In the internet age, information is a virus. The media may come and go, but the data will live on, so long as there is another remote system somewhere to copy it to.
George Orwell had it wrong (Score:2)
In order for something like 1984's Ministry of Truth to function, the government would have to be far, far more competent and efficient than is ever to be likely.
Re: (Score:2)
Re: (Score:2)
What the private sector cannot link over time, the US gov can: medical records, data from other governments.
Any laws that stop the US gov? Use private contractors or friendly governments outside the US, e.g. Canada or the UK.
Databases are now very efficient, data entry is in place in most states in a shareable form.
http://en.wikipedia.org/wiki/Information_Awareness_Office [wikipedia.org] showed the vision before it was los
ALL paper documents? (Score:3, Insightful)
Does that include the Declaration of Independence? I suppose it would be much easier to change in digital form...
Re: (Score:3, Insightful)
You mean the constitution?
Or do you think they'll want to rejoin the British empire?
Re: (Score:3)
do you think they'll want to rejoin the British empire?
That wouldn't be all bad. We'd at least be able to pawn off our debt on someone else.
Re: (Score:2, Insightful)
Re: (Score:3)
And yet that comment gets rec'ced up as being "insightful".
Even more ironically, it's the same folks who love to talk about "life, liberty and the pursuit of happiness" the most who seem to forget the next clause, "That to secure these rights, Governments are instituted among Men", more often than not.
But then, and speaking of editing the Declaration of Independence, Texas did drop Jefferson from its textbooks:
http://www.nytimes.com/2010/03/13/education/13texas.html [nytimes.com]
They'd better do something (Score:5, Informative)
http://www.archives.gov/st-louis/military-personnel/fire-1973.html [archives.gov]
Private Industry Can Do This Better (Score:4, Interesting)
Re:Private Industry Can Do This Better (Score:4, Interesting)
It was given out in a contract, so you are already getting your wish.
Though I think we could save money by having the government do something itself instead of having to pay for Lockheed's profit and overhead.
Re: (Score:2)
Sorry, but I have to disagree. As I pointed out in another post, [slashdot.org] there are a lot of factors to think about when you do something like this, and if you don't have the experience you'll make mistakes.
It's kind of like saying (my favorite distro) Linux/Windows 7 is so easy to set up these days that anyone can do it. If it's just a matter of clicking Next->Next->Next, then yes. And that might even be sufficient for a home computer (ignoring things like backups). But most people reading this will know ther
Will the file formats be publicly documented? (Score:5, Insightful)
This would be a good time to write your congresscritter to point out the problems with undocumented file formats, as well as APIs and network protocols.
There are plenty of formats that could be used that are open and vendor neutral.
If Congress doesn't require that in its funding authorization, many of our public records will be stored as Word docs or in MS SQL databases.
NARA (Score:2)
IIRC, NARA didn't end the effort, it just stopped further development because it considered it complete.
Dunder Mifflin (Score:3)
Dunder Mifflin is gonna be pissed...
"I want a plan in four months" (Score:2)
In the Archival Trenches... (Score:5, Informative)
As a professional historian who has worked in the National Archives in College Park, MD and at four different presidential libraries, which incidentally are also managed by NARA, I need to interject that this is an immense, costly, but valuable project.
Remember "the warehouse" from the Indiana Jones movies? NARA is a little like that in terms of size but is better organized. Aisle upon aisle, shelf upon shelf, row upon row, room upon room, floor upon floor, building upon building of neatly indexed banker's boxes with labelled folders of documents. The labels may have been checked by the archivists at NARA, but they may also simply be the labels affixed to the records by the source federal agency. The individual documents in folders are almost never labelled. In the course of my work, I gathered 30k digital pictures of documents over two months. The acquisition process sounds deceptively easy: look in the index, find key words and request boxes from the archivist, then look through folders to locate individual documents. In point of fact, I probably visually scanned 3M pages to see if they were "interesting" and photo-worthy for future research, usually taking only a few seconds per page to make a snap judgement. My decisions on which boxes of documents to request were far more time consuming. What is the right keyword for talking about computers in government in 1970? If you said "information automation" then you would be right. A few presidential libraries (Ford especially) have updated electronic files for indexing, which is a huge advantage.
On my trips to the archives, it was interesting to see both professionals and amateurs using a range of technologies. I saw really old school researchers using 3x5 note cards and taking notes on legal pads. They sometimes supplemented their work by photocopying really important documents at $.75/copy. Some researchers avoided this cost by using flatbed scanners which they carried in with them. Still other researchers brought in high-end digital cameras and tripods. I used a digital camera freehand. All of these people still need to find a way to get physically close to the records. Digitization would open up a new era in research.
On the metadata issue, most of these records already have copious amounts of metadata recorded in well-established fields that are used by NARA.
On the OCR issue, some documents have hand-written notes on them which would not be machine readable and sometimes are not human readable. It is likely that the documents will have to be digitally scanned and flagged if handwriting is detected.
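One way to do that flagging, sketched under assumptions: the OCR engine (none is named here) reports a per-word confidence on a 0-100 scale, and a page goes to a human when too large a share of its words fall below a threshold. Low confidence is only a proxy for handwriting, since smudges and bad scans trip it too, and the threshold values below are illustrative, not tuned.

```python
from typing import List


def needs_review(
    word_confidences: List[float],
    threshold: float = 60.0,
    max_low_share: float = 0.2,
) -> bool:
    """Flag a page for human review when too many OCR words are low-confidence.

    Assumes per-word confidences on a 0-100 scale; defaults are illustrative.
    """
    if not word_confidences:
        return True  # nothing recognized: possibly all handwriting or an image
    low = sum(1 for c in word_confidences if c < threshold)
    return low / len(word_confidences) > max_low_share


typed_page = [95.0, 91.0, 88.0, 97.0]
annotated_page = [93.0, 40.0, 35.0, 90.0, 22.0]  # margin notes drag scores down
print(needs_review(typed_page))      # False
print(needs_review(annotated_page))  # True
```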
Making these records available to the general public would be a huge advantage to anyone interested in government and US history. Come to think of it, in terms of size and complexity, it would be a worthy challenge for Google. U.S. government documents run back to the founding of the country and the number of documents only increases over time.
Re: (Score:2)
Remember "the warehouse" from the Indiana Jones movies? NARA is a little like that in terms of size but are better organized.
Does it play the music when you go in there? That's what really sets the mood, you know.
Start with the classified documents (Score:2)
Technology determines information solution? Dumb (Score:2)
IMO
Objective: Information Determines Social Change and Technology Application.
Legacy: Technology Determines Social Change and Information Application.
Yes, a paradigm change. Decision makers (.com/.gov/.mil...) are legacy mind-locked on technology always defining and providing the "Information Technology" (IT) solution.
Yes, a paradigm change. Decision makers (.com/.gov/.mil...) must go to academia to help define the new "Information Management" (IM) market place. IM must determine the required IT ar
NARA? Digital? (Score:2)
So I was at a data.gov meeting in the spring, and got to talking to someone from NARA ... he said their digital archive was um ... I can't remember the exact size, but I want to say it could all fit on a single disk, so given the time, 2TB or less.
Some of the government agencies have PB of storage already ... we'd love to turn it over to NARA for long term archiving, but there's no procedures in place, and I don't think they currently have the infrastructure or personnel to deal with it.
(note, I'm taking a
Good Move (Score:2)
Laserfiche (Score:2)
I heard that Laserfiche is a great tool for document management. As it stands they are at the forefront of the anti-piracy movement, and seem to have a stable version to avoid security issues. Maybe this is what they need?
Can You OCR? Do you know XML? Are you a minority? (Score:3)
If so, I suggest you create your own business and get ready to bid on some work. No one is going to do this in house; they're going to take bids on conversions. I used to work at a company that made quite a bit of money off of paying people, per page, to OCR patents, correct OCR errors, and tag the documents in XML. And I can assure you that, because of the way the government works, the majority of the work will go to minority-owned small businesses. The work is easy and you can get college kids to do it for peanuts.
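For a sense of what "tag the document in XML" means in practice, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `<page>` element, its attributes and the field names are a hypothetical schema, not any agency's or patent office's actual format; the point is that the library handles escaping of characters like `&` that OCR output is full of.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def tag_page(doc_id: str, page_no: int, fields: Dict[str, str]) -> str:
    """Wrap corrected OCR text for one page in XML (hypothetical schema)."""
    page = ET.Element("page", {"doc": doc_id, "n": str(page_no)})
    for name, text in fields.items():
        child = ET.SubElement(page, name)
        child.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(page, encoding="unicode")


xml_text = tag_page("US0000001", 1, {"title": "Widget & Gadget", "abstract": "A device for..."})
print(xml_text)
# <page doc="US0000001" n="1"><title>Widget &amp; Gadget</title><abstract>A device for...</abstract></page>
```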
Re: (Score:2)
Because, you know, having to do paper record archive searches is so much cheaper than going digital. That's why all the big corporations insist that all records be stored in triplicate in proper filing cabinets... oh... wait...
Re:Ye$! (Score:5, Insightful)
So, you condemn Obama for things he doesn't do (e.g., reduce costs), then condemn him for doing things (e.g., reducing costs).
Gotcha.
Re: (Score:2)
Re:Ye$! (Score:4, Insightful)
Yeah - as noted, the man can't win. Ask any corporate bean counter about the cost savings (that is, stopping spending money) by going digital.
Also - remember - he's the President. He doesn't make the budget. (That's tied up in the Super Committee.) And unlike the previous President, he hasn't been ruling by fiat, executive order and signing statement.
Re: (Score:2)
I'm assuming this was supposed to be sarcastic?
Re:You've been smoking the hope (Score:5, Informative)
Actually, I went and read the executive order here:
http://www.scribd.com/doc/74042394/Managing-Government-Records-November-28-2011 [scribd.com]
which says nothing about Web 2.0 itself, nor about moving to the cloud. The requirements laid out there are business level, and basically translate to the following: "You have 120 days to come up with system level requirements to move our data from hard copy to soft copy."
With this said, the section from the order that you're quoting is 2-b-i. It refers to the need to have a unified solution for archiving all existing electronic communication. Would you prefer that every department and agency have its own? And here I thought you might be in favor of cutting costs and improving efficiency.
Finally, your link shows that Obama has issued 17 signing statements in 3 years. That's about 6 per year. Bush issued 161 over 8 years. That's 20 per year. The number of executive orders is similar. And honestly, the Democrats in Congress didn't play the cloture games that the Republicans play now. They made a huge stink about the ONE appointment that the Democrats tried to block (remember the chants of "up or down! up or down!"). Now, the Republicans won't let a damn thing get to the floor of the Senate for a vote that doesn't explicitly further their causes. In other words, false equivalence fail.
Re:You've been smoking the hope (Score:5, Interesting)
Let's see. A difference of an order of magnitude in number of signing statements. The difference between putting the war costs in the budget - and insisting that they all be by special appropriation or would veto. The difference between starting multiple wars of occupation without a declaration and not. The difference between following the law as created by congress and accepting what congress passed (or didn't as law).
Bush was effective towards his goals. Because Obama doesn't play Bush's games, but the Republicans no longer play by the rules, Obama is not effective. That's part of my point.
No, I'm by no means happy with what Obama has (and hasn't) accomplished. But I'm sick to death of the Republicans and their Rovian games and of the charred earth policy of passing nothing that will help the country (see also abuse of cloture) and blaming Obama. The Republicans declared in 2008 that they had exactly one goal: to make sure that Obama failed. And everything that they've done during these years of crisis has been aligned with that goal, while America rots.
Finally, if you've something to say, say it for yourself as opposed to trying to spin what I'm saying into the opposite. You aren't very good at it.
Re: (Score:2)
The republicans are doing exactly what I voted them in to do. "block stupidity" and they are doing a fine job of it.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
No, he condemns Obama for being black and a Democratic president, both unforgivable sins, and together they make him the anti-Christ.
Re: (Score:2)
Re:Ye$! (But not so much in the real world) (Score:3)
Enron
Lehman Bros.
BP Gulf Oil Spill
Exxon Valdez
Fukushima
Bhopal (Union Carbide)
AIG
WorldCom
Washington Mutual
General Motors
CIT Group
Not to mention all the "too big to fail" financial companies that got bailed out on the backs of the taxpayers. It was just revealed this week that the amount of assets backed up by the US Treasury was about 77 Trillion $US.
Efficient Business
PS. You're a fucking racist slug.
Re: (Score:2)
Re: (Score:2)
Trollish troll is trollin'.
Re: (Score:2)
This is how governments control the masses in "1984".
So you're saying this project is putting a camera in everyone's bedroom and tying rats to their faces? Fear and omnipresence were how the fictional government controlled the masses in 1984; the masses knew the official history was manufactured bullshit in much the same way as people today know that Fox is a right-wing bullshit factory.
Re: (Score:2)
Well, it is a fictional story based on real observations. Clearly it represents an exaggerated world, but one that is so believable when we look at our own. Oceania wasn't just a place with cameras and torture with rats. Look at Winston's own job: sitting at a desk, taking in little scraps of paper, processing them, and then disposing of them. It was very much about control of information flow, control of the historical record.
The masses knew the official story was BS, people know it today. And
Re: (Score:2)
Re:This shouldn't cost too much. (Score:5, Interesting)
Questions worth considering:
What are the savings for going digital? (Without a doubt, they exist; if not, we'd still all be filling out forms in triplicate at work.)
What is the up front cost to convert?
How long will it take the up front cost to be absorbed by the savings?
I suspect that it will pay for itself faster than you might think. Paper records searches are expensive, to say the least. And they're extremely personnel intensive, not to mention inefficient and error prone.
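The three questions above boil down to simple payback arithmetic. A rough sketch that ignores discounting, ongoing maintenance and future format migrations, all of which a real estimate would need; the figures are made up purely to show the calculation.

```python
import math


def payback_years(upfront_cost: float, annual_savings: float) -> float:
    """Simple (undiscounted) payback period for a one-time conversion cost."""
    if annual_savings <= 0:
        return math.inf  # the conversion never pays for itself
    return upfront_cost / annual_savings


# Made-up figures, purely to show the arithmetic: $2B up front, $500M/year saved.
print(payback_years(2e9, 5e8))  # 4.0
```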
I realize that there are people out there who will condemn anything this administration does out of hand, but at least try to pretend that you think about things before you make a judgement.
It's all about the formats! (Score:2, Interesting)
You missed the most important question worth considering - in what formats will these records be maintained?
And Obama missed it, too. I don't see anything in his directive about it.
Good archival practice entails preserving original documents, not just scanned copies.
And if the purpose is to place documents on the Internet, then it's a GIGO situation. If you allow garbage, closed formats like .doc or .docx or .xls or .xlsx to be put on the Web, you're not serving transparency very well, and you're defeatin
Re:It's all about the formats! (Score:5, Insightful)
If you look at the executive order itself:
http://www.scribd.com/doc/74042394/Managing-Government-Records-November-28-2011 [scribd.com]
you'll find that while formats aren't called out explicitly, it basically instructs the archivist to come up with a comprehensive system within a limited amount of time. It's a pretty high level set of business level requirements; basically, these business level requirements translate to, "give me the system level requirements docs and specifics within four months." I can't imagine that such a system wouldn't include the proposed formats.
Re: (Score:2)
You missed the most important question worth considering - in what formats will these records be maintained?
No. The most important bit is they'll be available. Conversion from one format to another isn't all that difficult, though it may be time consuming to do so, and lossy.
Just look at the output of "man -k 2" on a Linux box and see which ones of those are for converting a proprietary format to a more open format.
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
I'm not so worried about SQL server, as it is a database format. Now, I am concerned about JPEGs of written papers scanned at 36dpi. Considering the dirty tricks politicians use to be less transparent (giving out printed copies of emails to meet FOIA requirements), I would consider SQL to be an improvement*.
* Unless it is a SQL database accessible only via a web front end. I'm sure any enemy of transparency could make this unusable.
Agreed, sad it won't happen... unless Obama is... (Score:5, Insightful)
Agencies who have thus far opted to NOT digitize their records have done so for many reasons. And even though they're being forced to digitize now, they'll find many different methods of making the process cost substantially more than it should have and drag the process out over extended periods. Let us not forget that most of these documents can only be handled by certain staff with high enough clearance given their confidential nature. If the expose writers are to be trusted, there are entire rooms of records of paper where only one highly trusted person is allowed to enter.
Let us also point out that many of these records have been written in cursive, which, unlike block print, is a screaming nightmare to handle automatically. That means that the people who hold the clearance to view the records will need to manually enter these records themselves. There will be issues of encrypting the records so that only certain individuals will have access to them. While Obama would like to make it so that there could be some central database per organization, I'd imagine that there will be many individual, sealed networks to guarantee security.
With all these issues, let's be blunt...
1) The agencies will fight it... outright AND through bureaucratic means.
2) The agencies will say "Sure... we did it" and since many of the records are highly classified, no one can actually contradict the statement... so it most likely won't happen. When a given record is asked for they'll claim "oh...we must have missed that box"
3) It will take decades to complete as there are rooms of records where only a single individual is likely to have access and I'm guessing their typing speed isn't 100wpm.
4) Obama is on his way out. Even if he survives this coming election by some miracle (he sucks as much as the next guy, but people know he sucks and are more likely to trust someone else with less of a known suckage) by the time the project is likely to start, it's almost certain whoever takes over will pull the funds from that budget within hours of getting into office.
5) For data security sake, the agencies will most likely have to design the systems themselves using whatever crap engineers they manage to find with high enough clearance that's willing to actually code document management systems. And truthfully... this isn't a TV show... if the agencies have "Super Hackers" on staff, they're probably just as lame as the self promoting idiots you find everywhere else.
So, I'm willing to say... this will cost a tremendous amount to talk about... but will go nowhere. Sad
Re: (Score:2)
Re: (Score:3)
While there may be some agencies that will try that "highly classified" BS story, there are inspectors and people who have security clearance which can go in and verify that even the classified documents are archived in a responsible manner. Some of those inspectors answer only to members of congress (usually something like the CBO or perhaps accountants/inspectors tied to specific committees) and are fully cleared to view any classified material as their need to know is usually within the scope of their o
Re: (Score:2)
Re: (Score:2)
Printing will plummet.
Labor will become greatly productive. Even the worst search engine is faster than a human search.
This has the added advantage of allowing citizens to search through it.
Now, one thing that most are missing is that this will ALSO allow the feds to search it quickly. TOTAL INFORMATION.
In spite of the last, I would still do it quickly. I am amazed that he did n
Re: (Score:2)
Re: (Score:2)
BTW, I was meaning, that I was surprised that Obama did not do this earlier.
Re: (Score:3)
> What are the savings for going digital? (Without a doubt, they exist; if not, we'd still all be filling out forms in triplicate at work.)
Will save a lot of time for people looking to leak documents to wikileaks. On those grounds alone, this is my favorite Obama decision to date.
Finally we may see some real freedom of information acting.
Re: (Score:2)
Re: (Score:2)
Not really. If it's going to cost $200 billion to do, and take 150 years to recover the cost, it's probably not worth doing.
Note that above figures were pulled out of my ass, and are not intended in any way to be realistic. Though US Government policies will tend to make the process far more expensive than you might expect, and cost r
Re: (Score:2)
The company I work for has successfully implemented many such systems based on the HP TRIM software (previously Tower software), which seems to do a pretty good job for both paper and electronic records keeping. It doesn't require any custom code to do most things an organisation wants out of the box, which seems to be a key point in its success.
Re:seriously, how hard is this? (Score:5, Interesting)
Is there some complication I don't understand?
Yes. More than one.
Nothing fancy, just a database of scanned forms in pdf format and the like.
There's the first problem. It's never simple.
First issue - if you're going to put documents in, you're going to want to get them out. How do you search for them? You're going to want to define the metadata, and that's a headache. Got lawyers? They'll want client and matter. But those fields are just about meaningless to anyone else. How do you resolve the incompatibility? Do you use different forms for different groups of users? How will the engineering department find the subpoena papers that the lawyers filed?
What fields are globally useful? Are they so generic that any search will retrieve hundreds of documents? Conversely, are they so specific as to make your metadata field selections horribly long and therefore ambiguous? (Free text metadata? Let's not go there.)
Remember that you've got to fill in that metadata any time you add a document. What's the balance between useful and annoying? Too many fields and nobody will want to fill it in. Too few, and you won't be able to find anything.
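The required-versus-optional trade-off can be made explicit in a metadata profile that is checked on every check-in. A hypothetical sketch: the `client`/`matter`/`doc_type` fields echo the lawyer example above, and `ProfileSchema` is an illustrative name, not any DMS product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProfileSchema:
    """Hypothetical metadata profile for one group of users."""

    required: List[str]
    optional: List[str] = field(default_factory=list)

    def missing(self, metadata: Dict[str, str]) -> List[str]:
        # Required fields that are absent or blank; these should block check-in.
        return [f for f in self.required if not metadata.get(f, "").strip()]

    def unknown(self, metadata: Dict[str, str]) -> List[str]:
        # Fields outside the profile: typos, or another group's vocabulary.
        allowed = set(self.required) | set(self.optional)
        return sorted(k for k in metadata if k not in allowed)


legal = ProfileSchema(required=["client", "matter", "doc_type"], optional=["author"])
print(legal.missing({"client": "Acme", "matter": " ", "doc_type": "subpoena"}))  # ['matter']
print(legal.unknown({"client": "Acme", "project_code": "X1"}))  # ['project_code']
```

Every name added to `required` is one more field every user must fill in on every document, which is exactly the annoyance-versus-findability balance described above.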
That's for new documents. When you first implement a DMS, you have a truckload of documents to be imported. You're not going to do it manually, you're going to use an auto-import. But how do you define the metadata for all those millions of documents you're importing? What if you have client/matter, for instance? Hopefully they're all already sorted, and you can use something like Kofax Capture, a seriously powerful and fast scanner, and separator sheets on which you can do forms recognition to define the metadata fields. But there's a lot of work involved up front to get that import working properly.
Don't forget the OCR. Hopefully all your paper documents are clean and will OCR nicely, so you can do full text indexing.
Security. Better get that set up right. Profile-level security? It's more secure, but it means that if you don't have permission to access a document, it won't even show up in your search results, so people will complain that they can't tell whether a document exists and they just need to request access. Groups. And by the way, remember to define the permissions on all those millions of documents you're importing.
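To make "it won't even show up in your search results" concrete, here is a sketch of profile-level security where the permission check happens inside the search rather than when a document is opened. The per-document group lists and titles are hypothetical; a real DMS would enforce this in the repository, not in application code like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    allowed_groups: FrozenSet[str]  # hypothetical per-document ACL


def search(docs: List[Doc], term: str, user_groups: Set[str]) -> List[str]:
    """Return matching doc IDs, silently omitting documents the user may not open."""
    return [
        d.doc_id
        for d in docs
        if term.lower() in d.title.lower() and d.allowed_groups & user_groups
    ]


docs = [
    Doc("D1", "Subpoena served on Acme", frozenset({"legal"})),
    Doc("D2", "Subpoena response: engineering records", frozenset({"legal", "engineering"})),
]
print(search(docs, "subpoena", {"engineering"}))  # ['D2']  (D1 is invisible, not "denied")
print(search(docs, "subpoena", {"legal"}))        # ['D1', 'D2']
```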
Version control. How do you control check in and check out? Do you control check in and check out, or just audit it?
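A minimal sketch of enforced check-out/check-in with an audit trail, assuming a single in-process store with no persistence or concurrency handling. The "just audit it" option would keep the `audit` appends and drop the `PermissionError` checks. `VersionedStore` is a hypothetical name.

```python
from typing import Dict, List, Tuple


class VersionedStore:
    """Exclusive check-out / check-in with an audit trail (illustrative only)."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[bytes]] = {}
        self.checked_out_by: Dict[str, str] = {}
        self.audit: List[Tuple[str, str, str]] = []  # (action, doc_id, user)

    def add(self, doc_id: str, content: bytes, user: str) -> None:
        self.versions[doc_id] = [content]
        self.audit.append(("add", doc_id, user))

    def check_out(self, doc_id: str, user: str) -> bytes:
        holder = self.checked_out_by.get(doc_id)
        if holder is not None and holder != user:
            raise PermissionError(f"{doc_id} is checked out by {holder}")
        self.checked_out_by[doc_id] = user
        self.audit.append(("check_out", doc_id, user))
        return self.versions[doc_id][-1]  # latest version

    def check_in(self, doc_id: str, content: bytes, user: str) -> int:
        if self.checked_out_by.get(doc_id) != user:
            raise PermissionError(f"{user} does not hold {doc_id}")
        self.versions[doc_id].append(content)  # old versions are kept, never overwritten
        del self.checked_out_by[doc_id]
        self.audit.append(("check_in", doc_id, user))
        return len(self.versions[doc_id])  # new version number (1-based)


store = VersionedStore()
store.add("memo-1", b"draft 1", "alice")
store.check_out("memo-1", "alice")
print(store.check_in("memo-1", b"draft 2", "alice"))  # 2
```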
I've only just scratched the surface of a document management system. Then there's records management. You'll want to make sure your system is DoD 5015.2 compliant. Setting up the retention schedules... hopefully you've got a records retention policy already; otherwise that's months' worth of work to define those policies and ensure you comply with all regulatory requirements while still balancing your need to purge/archive old records.
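A retention schedule is, at its core, a lookup from record series to a retention period plus a date calculation. A sketch with made-up series names and periods (not NARA's General Records Schedules and not anything DoD 5015.2 specifies), assuming retention counts whole years from a cutoff date:

```python
from datetime import date
from typing import Dict

# Hypothetical schedule: record series -> years to keep after the cutoff date.
RETENTION_YEARS: Dict[str, int] = {"correspondence": 3, "contracts": 6, "case_files": 25}


def disposition_date(series: str, cutoff: date) -> date:
    """Earliest date a record in this series may be destroyed or transferred."""
    years = RETENTION_YEARS[series]
    try:
        return cutoff.replace(year=cutoff.year + years)
    except ValueError:
        # A Feb 29 cutoff landing in a non-leap year: use Feb 28 instead.
        return cutoff.replace(year=cutoff.year + years, day=28)


def eligible_for_disposition(series: str, cutoff: date, today: date) -> bool:
    return today >= disposition_date(series, cutoff)


print(disposition_date("contracts", date(2011, 11, 28)))  # 2017-11-28
print(disposition_date("correspondence", date(2012, 2, 29)))  # 2015-02-28
```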
How does something even become a record? Hopefully you've already got knowledgeable librarians (yes, that's what they're called), and you just need to train them on your new RM system.
Are all your boxes already barcoded? Your RM system should be able to register where a record is - building, shelf, box.
You're probably getting the idea. The technology is easy. The processes are complicated, and they get exponentially more complicated as the size of your client base grows.
Re:seriously, how hard is this? (Score:5, Interesting)
Free text metadata? Let's not go there.
Google and its users seem to be doing a pretty good job of utilizing free text to locate documents.
Re: (Score:2)
Very different problem space. Google doesn't need to have a high precision score in its results. A DM system, on the other hand, needs to have really good precision because its corpus will contain thousands of very similar documents. Content searching isn't going to work very well there - you need specific metadata (e.g. delimit by date of filing with the federal agency).
* If you want to get technical, your tf.idf score is going to be well nigh useless in this case. It's about precision, not so much recall.
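A toy illustration of why tf-idf doesn't help here, assuming plain idf = log(N/df) and a made-up corpus: three filings whose text is identical boilerplate score exactly the same for the query, and only the filing-date metadata can pick out the one you want.

```python
import math
from collections import Counter
from datetime import date

def tf_idf_scores(query_terms, docs):
    """Score each document by the summed tf*idf of the query terms.
    idf = log(N / df), so a term found in every document adds nothing."""
    n = len(docs)
    counts = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    scores = {}
    for doc_id, tf in counts.items():
        score = 0.0
        for term in query_terms:
            df = sum(1 for c in counts.values() if term in c)
            if df:
                score += tf[term] * math.log(n / df)
        scores[doc_id] = score
    return scores

def filter_by_filing_date(filed_on, start, end):
    """Metadata filter: IDs of documents filed within [start, end]."""
    return sorted(doc_id for doc_id, filed in filed_on.items() if start <= filed <= end)

# Hypothetical corpus: three near-identical filings plus one unrelated document.
docs = {
    "q1": "quarterly emissions report filed with the agency",
    "q2": "quarterly emissions report filed with the agency",
    "q3": "quarterly emissions report filed with the agency",
    "memo": "cafeteria menu for next week",
}
filed_on = {"q1": date(2009, 3, 31), "q2": date(2009, 6, 30), "q3": date(2009, 9, 30)}

scores = tf_idf_scores(["emissions", "report"], docs)  # q1, q2, q3 tie exactly
june_filing = filter_by_filing_date(filed_on, date(2009, 6, 1), date(2009, 6, 30))
```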
Re: (Score:2)
Not quite. Google does a good job of utilizing free text to locate some documents. If all you want is instructions on how to fix your car or the date that an event happened, getting one good result is all you need. If you need to find every document an agency produced about the toxic effects of a chemical, then finding one good result sucks--you need (within certain error levels) every document.
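The two measures being argued about are standard: precision is the fraction of what you retrieved that's relevant, and recall is the fraction of what's relevant that you retrieved. A minimal sketch with a hypothetical example: if 50 agency documents discuss the chemical and the search returns one correct hit, precision is perfect but recall is 2%.

```python
def precision_recall(retrieved, relevant):
    """precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|"""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical: 50 relevant documents, the search finds just one of them.
relevant_docs = {f"d{i}" for i in range(1, 51)}
print(precision_recall({"d1"}, relevant_docs))  # (1.0, 0.02)
```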
Re: (Score:2)
Yes, exactly. If I'd thought a bit more before writing my reply above, I'd have commented that Google does information retrieval. That's different from document management and VERY different from records management.
Re: (Score:2)
Google and its users seem to be doing a pretty good job of utilizing free text to locate documents.
Or, to put it another way, the problem you're expecting each of these government DBAs to routinely solve took a $100 billion company that makes a point of hiring geniuses to tackle.
I don't think "but Google can do it!" is synonymous with "that problem isn't a big deal".
Re: (Score:2)
For all the complications you claim, how the hell do you find and manage paper records now? Just mapping them 1:1 into an electronic archive of scanned PDFs would be a good start. At least then you'd have an archive that can be backed up easily, and people could access a record by opening the PDF rather than requisitioning the paper copy, which would be much faster and require no manual labor. Then you could start getting fancy with OCR, metadata and such based on cost/benefit. And if it comes down to making a
Re: (Score:3)
People still have to be able to locate those scanned PDFs. Now that it's electronic, you need to know where to go to get it. Is it on a network share in a well-organized directory structure? At some point that structure gets so close to a taxonomy that you run past the limits of simple hierarchical mapping.
The traditional way to handle paper records is the method I referred to; you have them stored in a traditional vault and your RM system tracks by building/room/shelf/box. Everything is barcoded to make it quick/efficient to
Re: (Score:2)
You do make a valid point, but here are my counterpoints.
One: a paper library is hopefully categorized and indexed properly. If you go to an old-fashioned book library, everything is arranged by category, filed according to the Dewey Decimal System. It's easy: you look the book up in a card catalog (or the electronic equivalent), it tells you the number is 378.143, you go to the appropriate section of the library, and there it is. Why? Because a librarian took that book when
Re: (Score:3, Interesting)
Re:seriously, how hard is this? (Score:5, Insightful)
Yeah, software would be easy to design if it weren't for all those pesky stakeholders.
Whose purpose, exactly, does it serve if the stakeholders are disappointed?
Re: (Score:2)
This is a US government project, so all US citizens are essentially stakeholders. All government agencies are stakeholders. You just can't please that many people, and it's the process of trying to do so that does projects like this in.
I didn't mean to suggest that the guy calling the shots would ignore stakeholders. He's a project manager. This person is in a position to consider ALL input and make fair compromises, instead of trying to create an amalgamation of whimsical directives by those who "outrank"
Re:seriously, how hard is this? (Score:4, Informative)
I used to work for a DMS software company at the corporate level, and while the systems are in use everywhere from elementary schools to health care providers to governments, retrieval is pretty damn nice IF the system is set up properly. Setting up a small pizza shop properly takes an hour or two; a government agency could take weeks or months to perfect. But the user side of things was a breeze.
Re:since President Truman's administration (Score:4, Funny)
Re:Inevitable Release (Score:5, Insightful)
Re: (Score:3)
Two wars on credit combined with high end tax cuts do tend to drain the coffers with a quickness.