United States The Internet

Washington State Archives Go Digital

prostoalex writes "USA Today and dozens of others report that Washington state archives went online. Over the past two years project participants scanned 1 million documents issued by state and country authorities. The archive is located in my alma mater Eastern Washington University (go Eagles!) The 800 terabyte storage system was developed by Microsoft and EDS."

  • Well, (Score:5, Insightful)

    by chewy_2000 ( 618148 ) on Wednesday October 06, 2004 @07:22AM (#10448985)
    Personally I would find this, or something like it, very useful in research, even as just an undergrad History major. The number of times I've wished for something like this while digging around in musty old archives...

    Although, it has to be said, I hope they make everything accessible for *everyone*, regardless of OS and browser. No doubt a lot of researchers would be using OS X/Linux/Firefox.

  • WWW address (Score:5, Informative)

    by JamesD_UK ( 721413 ) on Wednesday October 06, 2004 @07:22AM (#10448986) Homepage
    Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/ [wa.gov]
    • Re:WWW address (Score:5, Informative)

      by El Cubano ( 631386 ) on Wednesday October 06, 2004 @07:43AM (#10449047)

      Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/

      FYI. Turn on cookies or you receive this extremely helpful error message:

      An error occured on the site. Please try again or come back another time.

      Otherwise, it's pretty cool.

  • ... relate to state anti-competitive actions against Microsoft themselves? :)
  • Hurrah (Score:3, Funny)

    by Anonymous Coward on Wednesday October 06, 2004 @07:23AM (#10448992)
    "The 800 terabyte storage system was developed by Microsoft and EDS."
    Bill Gates and H. Ross Perot; together at last!

    I feel safer already.
    • Bill Gates and H. Ross Perot; together at last!

      hm, according to this [eds.com] link, their CEO is a guy called Michael Jordan ...
      this name seems to attract money a lot better than mine ...
  • by Neumsy ( 201524 ) on Wednesday October 06, 2004 @07:26AM (#10448999) Homepage
  • NB Archives (Score:5, Informative)

    by X-Phile ( 176747 ) <tsoucy&nb,sympatico,ca> on Wednesday October 06, 2004 @07:29AM (#10449007) Homepage
    The Province of New Brunswick Provincial Archives [archives.gnb.ca] have been like this for quite some time now, with birth, death, marriage certs and census records. I have been able to search for information about my family history online using their handy dandy search tool, as well as visiting the Archives themselves at University of New Brunswick [www.unb.ca]. It never occurred to me that others might be trying to catch up, but I guess that this type of service isn't something that most governments deem necessary for the public.

    • Re:NB Archives (Score:1, Interesting)

      by Anonymous Coward
      Yep! And my girlfriend's uncle works for NB's Provincial Archives! He talked to me about how much simpler it's made their lives not to have to answer a bunch of questions by phone, and to refer people to the website instead.
  • by vinukr ( 796210 ) on Wednesday October 06, 2004 @07:29AM (#10449010)
    One thing that they have to concentrate on in the future, when the number of records grows fast, is a good search strategy. The time a search takes is one thing that can decide whether the masses use this facility.

    As far as I have tried it out in these few minutes, the search strategy is good... there are separate searches that researchers can use to find historical data and the like... This is great.
  • by chargen ( 90268 )
    The 800 terabyte storage system was developed by Microsoft and EDS.

    How would Windows have enough drive letters to be able to access this? Would there be a drive AG:? :-)

    -Pete
    • Re:drive letters (Score:1, Informative)

      by Anonymous Coward
      That's right, you'll put all the data online in single partition disks hung off one server. Why didn't I think of that?

      While there is nothing to stop an NTFS partition being 800TB, it is far more likely that some sort of nearline hierarchical storage is being used, the sort of system that is used the world over in workflow/image systems.
  • Privacy (Score:5, Insightful)

    by chewy_2000 ( 618148 ) on Wednesday October 06, 2004 @07:34AM (#10449027)
    The site seems to be slowing a bit, so I can't find details, but surely there are some privacy concerns here. I know that this just replicates the publicly available material in the physical archives, but there is a big difference between going to the archives and digging through books, and harvesting info over the web, especially given the sheer amount of info on the site, much of it recent records.
    • Making retrieval difficult is not part of anyone's right to privacy.

      • And once again I am not suggesting that retrieval should be made difficult, merely pointing out that there is potential for abuse that needs to be considered.
    • Re:Privacy (Score:2, Insightful)

      by lamona ( 743288 )

      Absolutely. Making "public" records available universally is a different meaning to "public" in public records in situ. Although the word "public" was used, it really meant the local community. When you change that to "everyone in the world with internet access" you change the context in which the data resides... and for data, context is everything. For one thing, it narrows the scope to a small portion of the population so that accurate identification (or, conversely, less mistaken identity) is facilitated

      • All local records do is make it harder for people without money to get them. People with money have always been able to hire private investigators to track down the records they want. This makes it easier for people without money to do so, or for people who are vaguely interested in something but don't care enough to hire a private investigator (or do it themselves) to do so.

        If you really want to stop abuse, you'll have to make them completely private, not just "private but inconvenient to get to".
    • Re:Privacy (Score:2, Interesting)

      by dtjohnson ( 102237 )
      This is a significant erosion of privacy. Governments require you to provide a lot of information for all sorts of things. Now, they are using new technology to make all of this information available to anyone anywhere in the world with a casual 2-minute search. Where will this stop? Tax records, medical records, personal property records, lawsuits, judgments, military records, etc. may all soon be posted online in this way. This is a first step towards that sort of future where anyone can easily sniff
      • Tax records, medical records, personal property records, lawsuits, judgments, military records, etc. may all soon be posted online in this way.

        Property (real property) records are already public domain--as they should be. There's no good reason for the government not to tell you who owns what land. Whether you find out at the county tax assessor's office or on the Internet is irrelevant.

        Aside from property tax information, I don't foresee other tax information being released to the public. Knowing t

  • Digital twilight. (Score:5, Interesting)

    by haeger ( 85819 ) on Wednesday October 06, 2004 @07:48AM (#10449067)
    How about the "Digital Twilight" that people have talked about? One of the big problems with this kind of archive is that it isn't permanent the way that paper is. Washington could very easily end up the way that the Stasi did in East Germany. They have several hundred tapes of data with information about every spy in the west on them, but the information is still "safe" since no one any longer knows how the data was saved to disk or which file format was used.

    And I'm still ignoring the fact that machines grow old and have to be replaced. It's a known fact that disks break, so you'll need backup, but how long could you keep an old storage solution around? Sooner or later you'll have to migrate old backup data to newer media.

    Note that I don't think that this is a bad idea, moving everything online, but there are consequences that I don't think that everyone has thought of.

    Where I live one can go into the royal library and find (and read) an official document written by someone in the 16th century, but can we be sure that 100 or even 50 years from now someone can read a DLT300-tape?

    .haeger

    • by LousyPhreak ( 550591 ) <lousyphreak@nosPam.gmx.at> on Wednesday October 06, 2004 @07:58AM (#10449109)
      you still can move the data from the old system to a new one if it's at the end of its lifetime.

      harddrives can easily be replaced (assuming it's a sort of raid with hotswap)

      sql will also stay around really long, and if not there will be at least a gazillion tools to convert to a new format (it is quite sure that the data will be stored on a sql server)

      and as long as the data is safely stored the access mechanism shouldn't be a problem but that's just my .02
      • you still can move the data from the old system to a new one if it's at the end of its lifetime.

        Yes, you are correct about this but scale it up a bit. If you have to change media every 10-15 years then the data migration becomes a full time job for someone.


        sql will also stay around really long, and if not there will be at least a gazillion tools to convert to a new format (it is quite sure that the data will be stored on a sql server)

        and as long as the data is safely stored the access mechanism shouldn
        • what i meant with "safely stored" was "archived in some sort of database"

          some form of a database will be around almost forever, if a new form of storage is invented which makes databases obsolete there will be enough tools around to move the data from a sql database to the new system, simply for the fact that too much stuff is already stored in various databases to let go of it.

          additionally if this system is heavily used (i.e really everything enters it) it will grow and with it the hardware will need upgr
          • some form of a database will be around almost forever, if a new form of storage is invented which makes databases obsolete there will be enough tools around to move the data from a sql database to the new system, simply for the fact that too much stuff is already stored in various databases to let go of it.

            But this presumes that someone cares enough to do the migration while there are still "enough tools around". If no one cares for enough technology generations the data format will be so old that no one

          • the only problem could be if the vendor of the frontend goes out of business, the source with its documentation vanishes, and someday iis will be dumped (hopefully ;) ), so the frontend will be unusable. but even in that case it should be possible to reassemble a useable frontend given the case that the system is well designed.

            Ah, yes. This is a good point. What if the vendor... of the frontend, or backend, or any of the systems, goes out of business? Then they will be screwed!

            Unless, perhaps, they we
    • The problem is the notion that tape is an archive format. It's not, it's a backup format (catastrophic recovery). It's only an archive format while you have the capabilities to read it (if you can read it)

      An archive should be a write once, read many file system with Active on-disk (not hierarchical on tape) information with multiple copies preferably at multiple sites (depending on how valuable the data is), with programs for active file validation (you need to be sure the file is still there, and still th
    • I don't follow, you're allowed to personally go touch (so that you can turn the pages of and read) a 16th century document? I have a hard time imagining that's what you meant, but if it is don't you have concerns about what your finger acids will do to that historical page in 50 years? In 100 years it would certainly show the effects of humidity changes from the breath of users that had handled it. IF it survived the handling... all media have archival problems. All human product has permanence issues (even
      • Actually yes, that's what I meant. I can (with legitimate reasons) go and read any document as far back as we have records. Any kind of research is a legit reason. A relative of mine has done some digging into our family's history and, in doing this, has been reading documents as far back as the 15th century (that's 1400-something, right?). Gloves supplied by the library.

        Yes, I am aware that these records will be destroyed eventually but it has survived more than 500 years of storing without any intervention. I seri
        • Yes, I am aware that these records will be destroyed eventually but it has survived more than 500 years of storing without any intervention. I seriously doubt that in 500 years someone will pick up a CDROM and go "Wow, let's see how they lived around year 2000". I doubt that will be possible in 50.

          Washington State's doing just that experiment. Back in 1992, they created a time capsule using the latest and greatest storage technology: CD-ROMs. The plan is to add new material every 25 years, and in 2492,
    • >>> One of the big problems with these kind of archives is that they aren't permanent the way that paper is. Washington could very easily end up the way that Stasi did in East Germany.

      Paper is not really permanent either. If someone wants to get rid of paper documents, all he needs to do is burn them. Eventually, in an "accidental fire".

    • Having had conversations with Adam Jansen (the WA digital archivist quoted in several of the news stories), many consequences have been thought of and addressed (as well as could be expected). I also don't believe that this is intended to replace all physical documentation - I wouldn't expect them to shred the various pieces of legislation after the documents have been scanned. This is just to provide another, easier method of access to the public and researchers. On top of all of this, tell me how you "back-up"
  • no maps? (Score:5, Insightful)

    by Apreche ( 239272 ) on Wednesday October 06, 2004 @07:49AM (#10449072) Homepage Journal
    Dang, there are no maps in there. The best stuff in the archives at town hall has always been maps of the town and blueprints of various buildings. But nobody scanned those in the archives. Oh well.
    • Re:no maps? (Score:3, Interesting)

      by mikael ( 484 )
      Maps and/or aerial photographs combined together make the best time-lapse animation. It's amazing to see the growth of a city all the way from the first harbour/warehouse in Roman times to the metropolised supercity of today.
    • Re:no maps? (Score:1, Flamebait)

      by Scaba ( 183684 )

      Maps & blueprints? What are you, some kind of terrorist!!?!?

    • Some of the maps are about a quarter mile away in the JFK Library, downstairs in the Government Archives. There are a lot of CIA intelligence materials down there, including maps and reports. One of the featured items of the month is the Postwar Intelligence Report on Iraq or somesuch. Not only are those not scanned in, but they're inconvenient to access. The government documents are all on these moving shelves that never seemed to work properly, and would consistently beep in the background to say "we'
  • System Spec (Score:4, Funny)

    by LiquidCoooled ( 634315 ) on Wednesday October 06, 2004 @07:58AM (#10449110) Homepage Journal
    The 800 terabyte storage system was developed by Microsoft and EDS.

    Microsoft was able to confirm the system is expandable and, contrary to previous rumours, will in fact have enough disk space to install Longhorn.

    They do, however, state that to do anything actually useful, more upgrades will be required.
  • by N8F8 ( 4562 ) on Wednesday October 06, 2004 @07:59AM (#10449115)
    If you can't bother to find a link to a web resource in an article about a web resource, you shouldn't post it!

    http://www.digitalarchives.wa.gov/ [wa.gov]

  • Over the past two years project participants scanned 1 million documents issued by state and country authorities.

    If only someone had told them about Kinko's [kinkos.co.uk].
  • Perfect (Score:2, Interesting)

    Oh, great. When (not if) Microsoft is brought to court for antitrust violations again, all MonkeyBoy has to do is enter a secret backdoor password and, *poof* all those documents containing damning evidence suddenly go "missing" -- or perhaps they simply disappear from the index as if they never existed.

    Would you trust a known pedophile to give your kids a bath? If not, then why trust a convicted monopolist who is on the record for perjury with critical documents?
    • I know you are just being funny, but on a side note - I do not think the state got rid of its paper documents (though that would be cool for recycling)...instead I think they just added this as a nice and easy way for society to benefit from technology...
    • What Anti-Trust violations?
  • This is how 800TB can be digitally locked forever http://www.emc.com/products/systems/centera.jsp/ [emc.com] and still be online.
  • by illtud ( 115152 ) on Wednesday October 06, 2004 @08:25AM (#10449224)
    The system isn't 800TB, but will scale to 800TB, according to this EDS press release [eds.com]. In fact, given that they've spent a mere $2.5M [wa.gov] (powerpoint!) there's not a hope in hell that they've got 800TB! The powerpoint says it's a 5TB EMC SAN & an ADIC tape library for backup.

    An interesting point is that they're delivering the documents using DjVu by LizardTech [lizardtech.com], which is GPLd, and developed by the creators of DjVu in conjunction with LizardTech (after a period of LT not-getting-it). The DjVuLibre home page is here [djvuzone.org]. LizardTech still have the best encoders for the format.
  • Is this the same EDS that is currently fleecing the US Navy for hundreds of millions of dollars in what has been described by everyone I've talked to as extremely poor computer and network support?
    FTA -- "If you mention NMCI, there is an automatic groan," he says. "I think the phrase is, 'I've been NMCI'd.' "
    The Article [govexec.com]
    • Yep. Not just the US Navy either, but the Army as well. I have the misfortune of having them as my network/computer support at work, and it's really sad. Most of their network guys come to me for help, and I'm a Video Teleconferencing Engineer, not a network guy.

      Of course, I can't bash them too hard. I'm hoping they'll hire me ;)
  • Bug Farm (Score:2, Funny)

    by HangingChad ( 677530 )
    The 800 terabyte storage system was developed by Microsoft and EDS

    Run and hide. If there was ever a combination of resources destined to fail it's Windows and EDS. If it works at all I'll be surprised. If it keeps working I'll be amazed.

  • What are the odds they forget to reboot and it all crashes after 30 days?
  • Truly a match made in heaven.
  • They're hosting the archives here at Eastern? You know something's going really wrong when Slashdot is your source for current events at the university where you work and attend classes. Where's my ginseng tea?
  • Go Eagles! (Score:2, Interesting)

    by Pcghost ( 817978 )
    The digital archives is a big step for my University. Five years ago we were facing a hostile takeover by the drunken WSU, now Eastern is the fastest growing University in the state. The Microsoft focus is to be expected. Redmond pays a lot of money to keep universities in our state in line. Rest assured Eastern is loaded with disgruntled Linux users being forced to learn Visual Basic in their IT courses. There are even a few IT profs pushing for changes, though they haven't made much headway in their e
    • Re:Go Eagles! (Score:2, Interesting)

      by Sta7ic ( 819090 )
      Fastest growing because we still have something like space. With WSU or UW not taking anyone with less than a 3.6 GPA for reasons of overcrowding, and in-state tuition being around $1200 for 12-18 credits, this place isn't half bad. But our math department was ranked the absolute worst in the state of Washington between the four and two year colleges last year, which seems to hamstring progression through the CS department. One of our profs has a dubious reputation after 3/4 of the class failed a 300-lev
      • Yeah, the dorm networks here suck (I'm a grad student myself, just moved off campus after 4 years in the dorms, er, "residence halls")

        It's not only a problem with the provider and the infrastructure, but also the management (EWU Department of Housing and Residential Life) not having a fsckin' clue how to manage a network of this size. I've looked into it on several occasions (from both the end-user and systems design perspectives). It's not pretty. Pretty fsckin' ugly, actually.

        I don't have enough space h
    • Quoth the fellow Eagle poster:

      The digital archives is a big step for my University. Five years ago we were facing a hostile takeover by the drunken WSU, now Eastern is the fastest growing University in the state.

      Yes, it's amazing the strides Eastern has made. I was there in the early-to-mid 90s when they sucked in just about every possible way there is to suck. Now, if my other plans don't pan out, I might actually consider going back there to finish my degree. It is true, however, that EWU is still
    • Rest assured, though, that the VB situation you described is applicable only to the IT guys (I'm not even sure we have an "IT" program. We have MIS, CIS, CS, etc, but nothing that's actually officially called "IT").

      In the Computer Science Department (where I am now a grad student, having gotten my BS degree here last fall), students are taught Java as the base language for the programming classes.

      After that, we offer electives in C++, C, Python, and a handful of other languages.

      We also have quite a few L
  • Also of note, Administrative login is available here:
    https://www.digitalarchives.wa.gov/WADAAdmin/logon.aspx?ReturnUrl=%2fWADAAdmin%2findex.aspx

    It appears to not be susceptible to a common IIS/ASP script injection bug: ' or 0=0 --

    Good work.
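
For readers who have not seen the `' or 0=0 --` probe before, here is a minimal sketch of why it works against a naively built login query and why it fails against a parameterized one. Everything in it is an assumption made for illustration: the `users` table, the `login_unsafe` and `login_safe` helpers, and the use of Python's `sqlite3` in place of whatever database and ASP code the real site runs. It says nothing about how the archive's login page is actually implemented.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory user table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return conn


def login_unsafe(conn, name, password):
    # Vulnerable: user input is pasted straight into the SQL text, so a
    # quote in `name` can close the string literal and rewrite the query.
    query = (
        "SELECT COUNT(*) FROM users "
        "WHERE name = '%s' AND password = '%s'" % (name, password)
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, name, password):
    # Parameterized: the driver passes the values separately from the SQL,
    # so the payload is compared as an ordinary (non-matching) user name.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


conn = make_db()
payload = "' or 0=0 --"
print("unsafe login accepts payload:", login_unsafe(conn, payload, "anything"))
print("safe login accepts payload:  ", login_safe(conn, payload, "anything"))
```

In the unsafe version the query text becomes `WHERE name = '' or 0=0 --...`, the `--` comments out the password check, and the condition is true for every row. A login form that rejects the payload is at least not making that particular mistake.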
  • I wasn't able to find my birth record yet. Any mention of how much data is not online yet?
    • And I'm neither married nor a citizen. The names I used for those were fairly common, so there should have been some false hits at least, but nothing.
    • Me too. Neither. Gee, I hope they kept the originals somewhere.
    • Birth records aren't on file with the state archives.

      3. How can I get my birth certificate?


      The State Records Center does not have the authority to distribute any agency records to the public. Birth certificates can be obtained through the Department of Health, Center for Health Statistics at (360) 236-4300 or via the internet at www.doh.wa.gov [wa.gov].
      • Not sure why you say birth records aren't on file with the state archives.

        If that were the case why do they allow you to search for birth records? They even return results too (mainly from Spokane County)... just not my personal record.
      • my bad... birth records aren't available for PURCHASE through that site. anyway.. here's what they have so far.

        Birth Records


        Birth Records contains a listing of people born in the following areas:

        Pierce County (Fox Island, Anderson Island, McNeil Island, and Steilacoom) from 1903 to 1914

        Walla Walla City in January - April 1907

        Spokane County from 1890 to 1906
  • 800TB to store 1M docs means 800MB:doc. It seems cheaper, for storage, transmission and searching, to store most of these docs, which were typed on a machine like a typewriter or wordprocessor, as events and a context. Each doc's colophon in the database would include the font and layout parameters of the process that created the doc, like "1973 IBM Selectric", "TABS: 5, 10, 15", etc, and then a sequence of "UI" events, like keys struck and marks applied. The server could regenerate the docs through simulat [hollanderbooks.com]
  • Are there any systems that actually have this much storage now? What comes after the terabyte? Quadrabyte?
  • Who is "Go Digital", and why are they archiving it? :)
  • by Aidtopia ( 667351 ) on Wednesday October 06, 2004 @11:56AM (#10451226) Homepage Journal

    Has anybody figured out the date formats? I'm seeing a lot like this "02001987". OK, it's either mmddyyyy or ddmmyyyy. But what does 00 mean for month or day? Unknown? It's hard to imagine that they don't have an exact date of death for someone who died as recently as 1987. Or is a zero-based counting system (00 = Jan, 01 = Feb, ...)?

    It's interesting that the death records include Social Security Numbers. Anybody want to harvest a few thousand inactive SSNs?
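
One way to handle values like "02001987" is to treat a 00 field as "unknown" rather than as an error. The sketch below does that under stated assumptions: the `PartialDate` type and `parse_partial_date` function are hypothetical names, the default mmddyyyy order is only one of the two guesses above, and reading 00 as "unknown" is a guess too, not something the archive documents.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the field was 00 (assumed "unknown")
    day: Optional[int]    # None when the field was 00 (assumed "unknown")


def parse_partial_date(raw: str, order: str = "mmddyyyy") -> PartialDate:
    """Parse an 8-digit date string, mapping 00 month/day fields to None."""
    if order not in ("mmddyyyy", "ddmmyyyy"):
        raise ValueError("order must be 'mmddyyyy' or 'ddmmyyyy'")
    if len(raw) != 8 or not raw.isdigit():
        raise ValueError("expected exactly 8 digits, got %r" % raw)
    first, second, year = int(raw[0:2]), int(raw[2:4]), int(raw[4:8])
    month, day = (first, second) if order == "mmddyyyy" else (second, first)
    if month > 12 or day > 31:
        raise ValueError("field out of range for order %s: %r" % (order, raw))
    # `0 or None` evaluates to None, so a zero field becomes "unknown".
    return PartialDate(year, month or None, day or None)


print(parse_partial_date("02001987"))              # read as mmddyyyy
print(parse_partial_date("02001987", "ddmmyyyy"))  # read as ddmmyyyy
```

Under the mmddyyyy reading the example becomes February 1987 with an unknown day; under ddmmyyyy it becomes the 2nd of an unknown month. Records with a first field above 12 would rule out the first reading, which is one way to test the guess against the real data.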

  • Size is out of wack (Score:2, Interesting)

    by Maxwell ( 13985 )
    A TERABYTE IS 1000G. And 1G IS 1000M. So A TERABYTE IS 1,000,000 MEGABYTES. Right?
    There are 1 million documents in this database? And it's 800 terabytes? So each doc is 800M in size?
    800M EACH? That's freaking huge. Even if the thing is only 8T in size (far more reasonable), each doc is still 8M in size. Again, pretty massive.
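
The arithmetic above can be checked in a few lines. This is only a sketch using decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and the round figure of 1 million documents from the story; `avg_doc_size_mb` is a name made up for the example. The 5 TB row uses the SAN size quoted from the PowerPoint in another comment in this thread.

```python
TB = 10**12  # bytes per terabyte, decimal units
MB = 10**6   # bytes per megabyte, decimal units


def avg_doc_size_mb(total_tb, n_docs):
    """Average size per document in MB for a store of total_tb terabytes."""
    return total_tb * TB / (n_docs * MB)


N_DOCS = 1_000_000  # "1 million documents" per the story

for total_tb in (800, 8, 5):
    size = avg_doc_size_mb(total_tb, N_DOCS)
    print("%4d TB / 1M docs = %6.1f MB per doc" % (total_tb, size))
```

So 800 TB only makes sense as a ceiling the system can scale to; at 5 TB the average works out to about 5 MB per scanned document.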

    Is this like that time MSFT bragged about their 1T DB of geological data, and then Oracle
    built the same database, with the same content using only 300G of space?

    Inefficiency is not
  • At least for marriages, I doubt the database is complete/finished. Marriage records for myself (King County), my parents (Clark County) and my in-laws (King County) are not there. Death records are there though--at least for my family. As others have said, I too would be afraid of people datamining this for personal gain. I hope there are decent safeguards against this.
