Google To Digitize, Make Available British Library's Historical Holdings

timothy posted more than 3 years ago | from the to-be-young-was-very-heaven dept.

Google 86

pbahra writes with part of an excellent story at the WSJ: "The British Library today announced its first partnership with Google, under which Google will digitize 250,000 items from the library's vast collection of work produced between 1700-1870. The Library, the only British institution that automatically receives a copy of every book and periodical to go on sale in the United Kingdom and Ireland, joins around 40 libraries worldwide in allowing Google to digitize part of its collection and make it freely available and searchable online, at books.google.co.uk and the British Library website, www.bl.uk. ... As well as published books, the 1700-1870 collection will also contain pamphlets and periodicals from across Europe. This was a period of political and technological turmoil, covering much of the Industrial Revolution, the French Revolution, the introduction of UK income tax and the invention of the telegraph and railway. All of these topics are covered, as are the quirkier matters of the day, such as the account, from 1775, of a stuffed hippopotamus owned by the Prince of Orange."

Yes, (0)

Anonymous Coward | more than 3 years ago | (#36510280)

Can someone put that in terms of football pitches of information?

Re:Yes, (1)

ciderbrew (1860166) | more than 3 years ago | (#36510384)

Sorry, I only have "the pile would reach to the moon and back x amount" or number of double decker buses jumped by Eddie Kidd / Evel Knievel and/or my mate Dave.

Re:Yes, (1)

sonamchauhan (587356) | more than 3 years ago | (#36510514)

No, no ... in terms of cricket pitches.

Or, in multiples of 'Playing fields of Eton'

But the IMPORTANT question is... (3, Funny)

Serious Lemur (1236978) | more than 3 years ago | (#36510282)

What will Apple and Facebook do? They can't afford a British literature gap!

Re:But the IMPORTANT question is... (2)

c0lo (1497653) | more than 3 years ago | (#36510398)

Nah. Wrong question. The really important one is: books being useful to as many as possible? TFA:

Speaking at the official launch, Kristian Jensen, the Library’s head of Arts and Humanities, said: “This process allows books to fulfill their original aim of being useful to as many people as possible.

I thought that was already understood: the copyright should be extended forever, for the profit of the great-great-...-grandchildren of the author (too bad if the author sold the rights to the publisher... but it's irrelevant for the usefulness of books, isn't it?).

Besides, digitization comes with the risk of exposing these "as many" to words, facts and attitudes that are quite sensitive today. I hope that Google will take note: even more recent pieces needed a "translation" to make them politically correct [independent.co.uk] .

Again: can we let the Tea Party and Michele Bachmann [nowpublic.com] be hurt if indiscriminate digitized papers of the time showed that the founding fathers did own slaves (and, possibly, more than own [monticello.org] )?

</sarcasm>

Re:But the IMPORTANT question is... (0)

RPD9803 (669023) | more than 3 years ago | (#36510750)

This, in fact, is a way of extending copyright in the UK. The UK recognizes rote copies of copyrighted works as copyrighted themselves (see: UK NPG vs. Wikimedia Commons a year or two ago). The articles all state that these books are to be used for 'non-commercial' uses only. And 'non-commercial' is becoming a smaller slice every day (publishing it? commercial. Website with ads? commercial). Google doesn't do what's good for the World, Google does what's best for Google; if they overlap, it's merely coincidental.

Re:But the IMPORTANT question is... (1)

digitig (1056110) | more than 3 years ago | (#36510912)

So if you cut and paste it then you might be breaching copyright. If you retype the paragraph you are citing then I don't see how you can be. I suspect the real reason for the non-commercial clause is to stop people publishing and selling paper versions directly from the digitized versions.

Re:But the IMPORTANT question is... (3, Informative)

buchner.johannes (1139593) | more than 3 years ago | (#36510922)

Here is a talk by librarian Brewster Kahle on book archiving [ted.com] . He created the Internet Archive (archive.org).

With Google, it's important to make a contract so that the content is really open to all.

Don't do drugs (1)

Snaller (147050) | more than 3 years ago | (#36512590)

Here is a tip: Don't do drugs before you post rants on slashdot.

Re:But the IMPORTANT question is... (1)

sgt scrub (869860) | more than 3 years ago | (#36512838)

Again: can we let the Tea Party and Michele Bachmann [nowpublic.com] be hurt if indiscriminate digitized papers of the time showed that the founding fathers did own slaves (and, possibly, more than own [monticello.org])?

It will work more to their benefit. Books during those times were filled with Christian fanaticism and bigotries. In fact, it will be hard to tell Bachmann, and quotes from any t.b/rep, from the books.

Exactly (0)

Anonymous Coward | more than 3 years ago | (#36514774)

Exactly, this and all other published works should be made available as soon as possible.

This is the only way we can hope to get rid of the ignorance among christians, hindus, jews, moslems, and other follies. In the long run the illiterate will have access to this information, too, and be able to muster some self respect, based on reason, not belief.

Suddenly, Google is being good again!!!

Re:But the IMPORTANT question is... (0)

Anonymous Coward | more than 3 years ago | (#36511026)

Umm there is already an app for that available on the iPad store. Granted it currently only has 1,000 books but the 2nd release is supposed to have over 60,000 by the end of the summer...
http://www.silicon.com/technology/mobile/2011/06/20/photos-want-the-british-library-on-your-ipad-theres-an-app-for-that-39747603/
Don't think Facebook is in the book business - although BL did talk about it on their FB site...

He who controls the past (1)

Anonymous Coward | more than 3 years ago | (#36510296)

Controls the future..

great! (1)

Anonymous Coward | more than 3 years ago | (#36510298)

No doubt there'll be plenty of "ZOMG GOOGLE IS TAKING OVER" comments but this is brilliant. There's so much archived information in Britain that is supposedly public but actually costs a fortune to research as you have to travel to wherever it's stored then pay an archivist to take you into the vault and find the papers etc.

Re:great! (1)

Intrepid imaginaut (1970940) | more than 3 years ago | (#36510600)

Absolutely, I'm delighted at this. On a much tinier scale I've been poring over the reams of old notes I have to create the rpg in the sig there, I wish someone would offer to digitise that lot for me :/ I keep catching myself looking for the search button in the paper notebooks.

Re:I wish someone would offer to digitise that lot (2)

TaoPhoenix (980487) | more than 3 years ago | (#36510810)

Calling your bluff. What state are you in?

For that to happen for free you need to declare the contents of your game system Creative Commons BY-SA which is Attribution-ShareAlike, and avoids the weird tangles regarding ad revenue vs "non commercial".

Then you have to develop the Literacy Pyramid, which is what every single copyright-clueless entity always falls into, proving that they are about the lawyers instead of the writers. The Literacy Pyramid says that you need a base of some 100 Lurkers to get about 7 Enthusiasts. But the output of Enthusiasts may not be to the standards of the Creator or the Skilled Amateur! So then you need to let 100 Enthusiasts stomp around leaving muddy tracks everywhere to get your 7 Skilled Amateurs. So every time Eric Flint whines on the Baen Free Library that "it's too expensive to digitize old works therefore they will never be republished" he's full of ...jellyBaens because it's somehow magically worth paying the lawyers afterward to sue the Enthusiasts as they stomp around.

So are you ready to do a little carpet cleaning to get your game out there?

Re:I wish someone would offer to digitise that lot (1)

Intrepid imaginaut (1970940) | more than 3 years ago | (#36513774)

I genuinely have no idea what you're talking about?

Re:I wish someone would offer to digitise that lot (1)

Intrepid imaginaut (1970940) | more than 3 years ago | (#36519964)

Oh okay I think I get it - the game will be free for all to use and share upon publication, that's in the blog, issue 1. That's not much help though, I've yet to meet the ocr program that can translate my scribbles.

Re:great! (1, Interesting)

martin-boundary (547041) | more than 3 years ago | (#36510614)

Brilliant is a tall order. Judging from the quality of the scans of old books that are available on Google already, this will be a waste of time.

Older documents and books are notoriously difficult to scan - as it gets old, the paper starts to disintegrate and the ink fades away, and because the books are valuable, people have to be much more careful how they open and handle them.

Bottom line is that old books need to be scanned at much higher resolution AND the blotches and broken characters have to be cleaned up much better than when scanning books from the last decade. Google won't do that - they're more interested in quantity and speed than quality.

I expect most of the books will be unusable and will have to be redone at some point in the future. I don't know why they bother (*).

(*) British Library, that is.

Re:great! (0)

Anonymous Coward | more than 3 years ago | (#36510696)

So your idea is to have 0% instead of say 80% readability?

"Can not do it 100% perfect why bother..." seriously?

Re:great! (0)

Anonymous Coward | more than 3 years ago | (#36510754)

Look at the quality of what is available now.

Seriously, this is pathetic.

As for OCR, I tried to read "Treasure Island", for instance; it was so full of errors it was unreadable... Why bother if the end result is in fact impossible to use?

Re:great! (1)

martin-boundary (547041) | more than 3 years ago | (#36510888)

No, I'd rather the BL do it properly themselves however many years it takes, instead of Google's wham bam thank you ma'am approach. Any job worth doing for future generations is a job worth doing right.

Moreover, it's a necessity. If the scans are shit, then they can't be OCR'd, so all you have is pictures anyway.

It *can* be done, lots of libraries around the world have done proof of concept pilot runs going back to the 90s even, and you can find their collections on the web if you look.

Re:great! (1)

Panoptes (1041206) | more than 3 years ago | (#36511774)

A small, but important, point: 'blotches and broken characters' are precisely what interest bibliographers and researchers, for whom these scans will be of immense value.

Re:great! (2)

digitig (1056110) | more than 3 years ago | (#36510886)

If you are in the UK, your local library should be able to get hold of copies of most British Library material for you, for quite a small fee. Yes, it's slow, and the small fees would build up if you need to access a lot of different things, but the information was already more accessible than you suggest. This is still a great step forward, though.

Now I am intrigued... (1)

rts008 (812749) | more than 3 years ago | (#36510302)

What about the Prince of Orange and a stuffed hippopotamus?
Inquiring minds want to know.

What does one do with a stuffed hippo?

Re:Now I am intrigued... (2)

chill (34294) | more than 3 years ago | (#36510428)

More to the point, what did the Princes of Green, Red, White and Mauve think? And what about the Marquis of Heliotrope?

Re:Now I am intrigued... (1)

dkleinsc (563838) | more than 3 years ago | (#36510774)

I don't know, but the Fresh Prince was jiggy with it, and Prince didn't have The Time to comment about it.

Re:Now I am intrigued... (3, Informative)

pieterbos (2218218) | more than 3 years ago | (#36510520)

Put it in your cabinet of curiosities of course, and show it to visitors. What else would you ever do with it? The title Prince of Orange is held by the crown prince of the Netherlands. It refers to the French city called 'Orange'. The title still exists, but is not a claim of any sort on the city of Orange, which is part of France. See wikipedia [wikipedia.org] for the rather strange history of the term.

Re:Now I am intrigued... (5, Informative)

SMoynihan (1647997) | more than 3 years ago | (#36510814)

Indeed, and the title is older than the English word "orange" itself. This was introduced to English in the early 1500s (just in time for Shakespeare to complain about its lack of rhyme...), and is named after the fruit. Prior to this, the colour was "geoluhread" (yellow-red). Note, we don't call it "carrot", as (yellow-red) carrots were developed in the 1700s.

Now, the house of Orange comes from the city, originally "Arausio", in southern France. This was named for the local Celtic water God of the same name.

Being Irish, I admit I find it somewhat ironic that the "Orange-men" are originally termed for a pagan, Celtic god...

Re:Now I am intrigued... (1)

adavies42 (746183) | more than 3 years ago | (#36513102)

Note, we don't call it "carrot", as (yellow-red) carrots were developed in the 1700s.

and popularized as a symbol of Dutch patriotism, iirc

Re:Now I am intrigued... (1)

sgt scrub (869860) | more than 3 years ago | (#36513162)

Now, the house of Orange comes from the city, originally "Arausio", in southern France. This was named for the local Celtic water God of the same name.

Thanks for pointing that out. I looked at it and thought, Arausio was a Gaul camp. Now to figure out why the Celts were in southern Gaul during a period of time when most everyone was trying to get away from the Romans.

Re:Now I am intrigued... (1)

tehcyder (746570) | more than 3 years ago | (#36513892)

Being Irish, I admit I find it somewhat ironic that the "Orange-men" are originally termed for a pagan, Celtic god...

But that's entirely irrelevant to the current use of the term, which relates purely to the time after William of Orange; it has no connection with the original Celtic god.

You might as well say it is ironic that Christians worship on a Sunday, which is named after the ancient Sun god.

Re:Now I am intrigued... (1)

SMoynihan (1647997) | more than 3 years ago | (#36516102)

But that's entirely irrelevant to the current use of the term, which relates purely to the time after William of Orange; it has no connection with the original Celtic god.

You might as well say it is ironic that Christians worship on a Sunday, which is named after the ancient Sun god.

Begging your pardon (and ignoring the conflation of Christ with Sun gods in early Romano-Christian history); I think the comparison might be more apt if a group of Christians worshipped on Thursday, a day named after Thor, so named themselves Thursians.

Personally, I would find that ironic - perhaps it's that extra step of actually naming yourself after the deity.

However, your mileage may vary.

On a related note, I find it somewhat amusing that many Christians (in my experience) would term saying "Christ" as blasphemy, and think of it as something akin to a surname - not knowing it as the transliteration of the simple Greek "Christos" (Anointed).

Re:Now I am intrigued... (1)

Intrepid imaginaut (1970940) | more than 3 years ago | (#36510590)

Ride it around lashing it with a switch of course. Ah the joys of inbreeding.

Re:Now I am intrigued... (1)

sgt scrub (869860) | more than 3 years ago | (#36512898)

Stuffed Hippopotamus? Is that 1700 goatse?

Not the only one... (3, Informative)

metageek (466836) | more than 3 years ago | (#36510310)

This is not the only British library that gets all publications. The National Library of Wales (http://www.llgc.org.uk/) also gets all publications that are published in the UK (and there is likely one in Scotland as well).

Re:Not the only one... (2)

Webspit (798042) | more than 3 years ago | (#36510324)

Technically no - I re-read the article - only the BL automatically gets a copy. The Welsh, like Oxford, have to request one within a year. The other difference is the copy sent to the BL has to be the same as the best edition whereas the rest are fobbed off with the same edition as the one currently most popular.

Re:Not the only one... (1)

mdransfield (101993) | more than 3 years ago | (#36510336)

As usual, it's slightly more complicated: http://www.legaldeposit.org.uk/background.html [legaldeposit.org.uk]

Re:Not the only one... (3, Interesting)

Geeky (90998) | more than 3 years ago | (#36510420)

Interesting, as it's covered by law in the UK. I wonder how it would apply to self-published books, such as books sold through the likes of Blurb or Lulu.

Those companies are not UK based, so are not covered by the legislation. However, if I (as a UK resident) published a book, for sale to the public, via Lulu, would I be classed as publisher in terms of this legislation?

Legal deposit (1)

Martin Spamer (244245) | more than 3 years ago | (#36510508)

Legal deposit covers printed material; digital publications (newspapers, scholarly journals, software including games) and online material are covered by a voluntary scheme.

Re:Legal deposit (1)

Geeky (90998) | more than 3 years ago | (#36514428)

Coming back to this late, but Lulu and Blurb are basically print on demand services, so we're not talking digital books. Lulu even lets you get an ISBN for your book.

Re:Not the only one... (4, Informative)

jcupitt65 (68879) | more than 3 years ago | (#36510356)

Actually the BL really is the only one to automatically get all publications. Five other libraries are entitled to a free copy upon request.

http://en.wikipedia.org/wiki/Legal_deposit#United_Kingdom [wikipedia.org]

I know Cambridge gets everything with an ISBN, and from your post it sounds like Wales and Scotland do too. Things like PhD theses only go to the BL though.

Thesis goes to University Library, not BL (0)

Anonymous Coward | more than 3 years ago | (#36510668)

I know Cambridge gets everything with an ISBN, and from your post it sounds like Wales and Scotland do too. Things like PhD theses only go to the BL though.

Sorry but I think you have it the wrong way around. My PhD thesis went to the Cambridge library but it did not go to the BL.

Re:Thesis goes to University Library, not BL (1)

jcupitt65 (68879) | more than 3 years ago | (#36510840)

Strange, mine just went to the BL. Perhaps it depends upon the examining institution.

Re:Not the only one... (1)

illtud (115152) | more than 3 years ago | (#36521556)

Things like PhD theses only go to the BL though.

No, at the National Library of Wales we get the theses from the universities in Wales:

http://www.llgc.org.uk/index.php?id=4653 [llgc.org.uk]

So they don't get everything from the UK (I'm not sure what Scotland does, they have their own National Library).

We've started harvesting e-theses from university repositories as part of the ETHOS project (see link in the url above), the BL will however harvest them on from us (subject to agreement with the originating uni), so they'll get a more complete collection of those.

ps - the BL should have a copy of all material covered by Legal Deposit, but even they have a 'reminder' office that has to chase up publishers, but they have it a lot easier than the rest of us Legal Deposit libraries, who have to put in a claim for each item.

It's worth pointing out... (1)

Richard_at_work (517087) | more than 3 years ago | (#36510318)

From the article:

The new collection will contain only works that are out of copyright under European law.

Google are approaching it correctly this time.

Re:It's worth pointing out... (1)

Anonymous Coward | more than 3 years ago | (#36510346)

Will the digitized copies contain a 'copyright Google' watermark?

Finally, us mere mortals may have a glimpse (0)

Lincolnshire Poacher (1205798) | more than 3 years ago | (#36510406)

The BL blows on about adding to "our shared heritage" but the truth is that they are notoriously fickle and arbitrary about issuing Reader's Passes to actually use their collection.

I have had my application for a pass refused as my research justification was deemed "insufficiently scholarly", even after I had spent 10 minutes being interviewed by the secretary. The average man on the street who wanders in to their London campus will be in for a rude shock.

Even if the staff judge you to be worthy enough to view their precious possessions you have to jump through hoops just to reserve the item.

Whenever I finally publish the fruits of my work I will happily flout the Legal Deposit Libraries Act and refuse to provide BL a copy.

Re:Finally, us mere mortals may have a glimpse (0)

Anonymous Coward | more than 3 years ago | (#36510422)

I don't blame them for not wanting to let you get your grubby northern hands on these books. Are you writing a book on pies?

Surely that would be unnecessary... (0)

Anonymous Coward | more than 3 years ago | (#36510920)

Are you writing a book on pies?

Surely the BL would contain Mrs Miggen's recipe book and so writing a new tome would be "like a broken pencil... pointless"

Re:Finally, us mere mortals may have a glimpse (1)

Richard_at_work (517087) | more than 3 years ago | (#36510424)

Considering the items involved that require you to have a Reader's Pass, yes of course it is difficult - they are one-of-a-kind items, often needing to be handled in specific ways and treated with extreme respect, costing millions of pounds to restore, thousands of pounds to store and cannot be replaced. They are exactly the items that need a gatekeeper to look after them.

Re:Finally, us mere mortals may have a glimpse (1)

Anonymous Coward | more than 3 years ago | (#36510570)

ALL items in the British Library require a Reader's Pass to view, except for the limited stock that they retain for inter-library loan.

This is regardless of their provenance or rarity.

Re:Finally, us mere mortals may have a glimpse (0)

Anonymous Coward | more than 3 years ago | (#36510586)

What they need is someone to digitize them. I sure hope google Does No Evil TM...

Re:Finally, us mere mortals may have a glimpse (1)

digitig (1056110) | more than 3 years ago | (#36511014)

The BL blows on about adding to "our shared heritage" but the truth is that they are notoriously fickle and arbitrary about issuing Reader's Passes to actually use their collection.

It's automatic if you are doing a postgraduate degree.

I have had my application for a pass refused as my research justification was deemed "insufficiently scholarly", even after I had spent 10 minutes being interviewed by the secretary. The average man on the street who wanders in to their London campus will be in for a rude shock.

You don't accept the possibility that your research justification might have been insufficiently scholarly?

Even if the staff judge you to be worthy enough to view their precious possessions you have to jump through hoops just to reserve the item.

You ask the person on the information desk to reserve it for you, or you log in to the electronic catalogue (on-site or on-line), look the item up, press the "reserve" button, and select the reading room to which you want it delivered. If you consider that to be jumping through hoops then it says a lot for the academic standard you are likely to achieve.

Whenever I finally publish the fruits of my work I will happily flout the Legal Deposit Libraries Act and refuse to provide BL a copy.

And nothing of value was lost, I suspect.

Re:Finally, us mere mortals may have a glimpse (1)

Lincolnshire Poacher (1205798) | more than 3 years ago | (#36511392)

Hi there,

Do you hold a BL Reader Pass? Actually they're also now available to undergraduates, but since I am 20 years out of Uni that's not much help to me either

> You don't accept the possibility that your research justification might have been insufficiently scholarly?

"A history of astro-navigation" may not be Earth-shatteringly exciting, but who are the BL to judge its merit? I had a case for research work, I showed that pamphlets they held were not available elsewhere but my application was denied for no reason other than the secretary was grumpy that day. She could provide no objective explanation.

> And nothing of value was lost, I suspect.

Exactly the attitude expressed by the BL.

Re:Finally, us mere mortals may have a glimpse (1)

digitig (1056110) | more than 3 years ago | (#36512528)

Hi there,

Do you hold a BL Reader Pass?

Yes.

Actually they're also now available to undergraduates, but since I am 20 years out of Uni that's not much help to me either

They're available to anybody who can make the case for one, irrespective of study level. It's just that doing postgrad studies is one of the objective criteria that automatically makes the case.

"A history of astro-navigation" may not be Earth-shatteringly exciting, but who are the BL to judge its merit?

They are the people appointed with the task of making that judgement.

I had a case for research work, I showed that pamphlets they held were not available elsewhere but my application was denied for no reason other than the secretary was grumpy that day. She could provide no objective explanation.

In other words, you failed to make the case and it's somebody else's fault. There is a set of objective criteria to decide whether somebody can get a card. If you fail those tests then you get a second chance with an interview and a subjective judgement. It's meaningless to complain that she could "provide no objective explanation". You'd already failed the objective tests.

> And nothing of value was lost, I suspect.

Exactly the attitude expressed by the BL.

So you are still failing to make your case.

Re:Finally, us mere mortals may have a glimpse (1)

Jeremy Erwin (2054) | more than 3 years ago | (#36513720)

That's strange. The Library of Congress gives readers passes out to most anybody who applies.

Re:Finally, us mere mortals may have a glimpse (1)

tehcyder (746570) | more than 3 years ago | (#36514070)

I will happily flout the Legal Deposit Libraries Act and refuse to provide BL a copy.

What with that and your user name, you're on two strikes. Just as well you're not in the US, or the next time you crossed the road without looking properly you'd be off to prison for thirty years.

Article and Summary wrong... (0)

Anonymous Coward | more than 3 years ago | (#36510478)

"The Library, the only British institution that automatically receives a copy of every book and periodical to go on sale in the United Kingdom and Ireland"

Factually not true. The Bodleian library in Oxford, and two others I can't remember also get a copy. Not the point of the article, but sad that the Wall Street Journal would make such a mistake.

Re:Article and Summary wrong... (0)

Anonymous Coward | more than 3 years ago | (#36510702)

I think you'll find that the WSJ is not mistaken. The BL is the only library to automatically receive a copy. An additional 5 libraries can request a copy within 12 months of publication, with which the publisher must comply.

Re:Article and Summary wrong... (1)

flimflammer (956759) | more than 3 years ago | (#36511068)

This has been pointed out and proven wrong a dozen times already in the comments. Only the British Library gets one automatically, the other libraries may request a free copy.

Re:Article and Summary wrong... (1)

Panoptes (1041206) | more than 3 years ago | (#36511874)

The Brotherton at Leeds University is also a copyright library.

Re:Article and Summary wrong... (1)

tehcyder (746570) | more than 3 years ago | (#36514124)

No, you're wrong..

This has already been answered by several people who either knew or could be bothered (like me) to spend ten seconds on Google.

How do they do it? (0)

Anonymous Coward | more than 3 years ago | (#36510550)

Just wondering... what is the process?

The article states it will take three years, be done at a secret location, and we can infer it will cost less than 6 million pounds.

Do they hire temps, or permanent employees hired specifically for this purpose?
Must the individuals be professionals so that documents aren't damaged?
How big a workforce is dedicated to this one effort? Does it take place 'round the clock?
Do they use automated machines, scanning beds, or wands?
Are they OCR'd and then proofread?

Each book is digitized in an average of 6 minutes, so this might give some hints.
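As a rough back-of-the-envelope check on those figures, here is a minimal Python sketch. The item count, the three-year duration and the 6-minute average come from the summary and the post above; the 8-hour shift and 250 working days a year are assumptions made purely for illustration, and the function names are hypothetical.

```python
import math

ITEMS = 250_000          # items to digitize, from the story summary
MINUTES_PER_ITEM = 6     # average scan time quoted above
PROJECT_YEARS = 3        # project duration quoted above

# Assumptions, not from the article: a single 8-hour shift, 250 working days a year.
HOURS_PER_DAY = 8
WORKING_DAYS_PER_YEAR = 250


def total_scan_hours(items, minutes_per_item):
    """Total scanning time in hours if every item takes the average time."""
    return items * minutes_per_item / 60


def stations_needed(items, minutes_per_item, years, hours_per_day, days_per_year):
    """Smallest whole number of scanning stations that finishes within the project."""
    hours_per_station = years * days_per_year * hours_per_day
    return math.ceil(total_scan_hours(items, minutes_per_item) / hours_per_station)


if __name__ == "__main__":
    print(total_scan_hours(ITEMS, MINUTES_PER_ITEM))
    print(stations_needed(ITEMS, MINUTES_PER_ITEM, PROJECT_YEARS,
                          HOURS_PER_DAY, WORKING_DAYS_PER_YEAR))
```

Under those assumptions the scanning itself is 25,000 hours of work, which a handful of stations could cover in three years. Handling, cataloguing and quality checks are not counted here, so treat it as a lower bound on effort.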

Re:How do they do it? (2)

mccalli (323026) | more than 3 years ago | (#36510680)

I worked at a company that did the same for the French National Library, about fifteen to eighteen years ago. To go through your questions:

We had a mix of temps and perms, mostly temp scanner operators and perm developers.

Professionals - yes, there were clauses in the contract about how much we paid if things were damaged.

Team size? Smaller than you might think - we had about ten at its peak. Around the clock - not quite, but there were definitely early and late shifts.

We used then-flash Bell & Howell scanners with expensive document feeders to avoid ripping the papers. We used Kofax image processing cards at a staggering 1 MB VRAM (yes - feel the power...) and super-powerful PCs too (486DX2 66 MHz). We stored the resulting TIFFs on a vast network server (a Network 3 1 GB machine called Leviathan. Inconceivably it ran out of space so we bought a second called Behemoth). Actual process was to guillotine the books and feed them through the scanners, some books would then be restitched. In the case of rare books we'd photograph them instead (and then scan the photo - this predates digital cameras).

Yes, we then OCR'd them, and the contract stipulated that x pages in 100 had to then be proof-read.

Clearly the tech is now completely outclassed, but I'd be surprised if the contract and physical side has changed much. Am not terribly surprised to hear the British Library have taken the best part of two decades to catch up, we were talking to them at the time and they were terribly, terribly slow to see the potential in this.

Cheers,
Ian

Re:How do they do it? (1)

N Monkey (313423) | more than 3 years ago | (#36510954)

I worked at a company that did the same for the French National Library, about fifteen to eighteen years ago. To go through your questions: ...

Actual process was to guillotine the books and feed them through the scanners, some books would then be restitched. In the case of rare books we'd photograph them instead (and then scan the photo - this predates digital cameras).

I thought that Google had tech that could scan the pages of an original book and automatically compensate for any curvature. IIRC** it did something like flash a test pattern onto the page to determine how to straighten the final image.

**but it was a while ago I read this so could easily be mistaken.

Re:How do they do it? (1)

mccalli (323026) | more than 3 years ago | (#36512184)

"I thought that Google had tech that could scan the pages of an original book and automatically compensate for any curvature. IIRC** it did something like flash a test pattern onto the page to determine how to straighten the final image."

We did that too - the Kofax card and driver software could take care of deskewing and it did a reasonably good job. Again, this was a while ago so I imagine things have improved but it wasn't too bad.

Cheers,
Ian

Re:How do they do it? (1)

hackertourist (2202674) | more than 3 years ago | (#36511102)

I've been involved with a similar project in the Netherlands. We found that commercial OCR engines had a high error rate on these old documents. We ended up having each document OCR'ed twice: once by software, once by having a sweatshop in India manually type up the document. The Indians had a lower error rate than the OCR software. By combining the two sources we could achieve an error rate low enough to comply with the project spec.
The project was unusual in that the documents were an index (of the minutes of parliament meetings); this meant it was full of words without context (incl. loads of names), and part of the information was in numbers, so we couldn't use a spelling checker to increase accuracy.

Using a spelling checker on century-old documents is iffy anyway, since you need one that has the then-current vocabulary instead of modern spelling.
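The comment does not say how the software OCR and the manually typed copy were combined, so the following Python sketch is only one plausible way to do it: align the two transcriptions word by word, accept whatever both agree on, and flag every disagreement for a human to settle. The function name and the choice to keep the typed version provisionally are assumptions for illustration.

```python
from difflib import SequenceMatcher


def merge_transcriptions(ocr_text, typed_text):
    """Align two transcriptions and list the spans where they disagree."""
    ocr_words = ocr_text.split()
    typed_words = typed_text.split()
    matcher = SequenceMatcher(a=ocr_words, b=typed_words, autojunk=False)

    merged = []    # provisional text
    disputes = []  # (position in merged, OCR words, typed words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both sources agree, so accept the words as they are.
            merged.extend(ocr_words[i1:i2])
        else:
            # Sources differ: record the span for review and keep the typed
            # version for now (an assumption, since it had the lower error rate).
            disputes.append((len(merged), ocr_words[i1:i2], typed_words[j1:j2]))
            merged.extend(typed_words[j1:j2])
    return merged, disputes


if __name__ == "__main__":
    ocr = "Mr. van der Bergh spoke on page 1l7"
    typed = "Mr. van der Bergh spoke on page 117"
    text, to_review = merge_transcriptions(ocr, typed)
    print(" ".join(text))
    print(to_review)
```

The point of the design is that it needs no dictionary: names and page numbers, which a spelling checker cannot help with, are checked simply because two independent sources are unlikely to make the same mistake in the same place.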

Re:How do they do it? (1)

zevans (101778) | more than 3 years ago | (#36510946)

Mod this up, interesting discussion.

I'd guess the answer to 1 and 2 is "it depends." There must be rarities for which a full-on expert is required with white gloves and a wand (and in their spare time they supplement their income as street magicians.)

The proofreading is at least partly through reCAPTCHA. "Currently, we are helping to digitize old editions of the New York Times and books from Google Books." http://www.google.com/recaptcha/learnmore [google.com]

Re:How do they do it? (1)

martin-boundary (547041) | more than 3 years ago | (#36511128)

Heh, reCAPTCHA isn't exactly foolproof. There are more spambots solving them per minute than humans. So if a human gets it right but two spambots already agreed on a wrong answer, guess what the system does...
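To make the failure mode concrete, here is a toy Python sketch of a simple agreement rule. It is not reCAPTCHA's actual algorithm, which is not described in this thread; the function name and the two-vote threshold are assumptions chosen to match the scenario above.

```python
from collections import Counter


def accepted_answer(answers, min_votes=2):
    """Return the most common answer if enough solvers agree, else None."""
    if not answers:
        return None
    # Normalise case and whitespace so trivially different answers still match.
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer if votes >= min_votes else None


if __name__ == "__main__":
    # Two bots agree on a misreading, one human types the correct word.
    print(accepted_answer(["parliarnent", "parliarnent", "parliament"]))
```

Under this rule the two matching wrong answers win, because agreement is the only signal the vote has; it cannot tell a careful human from a bot.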

Re:How do they do it? (1)

tehcyder (746570) | more than 3 years ago | (#36514166)

Do they use automated machines, scanning beds, or wands?

No, they're transcribing everything by hand using quill pens and ink, then typesetting it on proper hot metal presses, then finally photographing each page with an 11 x 14 plate camera and emailing the images one page at a time to everyone who has a gmail account.

This is going to be incredibly great (3, Insightful)

davide marney (231845) | more than 3 years ago | (#36510720)

The 18th century saw the birth of both the Industrial Age [wikipedia.org] and the Age of Enlightenment [wikipedia.org] . This was a time of profound change on a global scale that easily rivals the impact of our own information age.

You may ask what is the point in studying history -- who cares about the impact of steam power, for example? Here's the thing: although technology improves over time, people basically remain the same. By understanding the dislocation of farmers to factories in 1750, you can gain insight into the dislocation of national workers to global workers today.

To get access to literally every single published work from this period is going to be amazing. Bravo UK and Google!

Re:This is going to be incredibly great (1)

elrous0 (869638) | more than 3 years ago | (#36512284)

people basically remain the same

Well, yeah, but they smell a lot better now.

Re:This is going to be incredibly great (0)

Anonymous Coward | more than 3 years ago | (#36512930)

You have never been to France in the summer, have you?

Moving towards a link free internet! (0)

Anonymous Coward | more than 3 years ago | (#36510776)

Is it just me, or is the trend for news/blog sites to omit external links to the sites or institutions referenced in articles getting bloody annoying? This is a web page talking about other web pages/sites, for Christ's sake!! This one has the links de-linked, and in plain text, argh!

Future Libraries? (1)

Subratik (1747672) | more than 3 years ago | (#36510806)

I wonder what they will look like... If someone hasn't thought of it before, someone should start drawing up plans for futuristic libraries where instead of checking out paper books you can check out books for your Kindle or some other device... On top of that, I think it would be cool for it to look like a traditional library, but with server racks instead of bookshelves... (This probably just seems cool to me because I'm a nerd; I have a lot of friends who are 'conservative' when it comes to paper books.) A lot of the English majors I know treat technology like the anti-christ.

Re:Future Libraries? (1)

MichaelSmith (789609) | more than 3 years ago | (#36510830)

I wonder what they will look like... If someone hasn't thought of it before, someone should start drawing up plans for futuristic libraries where instead of checking out paper books you can check out books for your Kindle or some other device... On top of that, I think it would be cool for it to look like a traditional library, but with server racks instead of bookshelves... (This probably just seems cool to me because I'm a nerd; I have a lot of friends who are 'conservative' when it comes to paper books.) A lot of the English majors I know treat technology like the anti-christ.

I think your electronic library will look like this [google.com] and the server racks will be located somewhere with cheap power and air conditioning.

Out-of-copyright works (1)

SirGarlon (845873) | more than 3 years ago | (#36510966)

So my question is, since the original material is in the public domain (copyright expired), is Google's digitized copy in the public domain as well?

Re:Out-of-copyright works (0)

Anonymous Coward | more than 3 years ago | (#36511098)

Unlikely. Their images of the works will be copyrighted to Google, but the content on the page is public domain. So you can copy that yourself, but won't be able to use their image directly. Daft isn't it?

Re:Out-of-copyright works (2)

Amorymeltzer (1213818) | more than 3 years ago | (#36512114)

Possibly. In the US, the Bridgeman v. Corel case decided that copies of public domain works are not copyrightable, but that of course has no bearing in the UK. There is a sense there that the ruling is reasonable, but straight-up copies are definitely deemed copyrighted works thanks to (imho, inane) concepts like lighting and photogenicity. In this case, nobody's likely to complain, and surely not Google, but image copyright in the UK lies in the act of taking the photo and not generally in the creativity involved therein.

let's see what's actually happening (1)

Hazel Bergeron (2015538) | more than 3 years ago | (#36511060)

The British Library has just handed the copyright on a load of uncopyrighted work to Google, and Google in return gets exclusive commercial rights to the work. This is awful. And for only £6 million, by their estimate, they could have done it themselves - considering the broad range of interested parties, donations could easily raise that amount. Their effort would be far better, too, if the standards of Google's old archives are anything to go by.

This is just another example of the British "public private partnership", where one guy does an under-the-table deal with another guy to do something seemingly simple and relatively inexpensive in an unnecessarily convoluted and costly manner, ending up with a product/service far worse than it could otherwise have been.

The guilty party is the British people for allowing the government to engage in an ongoing sale of the country.

Fuck off, Google. It was OK when all you wanted to do is control the future - the future's not that interesting, if the last three decades can be extrapolated - but now you want to control the past.

Re:let's see what's actually happening (1)

Ksevio (865461) | more than 3 years ago | (#36513466)

No, they didn't hand over any copyrights, it even states that all digital rights revert back to the Library. Google already has the expertise to scan these books and the infrastructure to distribute them.

Basically they saved taxpayers £6 million plus whatever the hosting and distribution costs would be AND the books are now easily accessible to anyone in the world! Do you really think the British Library could have done a better job than Google in-house?

Re:let's see what's actually happening (0)

Anonymous Coward | more than 3 years ago | (#36521334)

I take it you have never seen a Google scan. Hands in the scanner, folded pages, blurred pages, missing pages, weird JPEG2000s that don't show in most PDF readers, missing images, horrific bitonal compression. For a scan of anything with any historical value, or anything with illustrations, Google scans are totally useless. The best scans are often the ones done by Microsoft - the Internet Archive has thousands of them.

hello webmaster (1)

formation (2241238) | more than 3 years ago | (#36511152)

Check to see if your Company name is available http://bit.ly/m2IHF4 [bit.ly]

newbie (0)

Anonymous Coward | more than 2 years ago | (#36540236)

im newbie help...