Google To Digitize Much of Harvard's Library

timothy posted more than 9 years ago | from the that's-a-lot-of-library dept.

Books 296

FJCsar writes "According to an e-mail sent today to Harvard students, Google will collaborate with Harvard's libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University's extensive library system, which is second only to the Library of Congress in the number of volumes it contains. Google will provide online access to the full text of those works that are in the public domain. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. As of 9 am on December 14, a FAQ detailing the Harvard pilot program with Google will be available at hul.harvard.edu."

Nice! (1)

mind21_98 (18647) | more than 9 years ago | (#11079425)

But aren't there projects that are already doing this?

Re:Nice! (1)

ravenspear (756059) | more than 9 years ago | (#11079431)

Already digitizing the Harvard library?

No.

Re:Nice! (1)

Meetch (756616) | more than 9 years ago | (#11079433)

More targets to avoid the pressure of /.ing?

Re:Nice! (4, Informative)

RollingThunder (88952) | more than 9 years ago | (#11079531)

Well, there's the Distributed Proofreaders [pgdp.net] project for Project Gutenberg [gutenberg.net] ... but PG doesn't have a "we must be the source" attitude, from what I've seen. As far as PG is concerned, the more eBooks, the better.

DP probably isn't threatened either - they just shift focus to books that are not in the Harvard collection to avoid duplication of effort.

Re:Nice! (2, Insightful)

dvdeug (5033) | more than 9 years ago | (#11079581)

DP probably isn't threatened either - they just shift focus to books that are not in the Harvard collection to avoid duplication of effort.

Are they really going to provide proofread texts? A novel might only take a couple hours to process, but math is going to take hand markup, and some of the more complex critical editions are a bear. Even at only 2 hours a book (and that's not including scanning time), 4 million volumes adds up to 8 million man-hours or a million man-days. At seven bucks an hour that's 56 million dollars. I expect we'll get scans and OCR, but no hand work; there will still be a place for DP. In fact, we'll be better off, with a huge source of scans to work from.
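For anyone who wants to replay the back-of-the-envelope numbers above, here is a minimal sketch. The 8-hour man-day is an assumption (the post doesn't state it, though it is what turns 8 million hours into a million days), and the function and constant names are made up for illustration.

```python
# Rough proofreading cost, using the figures quoted in the post.
VOLUMES = 4_000_000      # volumes to proofread (figure from the post)
HOURS_PER_BOOK = 2       # optimistic hand-processing time per book
HOURS_PER_DAY = 8        # assumption: one man-day is an 8-hour shift
WAGE_PER_HOUR = 7        # dollars per hour (figure from the post)


def proofreading_estimate(volumes, hours_per_book, hours_per_day, wage):
    """Return (man_hours, man_days, total_cost_in_dollars)."""
    man_hours = volumes * hours_per_book
    man_days = man_hours / hours_per_day
    cost = man_hours * wage
    return man_hours, man_days, cost


man_hours, man_days, cost = proofreading_estimate(
    VOLUMES, HOURS_PER_BOOK, HOURS_PER_DAY, WAGE_PER_HOUR
)
print(f"{man_hours:,} man-hours, {man_days:,.0f} man-days, ${cost:,}")
```

Changing HOURS_PER_BOOK to something more realistic for math or critical editions only makes the total worse, which is the point of the estimate.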

Re:Nice! (1, Interesting)

Anonymous Coward | more than 9 years ago | (#11079734)

But also: PG books are full of errors, and there is no source info or scans available to fix against in any sort of easy way. Many books, such as Wealth of Nations, went through a number of editions during the author's lifetime. It would be nice to have the various early editions for collation. And oftentimes new editions come out long after the death of the author with bullshit editorial changes in order to claim a new copyright. A library like Harvard will have many of the first number of editions of classic works.

Re:Nice! (4, Informative)

happyemoticon (543015) | more than 9 years ago | (#11079576)

I happen to work for one.

It's focused on putting otherwise one-of-a-kind materials online for preservation and ease of access, rather than Byron: The Critical Anthology or Catcher in the Rye. It's kind of a mammoth, inefficient bureaucracy, though; I don't agree with some of the practices (such as sending texts off to India to be scrivened, rather than just using OCR software), they're very, very slow to incorporate data, and there are a lot of other problems which stem from the fact that most of them are not computer people, but MIMS holders (librarians).

The fact that Google is doing it gives me hope. Hell, maybe I can jump ship.

One more reason... (2, Insightful)

Anonymous Coward | more than 9 years ago | (#11079436)

to never leave my apartment.

Not Just Harvard (-1, Redundant)

amigoro (761348) | more than 9 years ago | (#11079437)

According to this story [marketwatch.com] here, it is not just Harvard.

Moderate this comment
Negative: Offtopic [mithuro.com] Flamebait [mithuro.com] Troll [mithuro.com] Redundant [mithuro.com]
Positive: Insightful [mithuro.com] Interesting [mithuro.com] Informative [mithuro.com] Funny [mithuro.com]

Re:Not Just Harvard (4, Funny)

BizidyDizidy (689383) | more than 9 years ago | (#11079447)

Also according to the summary, Einstein.

Re:Not Just Harvard (1)

ravenspear (756059) | more than 9 years ago | (#11079484)

Also according to the summary, Einstein.

Yes but the FS is starting to go the way of the FA as far as the number of actual readers is concerned. I admit to occasionally falling victim to this unfortunate disease myself. Sometimes I only read the headline, and with some of the YRO ones that take up nearly the whole width of my 1280px wide monitor, sometimes I can't even get through all of that.

Re:Not Just Harvard (1)

rhennigan (833589) | more than 9 years ago | (#11079759)

Soon we are going to start seeing people saying "I didn't RTFS, but...". I think this shows us the direction we are all headed with /.

ads (5, Funny)

clovercase (707041) | more than 9 years ago | (#11079439)

will there be ads for particle accelerators, scanning tunneling microscopes and tokamaks in the margins?

Re:ads (5, Funny)

IntelliTubbie (29947) | more than 9 years ago | (#11079612)

will there be ads for particle accelerators, scanning tunneling microscopes and tokamaks in the margins?

Yes, but it'll be mixed in with ads for V14gr4, male "enhancement", and Nigerian wealth opportunities. When the scientists complain, the humanities faculty will protest that spam is a perfectly valid epistemology, and that the scientists' attempt to impose an orthodoxy of "truth" in advertising is simply a power grab to extend Western, white male hegemony. At which point, the scientists will defect to MIT's library down the street.

Cheers,
IT

Google Cars (3, Funny)

Zilverfire (819134) | more than 9 years ago | (#11079445)

Google is diversifying extravagantly; pretty soon all of us geeks will be driving Google cars that can cross-reference the Library of Congress.

Re:Google Cars (1)

Televisor (827008) | more than 9 years ago | (#11079521)

No, that'd involve going outside.

Will it be like google scholar? (5, Interesting)

baronben (322394) | more than 9 years ago | (#11079448)

Ever since they introduced Google Scholar [google.com], I've been wanting something like this for my university [utoronto.ca]. For those of you who don't know, finding articles on a subject can be a pain in the ass, as subjects are indexed on several different systems (depending on subject, date, and journal). None of them, not one, has a decent interface or gets results that are as good as Google. Google Scholar lets you search through academic texts, but it's limited to what's available, usually working papers or pre-published drafts. If there is some way that Google could team up with academic printers to index as many journals and texts as possible, this would make everyone's life a lot better.

I think this is a great start. There's incredible profit here too; universities spend millions on catalogue systems. If I could use one interface to search for books, chapters, and articles on a subject, I could spend more time actually learning, and less time looking at the same damn "no results" page on GeoWeb. Grrrr.

Re:Will it be like google scholar? (1)

adeydas (837049) | more than 9 years ago | (#11079475)

It's happening all over... IITs in India are digitising libraries too... I guess it won't be long before everything starts with e...

Re:Will it be like google scholar? (2, Interesting)

ISEENOEVIL (206770) | more than 9 years ago | (#11079500)

As long as we don't end up with Google coming in and picking up these prestigious library resources, Yahoo getting another set, and Microsoft picking up still more. I have a feeling some of these resources want to be universally accessible. This is one step closer, but still not close enough if you have to use 3+ different major search engines. My library fees that are tacked onto tuition would actually be used if I could use my preferred search engine to access everything my university is paying so much for in one place. As it stands now I cringe when I have to navigate our electronic resources.

-Stormy

Re:Will it be like google scholar? (4, Interesting)

Txiasaeia (581598) | more than 9 years ago | (#11079520)

"If I could use one interface to search for books, chapters, and articles on a subject, I could spend more time actually learning, and less time looking at the same damn "no results" page on GeoWeb. Grrrr."

Or finding that perfect article in the MLA database, only to find out that nobody in Canada subscribes to the journal, nor does anybody have the journal on fulltext. I'd rather have a more comprehensive fulltext database in plaintext rather than digitalised copies of everything anyway - makes searching a hellova lot easier.

Re:Will it be like google scholar? (4, Insightful)

baronben (322394) | more than 9 years ago | (#11079539)

That's a great point, one that I think should be addressed (it has been a bit, with some free online journals, but nothing major). In the world of digital publishing, why do journals cost thousands of dollars a year? It's certainly not the costs: academics pay the journals to defray the cost of publishing, and editors and referees generally get only an honorarium, if anything.


Sure, the company needs to get some money to cover the costs of printing, distribution, and other things, plus the associations that sponsor the journal want some money to help hold conferences, but why, oh why, must they price journals so expensively that many colleges can't even afford them?

Billy Corgan of Smashing Pumpkins Found Dead (-1, Offtopic)

Don Giovanni (300778) | more than 9 years ago | (#11079638)

I just heard this on CNN!!!! This is terrible.

Maybe someone can post a link.

hah (2, Funny)

usernotfound (831691) | more than 9 years ago | (#11079449)

Doesn't matter if they do Purdue's, I think we have the 11th worst library in the Big10. I already use Google for my papers, anyways.

first kumquat! (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11079450)

First kumquat [slashdot.org] on this story!

Re:first kumquat! (0)

Anonymous Coward | more than 9 years ago | (#11079478)

Are you trying to google bomb 'kumquat'? If so, the effort so far looks rather weak.

pHe4r the kUmqu4t! (0)

Anonymous Coward | more than 9 years ago | (#11079510)

I know, it's been slow going so far. But, you do what you have to.

That said... kumquat! [slashdot.org]

So... (4, Funny)

Anonymous Coward | more than 9 years ago | (#11079451)

If I download a book, when do I have to upload it again? What is the late fee if I forget?

What the fuck is wrong with you? (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#11079595)

This type of lame humor isn't tolerated in real life, so why the fuck do you think it's OK here?

Re:What the fuck is wrong with you? (0, Flamebait)

tarunthegreat2 (761545) | more than 9 years ago | (#11079694)

Because in Soviet Russia, the Library (and Lord Dredd from Captain Power) digitizes YOU!
Also because in Soviet Russia, real life tolerates YOU!
I, for one, welcome our PDF-making adsense-offering overlords.

Re:What the fuck is wrong with you? (0)

Anonymous Coward | more than 9 years ago | (#11079713)

You must be new here.

Let me be the first to say (1)

slinky259 (827395) | more than 9 years ago | (#11079454)

That is funking awesome!

~stephen

http://slinky259.blogspot.com [blogspot.com]

Google to cache the Universe (4, Funny)

sjrstory (839289) | more than 9 years ago | (#11079456)

Seeing as Google cached the entire Internet (the last page of the Internet can be seen here): http://www.google.ca/search?q=cache:dQrQDn0dHW8J:www.1112.net/lastpage.html+the+end+of+the+Internet&hl=en&client=firefox-a [google.ca] Google is now looking to cache everything else in the Universe :)

get your scuba gear... (2, Insightful)

uighur (818297) | more than 9 years ago | (#11079459)

because it's time to dive into the deep web. Projects like this are the key to unlocking the vast stores of important information which are currently not readily accessible online. Personally I'd like to see a Google-run free access LexisNexis project.

Re:get your scuba gear... (1)

burns210 (572621) | more than 9 years ago | (#11079770)

scholar.google.com

They are getting there.

15 million volumes? (3, Funny)

Anonymous Coward | more than 9 years ago | (#11079465)

Please, give me the values in standard metrics, like Libraries of Congress!

Re:15 million volumes? (0)

Anonymous Coward | more than 9 years ago | (#11079516)

Please, give me the values in standard metrics, like Libraries of Congress!

If they were written on stone tablets, these volumes would weigh approximately 75 million long cwt (hundredweights), which is the equivalent of 9.8 billion slugs, or roughly 8.42 billion lbs avoirdupois. Hope this helps.

Re:15 million volumes? (2, Funny)

HoneyBunchesOfGoats (619017) | more than 9 years ago | (#11079550)

From Fascinating Facts About the Library of Congress: [loc.gov]

The Library of Congress is the largest library in the world, with nearly 128 million items on approximately 530 miles of bookshelves. The collections include more than 29 million books and other printed materials, 2.7 million recordings, 12 million photographs, 4.8 million maps, 5 million music items and 57 million manuscripts.

So to answer your question, it's about 0.52 LoC if you count only the books. :)

Re:15 million volumes? (-1, Offtopic)

tarunthegreat2 (761545) | more than 9 years ago | (#11079708)

So to answer your question, it's about 0.52 LoC if you count only the books. :)

Sorry, the acronym LoC is owned by India and Pakistan. It refers to the ceasefire line dividing Indian and Pakistani Kashmir (Line of Control [wikipedia.org] ). That being said, as an Indian, ALL YOUR KASHMIR ARE BELONG TO US. SOMEBODY SET UP US THE BOMB!

Re:15 million volumes? (1)

Afrosheen (42464) | more than 9 years ago | (#11079789)

I wonder how many miles of classified documents they or the Pentagon have under wraps, just waiting to be discovered?

Re:15 million volumes? (5, Informative)

pmc (40532) | more than 9 years ago | (#11079791)

The Library of Congress is the largest library in the world, with nearly 128 million items on approximately 530 miles of bookshelves.

The British Library (www.bl.uk) has 150 million items (but fewer bookshelves) so the claim of "largest" is a bit dubious.

For /. readers 1 BL = 1.17 LoC
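The unit conversions in this subthread are easy to replay. A minimal sketch, assuming only the item counts quoted in the comments above (in millions, not independently verified); the helper name in_units_of is made up for illustration.

```python
# Counts quoted in the thread, in millions of items.
HARVARD_VOLUMES = 15   # Harvard's library system (from the story summary)
LOC_BOOKS = 29         # Library of Congress, books and printed materials only
LOC_ITEMS = 128        # Library of Congress, all items
BL_ITEMS = 150         # British Library, all items


def in_units_of(quantity, unit):
    """Express quantity as a multiple of unit, rounded to two decimals."""
    return round(quantity / unit, 2)


harvard_in_loc = in_units_of(HARVARD_VOLUMES, LOC_BOOKS)  # books vs. books
bl_in_loc = in_units_of(BL_ITEMS, LOC_ITEMS)              # items vs. items
print(f"Harvard = {harvard_in_loc} LoC (books only)")
print(f"1 BL = {bl_in_loc} LoC (all items)")
```

Note the two ratios use different bases (books for Harvard, all items for the British Library), so they aren't directly comparable to each other.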

Images and formatting? (2, Insightful)

MacFury (659201) | more than 9 years ago | (#11079476)

I should RTFA but what about images and general formatting? I suppose you could find the relevant text, then try and get the physical book...but if you could view the book in its original formatting...that would be sweet.

Just how much storage space will all this data consume? It seems like a massive undertaking.

Money to blow! (1)

Anonymous Coward | more than 9 years ago | (#11079480)

Wow, so I guess Google doesn't know what to do with their IPO money and is just blowing it on a me-too project!

Are these volumes stored as text or pictures? (2, Insightful)

wealthychef (584778) | more than 9 years ago | (#11079486)

I am ambivalent about this. Will the books be stored as text to enable searching? If so, given that part of a book's character is its font and typesetting, will ALL the flavor of these books really be captured, in the same way that it would be to read them? Something seems likely to be "lost in translation" here.

Re:Are these volumes stored as text or pictures? (3, Insightful)

clovercase (707041) | more than 9 years ago | (#11079508)

i think your comments would be salient if they were going to scan the documents and then BURN the originals. putting massive content on the web for free is the best way to push content all over the world. some internet user in sri lanka doesn't have the bandwidth to download images of the pages, and would never have the opportunity to view the actual documents in a library at harvard. if everyone digitized all the valuable content (and i presume that much of the content in harvard's libraries is valuable), and made it freely available, the world would be a much better place. would you be satisfied if there was a link on each page to view an image of the actual page?

Re:Are these volumes stored as text or pictures? (3, Interesting)

robla (4860) | more than 9 years ago | (#11079513)

I would hope they handle it just like catalog.google.com [google.com]

Re:Are these volumes stored as text or pictures? (0)

Anonymous Coward | more than 9 years ago | (#11079620)

For non public domain works they will probably only provide access to a low resolution image.

Re:Are these volumes stored as text or pictures? (4, Insightful)

Txiasaeia (581598) | more than 9 years ago | (#11079526)

I think you're missing the point. I'm not so much concerned with getting rid of dead tree books (I love reading paper books for enjoyment); I would, on the other hand, prefer all my academic sources to be electronic. As I mentioned in reply to another poster, it's a huge pain to look something up on MLA or Expanded ASAP only to find out that my university doesn't carry it and the interlibrary loan system can't get it for two or three weeks because it's backlogged as it is. I could care less about the spiffy fonts and typesetting; give me the plaintext so I get my research done!

Both Images & Uncorrected OCR should be availa (4, Insightful)

dananderson (1880) | more than 9 years ago | (#11079584)

Typically, both page images and uncorrected OCR are made available. Correcting OCR is too labor-intensive for thousands of books.

The uncorrected OCR is very useful for indexing (by Google or others), as the 5% or fewer typos are not enough to interfere with indexing keywords. Uncorrected OCR can also be corrected later.

The page images are tied with the uncorrected OCR so you can see exactly what's there.

For an example, see books at University of Michigan's Making of America (MoA) Exhibit [umich.edu] , which has thousands of 19th century books and periodicals available.
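To make the indexing point concrete, here is a minimal sketch of a keyword index built from uncorrected OCR text. The page texts are invented sample data (page 2 contains a deliberate OCR-style misread, "1ibrary"), and build_index is a hypothetical helper, not anything Google or MoA is known to use.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page numbers it appears on."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        for raw_word in text.lower().split():
            word = raw_word.strip(".,;:!?\"'()")
            if word:
                index[word].add(page_no)
    return index


# Invented sample pages; page 2 has "l" misread as "1" by the OCR.
pages = {
    1: "The library scanned the old atlas.",
    2: "The 1ibrary kept the original atlas.",
    3: "A library index lists every atlas.",
}

index = build_index(pages)
library_pages = sorted(index["library"])
atlas_pages = sorted(index["atlas"])
print("library:", library_pages)
print("atlas:", atlas_pages)
```

The misread word drops one page from the "library" results, but the keyword is still found on the other pages and the rest of the text indexes normally, which is roughly why a small error rate is tolerable for search.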

Yeah but Harvard? (1, Funny)

Anonymous Coward | more than 9 years ago | (#11079488)

Everyone knows that Harvard sucks [harvardsucks.org] .

Ivy Exchange (1, Informative)

Anonymous Coward | more than 9 years ago | (#11079489)

I know Brown has been digitizing all journals coming in for a while...

On another note, all the Ivies except Haavad participate in an interlibrary loan program. There's over 40 million bound volumes overall. Check it out here [brown.edu] .

The Fight against Plagiarism (5, Interesting)

manmanic (662850) | more than 9 years ago | (#11079491)

One reason why this is in the interest of big old universities like Harvard is that it will make it much easier to detect plagiarism in students' essays. If published books were included in Google's index, a plagiarism detection service like Copyscape [copyscape.com] would also be able to check whether content was lifted from printed material, as well as from the web.

Flipside: The false positive problem (2, Insightful)

rsborg (111459) | more than 9 years ago | (#11079528)

Ok, so this is just a bit of devil's advocate, but what happens if you just *happen* to have a writing style similar to someone else who was printed before... what if you read something, and unknowingly wrote something in a similar vein in your essay? I assume you could check it yourself, but then that would just introduce extra cost to even write the essay in the first place... or worse, the plagiarists could just "tweak" their papers ensuring that they're "below the radar" by changing enough style to not be recognizable...

False positives can be double-checked manually (2, Insightful)

wrinkledshirt (228541) | more than 9 years ago | (#11079679)

The professor can just wait until the match comes up, and then double-check at that point.

You'd want to do a thorough overview of any potential instance of cheating anyway. A quick run-through would determine whether or not a paper happened to contain an identical sentence clause or three identical paragraphs.

I think the bigger problem would be the second one you described -- that students could plagiarize and then go through each paragraph, changing the wording slightly so as to avoid positive matches. Still, you could argue that this is pretty much what academics is anyway, just with footnotes and a bibliography.
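Both cases (verbatim copying and light rewording) can be illustrated with a simple word n-gram overlap check. This is only a sketch of one common technique, word shingling with a containment score; it is not how Copyscape or any real service is known to work, and the sample sentences and function names are invented.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(essay, source, n=3):
    """Fraction of the essay's shingles that also appear in the source."""
    essay_shingles = shingles(essay, n)
    if not essay_shingles:
        return 0.0
    shared = essay_shingles & shingles(source, n)
    return len(shared) / len(essay_shingles)


source = "The quick brown fox jumps over the lazy dog near the river bank."
copied = "The quick brown fox jumps over the lazy dog near the river bank."
reworded = "A fast brown fox leaps over the sleepy dog by the river bank."

copied_score = containment(copied, source)
reworded_score = containment(reworded, source)
print(f"copied:   {copied_score:.2f}")
print(f"reworded: {reworded_score:.2f}")
```

A verbatim copy scores 1.0, while swapping a few words per sentence breaks most three-word runs and drags the score way down, which is exactly the evasion described above. A match is only a flag for a human to double-check, not proof.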

Re:Flipside: The false positive problem (2, Interesting)

Gori (526248) | more than 9 years ago | (#11079693)

Well, there are such things as references.

Using work of other people in academic work is not only possible, but greatly encouraged. Just make sure that it is very clear what comes from whom.

In many ways, science is done exactly as Open Source software. Take what you need, modify and improve it where appropriate, and make sure you give full credit where due.

As a teacher, I have given full points to a paper that has hardly any text of its own, as long as the sources are properly referenced and used together to make a valid point not made by any of them.

So I do not think students should bother staying below the radar. Just reference everything, and voila, you are doing science.

Re:The Fight against Plagiarism (0, Offtopic)

hussar (87373) | more than 9 years ago | (#11079617)

It will also be interesting to see if anyone runs a project to see how much of the historical material was lifted from earlier writers. For example, how much did the US founding fathers "borrow" from other published works? A number of the early US statesmen attended Harvard (the second President of the United States, John Adams springs to mind), and it would be interesting to see how much, if any, of their writing was copied. John Locke's writing influenced much of the political opinion around the time of the founding. Did he "contribute" more than we know?

Crapload of storage... (1)

killa62 (828317) | more than 9 years ago | (#11079494)

So does this mean that the movies/audiotapes will be archived too? That's a crapload of storage.

Yeah, but, Harvard? (-1, Redundant)

Anonymous Coward | more than 9 years ago | (#11079501)

Digitize this [harvardsucks.org] .

Loebs!!!!! (1)

canicus (670885) | more than 9 years ago | (#11079506)

Maybe they'll put the Loebs up! No more $20 a pop when you live in a really obscure town.

Labour force (1)

stonda (777076) | more than 9 years ago | (#11079517)

So what do they have for the task itself, little children from foreign countries?

How will the books be scanned? (2, Interesting)

supersat (639745) | more than 9 years ago | (#11079525)

About two months ago, Jeff Dean (an employee of Google) gave a talk [washington.edu] at the University of Washington about the inner workings of Google. One thing he mentioned was Google Print and how they scan books: they slice 'em up into individual pages, and then feed them through a scanner. This doesn't seem like an acceptable way to archive a library's collection. So, how are they scanning them in? Why not use this method for Google Print?

"Slice and scan" is used for new books only (3, Informative)

dananderson (1880) | more than 9 years ago | (#11079558)

I'm not familiar with Google Print, but "slice and scan" is typically used for new books only. That's because there are multiple copies of the book available and the paper is usually flat and dust free.

For older books, most archivists use a cradle and photograph the pages. It's easier on the book, requires no slicing, and there's no scanner to clog with dust.

The disadvantage is the scanner operators need a little bit more training, but that's not a big problem.

But will you be allowed to copy the materials? (1)

Animats (122034) | more than 9 years ago | (#11079527)

Or will they try to lock them up with an EULA, the DMCA, and some eBook system?

Re:But will you be allowed to copy the materials? (1)

QuantumG (50515) | more than 9 years ago | (#11079571)

well even if they do try to lock em up I can't see how they'd win a case if you were copying material that is in the public domain.

I beg your pardon... (0)

Anonymous Coward | more than 9 years ago | (#11079530)

Make that the SECOND largest library system after the Library of Congress! University of California Library system is the largest.

Re:I beg your pardon... (0)

hussar (87373) | more than 9 years ago | (#11079629)

That may be true, but it is not all in one place. It is strategically scattered along the San Andreas Fault.

Reminds me of the U of Michigan and U. Microfilms (2, Informative)

Ungrounded Lightning (62228) | more than 9 years ago | (#11079538)

Back around the '60s or so the University of Michigan cut a similar deal with University Microfilms.

U Microfilms set up and ran a microfilming operation in the library system, microfilming everything that wasn't under copyright (and much that was, with permission of the copyright holders, such as several large newspapers and many magazines and other periodicals), along with much of the University's records, rare books, etc.

(If I have this right) the U got microfilm prints of the documents for free and didn't have to pay for the microfilming of its records. University Microfilms made its money by selling microfilms of the various publications (forwarding royalties, where appropriate, to the copyright holders). The rare books, for instance, could now be studied on microfilm with no further stress on the original, and their content became available at many other colleges and libraries. Good deal all around.

University Microfilms was founded by a regent, who was later slammed for conflict of interest. He dropped out of the Board of Regents but the business deal continued.

clinton (1)

sewagemaster (466124) | more than 9 years ago | (#11079541)

if this is clinton's "library" that's to be "googlized" and "digitized", then that'll be an interesting "shot"... ;)

Homer: mmmmmm digitized google....

Sweet job Harvard (1)

jtbauki (838979) | more than 9 years ago | (#11079543)

That is awesome. I just wonder how the book publishers will respond? Imagine being able to read any textbook without paying for it? How will those textbook publishers who keep raising prices and reprinting books with "new" editions make money... I'm imagining an RIAA-like attack on online books. Watch out Google!

Education should be free. Especially now that information can be distributed so cheaply and so efficiently

On a side note, I believe the government should create some standardized books. For example, calculus. It's the same equations and theorems that each school teaches and it hasn't changed much in a long, long time. Teachers can dictate which parts to emphasize. We can have a committee of well-established professors write the book in the same way that any other calc book is written. The book can undergo revision every 10 years or more. Think of the money that can be saved for students!!! It can even be available online! Of course some books can't be standardized like history, where different viewpoints produce different versions of history. /P

Re:Sweet job Harvard (1)

hussar (87373) | more than 9 years ago | (#11079701)

Education should be free. Especially now that information can be distributed so cheaply and so efficiently

You have confused information and education. Information is the raw material. Education is the (never-quite-completely-) finished product.

You have also discounted the value of teachers and professors guiding you through the mass of information available in order to help you to use it to get an education. Their efforts are valuable and worthy of compensation. They can be paid for by your tuition fees, by a tax on all of us (in which case, you owe us some benefit in return for your studies), or a combination of the two. But, there is no free (as in beer) education.

Re:Sweet job Harvard (1)

aconbere (802137) | more than 9 years ago | (#11079717)

ugh... I don't know if anyone else has had experience with "standardization" done by the government, but I know that my school in junior high switched to standardized texts and classroom materials, which, frankly, were just terrible. This is just asking for failure. (dons tin foil cap) Horace Mann was one of the founding fathers of public education in the US. His primary reason for promoting public education? "It is the best way to indoctrinate the youth with protestant republican (like "the republic", not "the party") values." If you look at our public education system, this type of thinking still holds true in many areas. It's true of the No Child Left Behind policy, it's true of state-mandated curriculum, it runs rampant. Examples are clearly evident in history textbooks, but can also find their way into math texts. We all know that there are better ways of teaching math than what's going on in the States. We can look at how mathematics works in China and say, wow, they teach children basic fundamentals of sets and numbers, then teach them to apply that to arithmetic. But the current wealth of standardized texts in math continues to press the same old memorize the theorem, do a million of the same problems, forget the theory, move on. Without the wealth and variety offered by competing textbooks, and freedom of education, we are doomed to continue down the same path to eternity. Once we let government dictate how we learn, how do we regain control over content? We all know how bureaucracy works; do we want it to take that long to update textbooks? Do we want it to be that hard to get corrections made? Education needs the competition of distributed textbooks. Too many are crap; schools and teachers need the power to decide what textbook is proper for their curriculum. The education system in America is in sad, sad shambles; this is hardly the way to go about fixing it.

Anders

Harvard is a small player, actually (1)

organum (210431) | more than 9 years ago | (#11079544)

40,000 volumes. Compared to 8 million for Stanford and 7 million for Michigan. The latter already has almost 20 million pages online.

University of California is anti-digital (5, Informative)

dananderson (1880) | more than 9 years ago | (#11079545)

This is great. Compare this pro-digitalization attitude of Harvard, Stanford, and others with the University of California's (UC's) anti-digital position.

For books in Special Collections, they won't allow copies to be digitalized unless they are (1) paid a fee to scan the book (fair enough) and (2) paid a royalty to post the book to the web.

The royalty amounts to hundreds or thousands of dollars per book (about $100/page or image). This allows the libraries to act as a "profit center" for the universities. This policy applies to all UC campuses (I've tried UCB, UCLA, UCI, UCSD).

This is true even though the book is in the public domain (because they have physical possession and nobody can make copies until you sign a license agreement). This is true even if you're using the book for non-commercial purposes (such as free posting to the web).

Something is wrong here. People donate to UC libraries (either books or money) for the public good. They don't donate so the library can start a business licensing public-domain books.

Despite that, I have been able to scan many books (by using books in open stacks or purchasing them). These books concern Yosemite history and are at http://www.yosemite.ca.us/history/ [yosemite.ca.us]

Re:University of California is anti-digital (1)

rritterson (588983) | more than 9 years ago | (#11079719)

Too bad I can't mod you up, because I just had to reply instead. I go to UCB and often hear us brag about how we have the second largest 'public' collection in the nation (or is it world?), after the Library of Congress (Harvard is bigger, but is privately owned). It makes me quite sad that this is our policy, if what you say is true. Donations to the library are down, funding is short, and access to many journals has been cut. Digitizing books would save money and resources, and benefit everyone. Public universities exist for the public good, not for state profit. Heck, it's hard to get into the library without a UC-issued ID, and even then you can't take anything out of the library. Now, just how many of the 15 million books can one student body actually use at one time?

Re:University of California is anti-digital (2, Interesting)

JoshuaDFranklin (147726) | more than 9 years ago | (#11079744)

Got a link for that policy?

Ever tried a Freedom of Information Act (FOIA) request? Strange as it may seem, that apparently works in the State of Washington.

Planned for '05 (1)

after (669640) | more than 9 years ago | (#11079555)

Microsoft will do the same

don't forget... (1)

kaedemichi255 (834073) | more than 9 years ago | (#11079563)

don't forget that amazon [amazon.com] has the "search inside the book" feature that has been available for a few years now. i guess the main difference is that google is targeting a lot of academic sources, while amazon gets its database of book texts from publishers. if the two were combined... then maybe they could form ibdb.com, the Internet Books Database ;)

Their entire library? (0)

Anonymous Coward | more than 9 years ago | (#11079564)

I bet we are going to find Goosebumps books in it. Millions of them.

Text of Dec 13th Email (5, Informative)

olvr (840066) | more than 9 years ago | (#11079570)

December 13, 2004

Dear Colleague,

I am writing today with news of an exciting new project within the Harvard libraries. As all of us know, Harvard's is the world's preeminent university library. Its holdings of over 15 million volumes are the result of nearly four centuries of thoughtful and comprehensive collecting. While those holdings are of primary importance to Harvard students and faculty, we have, for several years, been considering ways to make the collections more useful and accessible to scholars around the world. Now we are about to begin a project that can further that global goal-and, at the same time, can greatly enhance access to Harvard's vast library resources for our students and faculty.

We have agreed to a pilot project that will result in the digitization of a substantial number of volumes from the Harvard libraries. The pilot will give the University a great deal of important data on a possible future large-scale digitization program for most of the books in the Harvard collections. The pilot is a small but extremely significant first step that can ultimately provide both the Harvard community and the larger public with a revolutionary new information location tool to find materials available in libraries.

The pilot project will be done in collaboration with Google. The project will link Harvard's library collections with Google's resources and its cutting-edge technology. The pilot project, which will be announced officially tomorrow, is the result of more than a year of careful consultation at many levels of the University. We could not have achieved a meaningful pilot project without the efforts of the Harvard Corporation; the President, Provost, Chief Information Officer, and Office of General Counsel; the University Library Council; and senior managers within the College Library and the University Library.

A full description of the pilot program follows here, with further materials available on the Harvard home page tomorrow.

With best regards,
Sidney Verba
Carl H. Pforzheimer University Professor and
Director of the University Library


Project Description:
Harvard's Pilot Project with Google

Harvard University is embarking on a collaboration with Google that could harness Google's search technology to provide to both the Harvard community and the larger public a revolutionary new information location tool to find materials available in libraries. In the coming months, Google will collaborate with Harvard's libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University's extensive library system. Google will provide online access to the full text of those works that are in the public domain. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. As of 9 am on December 14, an FAQ detailing the Harvard pilot program with Google will be available at http://hul.harvard.edu.

The Harvard pilot will provide the information and experience on which the University can base a decision to launch a large-scale digitization program. Any such decision will reflect the fact that Harvard's library holdings are among the University's core assets, that the magnitude of those holdings is unique among university libraries anywhere in the world, and that the stewardship of these holdings is of paramount importance. If the pilot is deemed successful, Harvard will explore a long-term program with Google through which the vast majority of the University's library books would be digitized and included in Google's searchable database. Google will bear the direct costs of digitization in the pilot project.

By combining the skills and library collections of Harvard University with the innovative search skills and capacity of Google, a long-term program has the potential to create an important public good. According to Harvard President Lawrence H. Summers, "Harvard has the greatest university library in the world. If this experiment is successful, we have the potential to provide the world's greatest system for dissemination as well."

In addition, there would be special benefits to the Harvard community. Plans call for the eventual development of a link allowing Google users at Harvard to connect directly to the online HOLLIS (Harvard Online Library Information System) catalog (http://holliscatalog.harvard.edu) for information on the location and availability at Harvard of works identified through a Google search. This would merge the search capacity of the Internet with the deep research collections at Harvard into one seamless resource-a development especially important for undergraduates who often see the library and the Internet as alternative and perhaps rival sources of information.

Eventually, Harvard users would benefit from far better access to the 5 million books located at the Harvard Depository (HD). If the University undertakes the long-term program, Harvard users would gain online access to the full text of out-of-copyright books stored at HD. For books still in copyright, Harvard users could gain the ability to search for small snippets of text and, possibly, to view tables of contents. In short, the Harvard student or faculty member would gain some of the advantages of browsing that remote storage of books at HD cannot currently provide.

According to Sidney Verba, Carl H. Pforzheimer University Professor and Director of the University Library, "The possibility of a large-scale digitization of Harvard's library books does not in any way diminish the University's commitment to the collection and preservation of books as physical objects. The digital copy will not be a substitute for the books themselves. We will continue actively to acquire materials in all formats and we will continue to conserve them. In fact, as part of the pilot we are developing criteria for identifying books that are too fragile for digitizing and for selecting them out of the project.

"It is clear," Verba continued, "that the new century presents unparalleled challenges and opportunities to Harvard's libraries. Our pilot program with Google can prove to be a vital and revealing first step in a lengthy and rewarding process that will benefit generations of scholars and others."

This is great! (1)

Goldrush80 (463177) | more than 9 years ago | (#11079574)

This will be sweet, I just hope that we don't get too many authors getting pissed.

Dead authors tell no tales . . . till now (2, Insightful)

dananderson (1880) | more than 9 years ago | (#11079598)

This will be sweet. I just hope that we don't get too many authors getting pissed.

Only public-domain books will be scanned. In all or most cases the authors are dead. However, this will revive a great body of work and widen access to many.

One class of author that may be pissed is those who take older works and just slap a foreword or introduction on the front and collect royalties. I've seen this done for many histories. But authors of today's works can count on royalties for themselves, their children, and their grandchildren (if the book is still selling). The copyright term is too long in the U.S., but that's another story . . .

Re:This is great! (1)

tepples (727027) | more than 9 years ago | (#11079682)

They're DEAD for cricket's sake. How can they get drunk?

Oxford University gets every UK book published (3, Informative)

aegilops (307943) | more than 9 years ago | (#11079579)


The library of the University of Oxford, i.e. the Bodleian Library [ox.ac.uk], was the first "copyright" library in the UK - one of only three - which means that it automatically gets a copy of every book published in the UK [ox.ac.uk].

Aegilops

Re:Oxford University gets every UK book published (4, Informative)

Jon Chatow (25684) | more than 9 years ago | (#11079745)

Actually, they don't automatically get copies. They have the right to get one, but they don't have much space, so they only get copies of publications that they feel like getting. The British Library would be a more interesting one to team up with, as they get a copy of every publication...

Re:Oxford University gets every UK book published (1, Informative)

Anonymous Coward | more than 9 years ago | (#11079746)

No it does not. It has the right to get every book published, but it has to ask for them within a year of publication. Only the British Library gets the books automatically under law.

Just what percentage... (1)

EvilMidnightBomber (778018) | more than 9 years ago | (#11079582)

"Google will provide online access to the full text of those works that are in the public domain"

Just what percentage of the current works are public domain?

All well and good, except (1, Offtopic)

sulli (195030) | more than 9 years ago | (#11079594)

Harvard Sucks [harvardsucks.org]

(they admit it themselves!)

they are just doing what they said they will (1)

rasmajx (809228) | more than 9 years ago | (#11079597)

Google has always emphasized what their purpose is: to organize the world's information to be useful and to serve us.

baooooooo

Speaking of education... (1)

quivrnglps (572909) | more than 9 years ago | (#11079600)

As of 9 am on December 14, a FAQ detailing the Harvard pilot program...

Don't you mean an [purdue.edu] FAQ?

Seriously though, I can't help but wonder if projects such as this will help or hurt the overall literacy of the populace. It seems to me that the ability to extract excerpts quickly without having to peruse the context could lead to a less educated society. Some of the most interesting facts I have learned have been things I've accidentally run across in a book while looking for something else.

Don't get me wrong, I fully support the idea of having quick access to any information that might be needed. I am simply speculating that some other steps might need to be taken to ensure that future generations still benefit from the subtleties of knowledge that come from reading a book.

Just a thought.

-Daniel

I may sound like a troll... (0)

Anonymous Coward | more than 9 years ago | (#11079603)

But does it work under Lynx?

F1rst tr011!

"Girls seem to go for sensitive-type guys, so you've always got to act like you're listening to whatever it is they're yapping about, and pretend you give a rat's butt about stupid stuff like flowers and recycling. Oh yeah, be sure to wear plenty of aftershave!" Homer J. Simpson

Do no evil. (4, Funny)

nels_tomlinson (106413) | more than 9 years ago | (#11079613)

Their corporate motto is ``do no evil'', and we've all applauded that, but this is such a great thing that I think we could give them a pass on at least one evil act.

Maybe they could do something really evil to Microsoft, and then we could say: ``Well, you digitized Harvard's library, so we'll let it pass this time.''

Amen (2, Informative)

lavaface (685630) | more than 9 years ago | (#11079627)

It was just a matter of time before a project of this scope got off the ground. I would like to see them team up with Project Gutenberg [gutenberg.org] (and perhaps archive.org [archive.org] ) to provide images of the material. Throw in the little transcoder [xanadu.com.au] and perhaps wikipedia and we will soon have a killer information resource that can be cross-referenced to silly proportions. This is a boon for research. Projects like this and the public library of science [plos.org] will add much to collective knowledge. It would also be nice to see them team up with the newspaper project [neh.gov] ! Next stop--public domain LOC!!!

It's about Time! (2, Interesting)

Shafe (72598) | more than 9 years ago | (#11079698)

I've been emailing them asking them to do this for years. I'm glad someone is finally doing it! There is only one problem: how do they get past copyright violations? I tried to get Cornell to do this on campus, but they said a lot of their volumes (periodicals, in particular) were still under copyright and hence cannot be scanned. No, it doesn't make any sense to let these carbon books literally fall apart when we can preserve them forever digitally, but that's the name of the game.

Someone hurry up with nanostorage so I can store the entire content of human knowledge on a postage stamp (with nanosecond seek time and gigabyte transfer speeds, of course)

Mailing Lists (2, Interesting)

lousyd (459028) | more than 9 years ago | (#11079712)

Call me mundane, but I want Google to index mailing lists, with a nice interface like their "Groups".

Re:Mailing Lists (1)

FuturePastNow (836765) | more than 9 years ago | (#11079758)

"Call me mundane, but I want Google to index mailing lists, with a nice interface like their "Groups"."

You're a spammer, aren't you?

New York Times article and print.google.com (1)

dixon (34495) | more than 9 years ago | (#11079722)

Better story at the New York Times [nytimes.com] . There's also http://print.google.com and the odd http://www.google.com/print/

second only to the Library of Congress. . . (2, Informative)

Leonig Mig (695104) | more than 9 years ago | (#11079723)

... are you sure? Doesn't it mean (as is so often the case) "within the United States"? What about the British Library? What about the Bodleian at Oxford?

I can't wait (0)

Anonymous Coward | more than 9 years ago | (#11079733)


for Miskatonic University's library to get the same treatment, mwuhahahaha.....

U of Michigan (4, Informative)

truesaer (135079) | more than 9 years ago | (#11079736)

It looks like the largest portion of this will be 7 million items from the University of Michigan (compared to only 40,000 from Harvard). Good article [freep.com] from the Detroit Free Press.

Berkeley? Yale? (1)

tavilach (715455) | more than 9 years ago | (#11079767)

Stanford only has 6,865,158 books, and the University of Michigan only has 6,973,162. What about schools like Berkeley and Yale?