25000 Books Proofread By Project Gutenberg Distributed Proofreaders

Unknown Lamer posted about a year and a half ago | from the get-your-free-knowledge dept.

Books 29

New submitter fritsd writes "Project Gutenberg Distributed Proofreaders, a volunteer site which helps provide public domain books to Project Gutenberg, announced that their 100 000+ volunteers have reached the milestone of 25 000 books scanned, OCRed, and then meticulously proofread." The 25000th title is The Art and Practice of Silver Printing by Capt. Abney and H. P. Robinson.

meticulously proofread (1)

slashmydots (2189826) | about a year and a half ago | (#43413285)

If I'm not mistaken, they mean meticulously proofread by us in reCAPTCHAs. I think that was the organization that got into that a little.

Re:meticulously proofread (1)

buchner.johannes (1139593) | about a year and a half ago | (#43413371)

No, this is more similar to the GalaxyZoo approach, showing a page at a time and letting the proofreader compare OCR and image side by side. See the first link.

The more interesting question is, will this serve as a test data set to improve OCRs?

Re:meticulously proofread (3, Insightful)

cruff (171569) | about a year and a half ago | (#43413469)

If I'm not mistaken, they mean meticulously proofread by us in reCAPTCHAs.

When I was proofreading on DP, all rounds of proofreading involved examining the scanned images, comparing them to the OCR text, and making corrections. The later rounds of proofreading involved increasing attention to various details of correctness and formatting. All of this was done directly in the DP web interface. I didn't see any mention of the use of captchas in the OCR process.

Re:meticulously proofread (3, Informative)

Halotron1 (1604209) | about a year and a half ago | (#43413681)

Yep, multiple rounds, and multiple levels of proofers and formatters
who have to earn the right to access those higher rounds
by completing hundreds of pages and passing a few tests.

Re:meticulously proofread (5, Informative)

butalearner (1235200) | about a year and a half ago | (#43414959)

I signed up and proofread a few pages when I saw someone mention this site in the comments a few weeks ago. It's pretty interesting stuff and is mostly intuitive, but there are some tricky corner cases, e.g. hyphenated words that span two lines. Back in the day, publishers were pretty inconsistent about what words were hyphenated (e.g. to-day), and Project Gutenberg is (rightly) adamant that the text maintains the original spelling and hyphenation.

The only thing I completely missed was that I didn't put an extra newline at the top of the page when the first line was the start of a new paragraph. Those instances were found and corrected by the second-round proofreader. There is a third round of proofing, two rounds of formatting, two rounds of post processing, and then an optional "Smooth Reading" round that anyone can do. I've checked out a few of the finished products, and they are much, much better than the naked OCR'd texts of old.
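
The end-of-line hyphen case mentioned above could be flagged for a human proofreader with something like the following minimal Python sketch. It is not Distributed Proofreaders' actual tooling; the function name `find_eol_hyphens` and the sample text are made up for illustration. It only lists candidates and leaves the keep-or-join decision to a person, since the original hyphenation has to be preserved.

```python
import re

# A line that ends in "word-" may be a word split across two lines.
_EOL_HYPHEN = re.compile(r"(\w+)-$")
# First word of the following line, ignoring trailing punctuation.
_FIRST_WORD = re.compile(r"\w+")


def find_eol_hyphens(page_text):
    """Return (line_number, first_half, second_half) for each candidate split word.

    line_number is 1-based and refers to the line that ends with the hyphen.
    """
    lines = page_text.splitlines()
    hits = []
    for i in range(len(lines) - 1):
        end = _EOL_HYPHEN.search(lines[i].rstrip())
        start = _FIRST_WORD.match(lines[i + 1].lstrip())
        if end and start:
            hits.append((i + 1, end.group(1), start.group(0)))
    return hits


# Hypothetical OCR output for one page.
sample = "He said he would come to-\nday, but the proof-\nreader was away."

for line_no, first, second in find_eol_hyphens(sample):
    # Show both possible readings; a human picks the one the scan supports.
    print(line_no, first + "-" + second, "or", first + second)
```

Printing both readings ("to-day" or "today") rather than auto-joining keeps the script from silently modernizing the spelling.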

Re:meticulously proofread (3, Informative)

mspohr (589790) | about a year and a half ago | (#43416911)

I have read quite a few of their books and have found them all to be high quality edits.
I would like to thank everyone who has worked on the project for the excellent job they are doing.

(In contrast, I recently purchased a Kindle copy of Paul Theroux's The Happy Isles of Oceania, which is about 20 years old. They obviously produced the electronic copy by OCR and, from the looks of it, did little or no proofreading. There were obvious typos on every page. It's irritating that a publisher who actually gets paid to do this work can't be bothered to do even cursory proofreading.)

Makes you appreciate the fine work the Gutenberg people are doing.

Re:meticulously proofread (1)

wbr1 (2538558) | about a year and a half ago | (#43413513)

Who cares? Captcha performs a needed service. I have several domains that would be overrun by spambots if not for it. If it performs a secondary service, great.

Re:meticulously proofread (1)

fatgraham (307614) | about a year and a half ago | (#43413525)

Oh surely that's a genius CAPTCHA system!
"Correct this flagged-as-wrong OCR text". OCR-bots would surely get it wrong, and the humans would contribute to the greater good!

Of course this assumes the humans can spell and will do correct corrections. MayB Nt th3n!!11!.

Re:meticulously proofread (0)

Anonymous Coward | about a year and a half ago | (#43415203)

So, are you criticizing the existing, widely used recaptcha system? See the recaptcha description [google.com].

Thanks! (3, Informative)

Tim the Gecko (745081) | about a year and a half ago | (#43413309)

Many thanks to Project Gutenberg and their volunteers. There is a lot of great public domain material out there, and I've especially enjoyed Dickens, Wilkie Collins and Trollope. Also Jules Verne's work is pretty good for French learners.

whose paying these guys? (1)

Thud457 (234763) | about a year and a half ago | (#43413531)

proof that socialism is a failure.

Re:whose paying these guys? (1)

Anonymous Coward | about a year and a half ago | (#43413583)

proof that socialism is a failure.

Volunteering || Charity != Socialism.

Re:whose paying these guys? (3, Insightful)

chipschap (1444407) | about a year and a half ago | (#43414021)

proof that socialism is a failure.

Proof that people can in fact be decent, generous, and caring.

Re:whose paying these guys? (3, Interesting)

Kjella (173770) | about a year and a half ago | (#43414267)

Proof that people can in fact be decent, generous, and caring.

Or bored. Many years ago I had a temp job staffing the front desk: really quite little traffic and the occasional call, collecting the mail and various other small duties, but a lot of downtime and no interest in training me for more since it was a rather short contract. Project Gutenberg seemed like a good way to pass the time, and they were cool with it as long as I tended to my other duties when they needed tending. Seemed like a better use of my time than playing solitaire.

Re:whose paying these guys? (1, Troll)

GLMDesigns (2044134) | about a year and a half ago | (#43414229)

Socialism is top-down government officials commanding individuals to obey the collective. Many socialists desire that people belong to the state.

Melissa Harris-Perry ... professor at Tulane, has endorsed the concept of human ownership by the state ... saying in a promo for MSNBC that "we have to break through our kind of private idea that kids belong to their parents or kids belong to their families and recognize that kids belong to whole communities." http://news.investors.com/ibd-editorials/040913-651296-msnbc-host-says-children-belong-to-k [investors.com]

All activities that people, on their own, join and participate in are part of the free market. Yes, that means food co-ops where everyone hates capitalism are part and parcel of the free market.

Re:whose paying these guys? (1)

GLMDesigns (2044134) | about a year and a half ago | (#43420949)

To the person who moderated this post as "troll": any particular reason why you marked it that way? I directly answered the OP's post. Or is it that your moderation is based on how you feel about the opinion stated?

Re:whose paying these guys? (1)

Anonymous Coward | about a year and a half ago | (#43414329)

Proof that you are a moron: you don't know the difference between 'whose' and 'who is'.
Or what socialism is.

Re:whose paying these guys? (2)

LeadSongDog (1120683) | about a year and a half ago | (#43414829)

Proof that you are a moron: you don't know the difference between 'whose' and 'who is'...

That's why he's pissed at PD. They didn't like his work product.

Re:whose paying these guys? (0)

Anonymous Coward | about a year and a half ago | (#43416221)

Volunteer work != socialism
For this to be socialism, the government would have to force its citizens to contribute their time and energy to the project.
As an example, this correction of your erroneous statement is not socialism, even though you didn't pay for it.

Re:Thanks! (2)

Anonymous Coward | about a year and a half ago | (#43414499)

Jules Verne's work is awesome. I'm reading it now and learning French. And for those of you who are also learning, there are some good free audio books out there, e.g. http://www.litteratureaudio.com/

Re:Thanks! (1)

Tim the Gecko (745081) | about a year and a half ago | (#43414727)

Jules Verne's work is awesome. I'm reading it now and learning French. And for those of you who are also learning, there are some good free audio books out there, e.g. http://www.litteratureaudio.com/ [litteratureaudio.com]

Also librivox.org has some good French content. "Ezwa" has a great reading voice and does many of the chapters in this book - http://librivox.org/le-tour-du-monde-en-quatre-vingts-jours-by-jules-verne/ [librivox.org]

Hitting the Sonny Bono wall (2)

tepples (727027) | about a year and a half ago | (#43415407)

There is a lot of great public domain material out there

So what happens once Project Gutenberg has finished releasing all notable books in the English language that were first published in or before 1922?

Re:Hitting the Sonny Bono wall (2)

slash.dt (701002) | about a year and a half ago | (#43419409)

So what happens once Project Gutenberg has finished releasing all notable books in the English language that were first published in or before 1922?

Since lots of things are in the public domain in other countries that are not in the public domain in the US, maybe there could be a Project Gutenberg.uk?

Re:Thanks! (2)

Livius (318358) | about a year and a half ago | (#43417567)

It's a fantastic contribution to human intellectual heritage. Once in digital form, it will be easy to make copies and ensure a high degree of redundancy so that this knowledge and culture will not be lost even if civilization suffers a setback.

Michael Hart (2)

ShanghaiBill (739463) | about a year and a half ago | (#43417889)

Many thanks to Project Gutenberg and their volunteers.

Also many thanks to Michael Hart [wikipedia.org], the founder, heart, and soul of Project Gutenberg. Michael passed away in 2011. Although I never met him face-to-face, we exchanged many emails, and even spoke on the phone a few times. He was a generous and selfless man, and somewhat eccentric (but in a good way). We love you Michael, and we miss you. You made the world a better place.

Slashdot proofreading (0)

Anonymous Coward | about a year and a half ago | (#43413517)

Wisely, Unknown Lamer kept his summary to a minimum to avoid the inevitable jokes about the poor quality of writing by Slashdot editors.

I'm glad he's doing something with his time, (2, Funny)

Anonymous Coward | about a year and a half ago | (#43413537)

I'm glad Mr. Guttenberg [imdb.com] is doing something with his time and money with such a noble project as this. I guess it makes up for the Police Academy movies [imdb.com] he did.

Now if I only read books.

Re:I'm glad he's doing something with his time, (1)

GLMDesigns (2044134) | about a year and a half ago | (#43414239)

not bad :-)

Solution to their problems. (0)

Anonymous Coward | about a year and a half ago | (#43432151)

Why don't they have something like this:

When they scan pages and some are not aligned, they should use a bigger scan area so that everything is captured.
Then the scanned pages go to the computer, where an auto-eraser erases everything that falls outside the A4 page.
Then the auto-eraser centers what is left of each page, with its words and pictures, and pastes the pages one after another.

That way they can do complete books one after another without anyone ever having to look back to check whether it went right or wrong.

people are stupid, I hate this place.

- Anonymous Coward.
