Counting the World's Books

Soulskill posted about 4 years ago | from the not-goin'-anywhere-for-a-while? dept.

Books 109

The Google Books blog has an explanation of how they attempt to answer a difficult but commonly asked question: how many different books are there? Various cataloging systems are fraught with duplicates and input errors, and only encompass a fraction of the total distinct titles. They also vary widely by region, and they haven't been around nearly as long as humanity has been writing books. "When evaluating record similarity, not all attributes are created equal. For example, when two records contain the same ISBN this is a very strong (but not absolute) signal that they describe the same book, but if they contain different ISBNs, then they definitely describe different books. We trust OCLC and LCCN number similarity slightly less, both because of the inconsistencies noted above and because these numbers do not have checksums, so catalogers have a tendency to mistype them." After refining the data as much as they could, they estimated there are 129,864,880 different books in the world.
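The weighting described in the quote can be sketched roughly as below. This is only an illustration of the idea, not Google's actual algorithm: the field names, the weights, and the way evidence is combined are all assumptions made up for the example.

```python
# Hypothetical weights (not from the Google Books post): a matching ISBN is
# trusted more than a matching OCLC or LCCN number, which lack checksums.
WEIGHTS = {"isbn": 0.9, "oclc": 0.7, "lccn": 0.7}


def same_book_score(a, b):
    """Return a rough 0..1 score that two catalog records describe one book."""
    # Per the quoted rule, two different ISBNs are treated as decisive.
    if a.get("isbn") and b.get("isbn") and a["isbn"] != b["isbn"]:
        return 0.0
    score = 0.0
    for field, weight in WEIGHTS.items():
        if a.get(field) and a.get(field) == b.get(field):
            # Combine matching identifiers as if they were independent evidence.
            score = 1 - (1 - score) * (1 - weight)
    return score
```

Under these made-up weights, two records that share both an OCLC and an LCCN number but carry no ISBN score 0.91, slightly above a lone ISBN match.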

109 comments

NOSE (0)

Anonymous Coward | about 4 years ago | (#33164690)

Don't include the copyrighted books OPEN SOURCESSDSDV

I propose a new filesystem (0)

TrisexualPuppy (976893) | about 4 years ago | (#33164962)

In order to count and house all the world's books, we, of course, are going to need a new filesystem. I propose to call it TSPFS. The fundamental unit of the said filesystem is a BLoC, representing 115M books. And of course, 640K BLoCs should be enough for anyone...

Re:I propose a new filesystem (0)

Anonymous Coward | about 4 years ago | (#33165008)

ahem [slashdot.org] if u dont get it

Mod parent up (0)

Anonymous Coward | about 4 years ago | (#33165250)

-=][Interesting][=-

Re:I propose a new filesystem (0)

Anonymous Coward | about 4 years ago | (#33166076)

given that a BLoC is a Burning Library of Congress, which from the link converts to about 4 petajoules, I'm confused as to why your units of measure are units of energy.

I'm not being critical, I just mean, how does a file system store data as energy? Is it potential energy?

How do you define "different book"? (3, Interesting)

jonnythan (79727) | about 4 years ago | (#33164698)

Look at textbooks - new editions that are almost indistinguishable from the previous editions have new ISBNs. Do we count every single one as a different book?

Re:How do you define "different book"? (1)

bluefoxlucid (723572) | about 4 years ago | (#33164730)

Same thing with any other book. Second editions and republishings (the Del Rey version versus the Pyr version, etc) with the same exact text unedited; multiple publishers of public domain works; etc.

Re:How do you define "different book"? (1)

jd (1658) | about 4 years ago | (#33167650)

Also hardback vs. paperback, publishing in different regions as a distinct book, etc. Maybe ISBNs could be extended so that they encode all these different fields in additional digits: a component that is unique to a specific book (regardless of edition, publisher, etc), extra information that uniquely* identifies which specific edition/version/variant of the book it is, and then yet more information that uniquely identifies which publisher circulated that book.

*A SHA-2 or SHA-3 hash of the book's contents + cover + publisher would probably be close enough to unique, given that it's rare that editions run into sufficient numbers that a collision is even remotely likely, and would avoid any arguments over which publisher had what number or how to identify which version was what - especially for older books where there may be no unique way to determine this or the information simply no longer exists. A hash will always work.
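A minimal sketch of the hash idea in the footnote above, assuming the contents and cover are already available as byte strings (for a printed book that would require some agreed digitization first, which this sketch does not address). The function name `edition_id` and the length-prefix framing are illustrative choices, not part of any ISBN standard.

```python
import hashlib


def edition_id(contents: bytes, cover: bytes, publisher: str) -> str:
    """Return a SHA-256 hex digest over contents + cover + publisher."""
    h = hashlib.sha256()
    for part in (contents, cover, publisher.encode("utf-8")):
        # Length-prefix each part so the field boundaries are unambiguous.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()
```

Note that any single changed byte, even a corrected typo, yields a completely different identifier, so a hash like this separates variants but says nothing about which variants descend from the same work.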

Re:How do you define "different book"? (1)

icebike (68054) | about 4 years ago | (#33168082)

And every goddamned one of them is scanned by Google, foisted by Barnes and Noble and Amazon and everybody else as a separate book.

I once counted twenty different versions of the same popular (copyright lapsed) classic, all scanned by Google, many from the exact same edition found in various libraries. Some horrible, some quite readable.

I'm not sure anything is served by having both the 1902 and the 1903 versions of any popular fiction available in ebook form. Any serious researcher would search out the physical books and not rely on a scan anyway.

Re:How do you define "different book"? (0)

Anonymous Coward | about 4 years ago | (#33168742)

Oh, YOU aren't sure if anything is served, so it's bad? Thank goodness we have you here to police technological advances for us. How about, instead, we get all the data it's possible to get and the people who want to use it can, and the people who don't, they don't have to?

The only real issue here is being able to distinguish between things which is solved by proper metadata. After that, it's a simple exercise in filtering.

So ~200TB = "All The Books" (1)

billstewart (78916) | about 4 years ago | (#33168890)

A typical book is in the range of 1-2MB of text, assuming you're representing actual letters, as opposed to scanned images of the text, and ignoring illustrations, pictures, etc. So if there are about 130 million books, that's about 200TB to store them uncompressed, maybe 50TB compressed. If you've got multiple versions that are almost identical (e.g. Third Printing from Paperback Publisher B has a different copyright page than First printing from Hardback Publisher A, and maybe a different cover page illustration and blurbs on the back cover), then the different versions add a percent or two.

As correlation, Wikipedia says the Library of Congress has about 20 million books (in a collection of 100 million things), and The InterWebs say that the Library of Congress is about 20TB (not clear if that's just books or not.) So that says 130 million books would be about 130TB uncompressed; it fits on the back of the same envelope.

So for about $5000 of computer equipment, your town or school could have its own copy of The Library, with All The Books.
So far, The Internet Archive [archive.org] has digitized about a million books - you could probably fit that onto 1-2 Blu-ray disks.
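The back-of-the-envelope numbers in this comment can be reproduced in a few lines. The inputs are the comment's own assumptions (1-2 MB of text per book, 130 million books, a 20 TB Library of Congress holding 20 million books); decimal units are an assumption of this sketch.

```python
BOOKS = 130_000_000
MB = 10**6   # decimal megabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def storage_tb(books, mb_per_book):
    """Uncompressed text storage in terabytes."""
    return books * mb_per_book * MB / TB


low = storage_tb(BOOKS, 1)    # 130.0 TB at 1 MB per book
high = storage_tb(BOOKS, 2)   # 260.0 TB at 2 MB per book
# Scaling the quoted Library of Congress figure (20 TB for 20 million books):
loc_based = BOOKS / 20_000_000 * 20   # 130.0 TB
```

The "about 200TB" figure sits near the middle of that range. By the same arithmetic, a million books come to roughly 1-2 TB uncompressed, which is more than one or two 25-50 GB optical discs would hold.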

Re:How do you define "different version"? (1)

blair1q (305137) | about 4 years ago | (#33164884)

I can tell this topic is going to be dominated by people who never had to deal with the internals of a revision-control system, much less a configuration-management system, because the issues are somewhat trivial once you get past your fear of the variables.

Re:How do you define "different version"? (1)

natehoy (1608657) | about 4 years ago | (#33164900)

Also by people who have never read the article, where it explains in some significant detail how they try to determine what constitutes "a book" for the purposes of their counting.

Re:How do you define "different version"? (2, Funny)

dgatwood (11270) | about 4 years ago | (#33166370)

You read the article?

Impostor! Burn the witch!

Re:How do you define "different version"? (3, Informative)

Smauler (915644) | about 4 years ago | (#33167046)

Look at textbooks - new editions that are almost indistinguishable from the previous editions have new ISBNs. Do we count every single one as a different book?

From TFS : if they contain different ISBNs, then they definitely describe different books

If they're using this method, GP's point is valid. The books are not really new books, they're essentially the same as previous editions but have different ISBNs. In essence, these new editions with new ISBNs are being counted twice (or more) for very small revisions to the same book.

Re:How do you define "different version"? (2, Informative)

natehoy (1608657) | about 4 years ago | (#33167310)

From TFA: Well, it all depends on what exactly you mean by a “book.” We’re not going to count what library scientists call “works,” those elusive “distinct intellectual or artistic creations.” It makes sense to consider all editions of “Hamlet” separately, as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries. (emphasis mine)

For Google's definition of what constitutes a unique work as used to derive the stated quantity, the use of ISBN as described is perfectly valid. They are OK with "almost the same work" != "the same work".

So their counting methodology would consider "Fundamentals of Math 3rd Ed by I. M. Counting" to be a distinct work from "Fundamentals of Math 4th Ed by I. M. Counting".

In fact, if the publisher released a paperback version, it would be considered another separate work, because the typesetting and page layouts may differ, and might include different forewords, different pages on the index, etc.

It's a separate and distinct work, from Google's point of view, where they are trying to index the works that they want to scan.

Remember, their goal is to capture as much as possible of the entire sum of human writing. A different foreword is a unique work to them.

Of course, you can then disagree with Google's counting methodology, which is fine. If you do, then the number they have reached for their purposes is meaningless to you and you'd better start counting based on your own definition.

It'll take a while, good luck, and let us know what you come up with. :)

Re:How do you define "different version"? (1)

Smauler (915644) | about 4 years ago | (#33170302)

I was not proposing a new method of counting books... I was only supporting the OP in his assertion that their method contains limitations regarding repetition of works with minor differences.

I was mainly responding to those who just said RTFA without seeing basic facts in TFS.

Re:How do you define "different book"? (0)

Anonymous Coward | about 4 years ago | (#33165088)

Different language and different markets often have different isbns as well.

Re:How do you define "different book"? (0)

Anonymous Coward | about 4 years ago | (#33165138)

Does it contain the exact same information? If not I would think it's a different book.
If one has pictures and the other doesn't, to me that is a different book. Even if one sentence or word was changed, I would consider that a revision of some sort and a totally different book because it contains different information. One word can have a major impact on reading; just compare the thousands of "Bibles". I'd consider them all different books.

Re:How do you define "different book"? (1)

jd (1658) | about 4 years ago | (#33167744)

Again, this is why I'd like to see additional information encoded in an extension to the book's ISBN number, such as a hash of the contents. Regardless of what the extension is, the split should permit you to identify "works that descend directly from a single work" plus "works that differ in content" (regardless of what they descend from). Then there would be no problem. You would be able to extract the level of information you wanted and no information would risk getting lost because such-and-such a group didn't think it important.

Re:How do you define "different book"? (1)

Suki I (1546431) | about 4 years ago | (#33165570)

With the advent of self-publishing and individuals purchasing their own ISBN blocks, the possibility of different works getting the same ISBN increases greatly. Especially when they are not using a distribution service like Amazon that *might* check to see if that ISBN is already in use.

Re:How do you define "different book"? (1)

Jeng (926980) | about 4 years ago | (#33165784)

Also, if a publisher purchases a title from another publisher it gets a new ISBN with the new publisher even though it is the same book.

Re:How do you define "different book"? (2, Informative)

gpf2 (1609755) | about 4 years ago | (#33165994)

What about translations? What about bootlegged copies from the 18th century? What about languages that have no direct concept of "edition"? The International Federation of Library Associations and Institutions (IFLA) has been wrestling with this for a while. Their solution -- of sorts: Functional Requirements for Bibliographic Records (FRBR). http://www.ifla.org/en/publications/functional-requirements-for-bibliographic-records [ifla.org] Pretty dense and not consistently adopted.

Re:How do you define "different book"? (1)

natehoy (1608657) | about 4 years ago | (#33167436)

It's a one-page article, and contains a really good explanation of what they mean by a book for the purposes of their counting, and why.

The following sentence from the article really cuts straight to the heart of their concept of uniqueness:

It makes sense to consider all editions of “Hamlet” separately, as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries.

So, yes, if they scan textbooks they'll scan all versions they can get, and treat them as separate works.

Re:How do you define "different book"? (1)

pilgrim23 (716938) | about 4 years ago | (#33168594)

In the 1480s an edition of Dante's Divine Comedy was printed in Venice. In 1481 another was printed in Florence. Each is the exact same text, barring printer mistakes and, if you are lucky enough to have the Florence one that includes the plates, illustrations. Each is also an absolute work of art in its own right and distinct from the other. Should these be recorded as one book or two?

Re:How do you define "different book"? (0)

Anonymous Coward | about 4 years ago | (#33170968)

You mean they're exactly the same but with different problem sets...

What about self published works? (0)

Anonymous Coward | about 4 years ago | (#33164770)

And what about self published books? They wouldn't have an ISBN unless they became wildly successful and then maybe not even then.

Re:What about self published works? (1)

insertwackynamehere (891357) | about 4 years ago | (#33164822)

That's not true. Getting an ISBN isn't hard and self publishing companies will generally assign you one as part of the deal.

Re:What about self published works? (1)

Suki I (1546431) | about 4 years ago | (#33165654)

That's not true. Getting an ISBN isn't hard and self publishing companies will generally assign you one as part of the deal.

Amazon's Kindle, for example, will assign you an ISBN. However, if you bought your own ISBNs you can use them too. You are supposed to assign a different one to the eBook, paperback, audio and hardback. However, if you use the same one for all there are not many checks to stop you if you are using multiple services.

Re:What about self published works? (1)

dgatwood (11270) | about 4 years ago | (#33166648)

That's not true. Getting an ISBN isn't hard and self publishing companies will generally assign you one as part of the deal.

Depends on the size of the publishing house and the expected sales volume. If you're selling through a major bookstore chain, yeah, you're going to have an ISBN. For an independent author selling a few hundred copies of a book on the history of Three Way [google.com] in a local bookstore, you probably won't have an ISBN---particularly if the book printing and binding was done at the Kinko's in Jackson. The single ISBN would cost as much as you'd make on the whole book.

ISBN numbers are very much geared towards large volume commercial publishing. The system grudgingly handles smaller publishing to a point, but beyond that point, a lot of stuff falls through the cracks.

Re:What about self published works? (1)

Itninja (937614) | about 4 years ago | (#33165462)

Indeed. QOOP, Blog2Print, etc. have alone printed thousands (if not tens of thousands) of different books for individuals, who then publish their own work. Maybe those don't "count".

8 or 9-place estimate (2, Insightful)

Anonymous Coward | about 4 years ago | (#33164782)

estimate would be about 130 million, not 129,864,880

Re:8 or 9-place estimate (2, Insightful)

SomeJoel (1061138) | about 4 years ago | (#33164862)

But 130 million can't possibly be right! We better assign some false precision to make our estimate believable. Significant digits are for science teachers and marriage counselors!

Re:8 or 9-place estimate (1)

langelgjm (860756) | about 4 years ago | (#33165774)

Significant digits are for science teachers and marriage counselors!

Ok, what am I missing here?

Re:8 or 9-place estimate (0)

Anonymous Coward | about 4 years ago | (#33166386)

Penis...the significant digit.

Re:8 or 9-place estimate (2, Funny)

dgatwood (11270) | about 4 years ago | (#33166672)

Ring finger, presumably.

Re:8 or 9-place estimate (1)

aynoknman (1071612) | about 4 years ago | (#33167922)

But 130 million can't possibly be right! We better assign some false precision to make our estimate believable. Significant digits are for science teachers and marriage counselors!

Why stop at 8 or 9? 18 is much better and just as meaningful: 129,864,880.461938427

Re:8 or 9-place estimate (0)

Anonymous Coward | about 4 years ago | (#33169134)

I suspect that what the summary meant was "we identified 129,864,880 unique books".

Whew....almost done! (1, Funny)

SQLGuru (980662) | about 4 years ago | (#33164788)

I'm almost done reading them all!

Re:Whew....almost done! (1)

Capt.DrumkenBum (1173011) | about 4 years ago | (#33165312)

Damn, I need to spend a lot more time reading.

Re:Whew....almost done! (0)

Anonymous Coward | about 4 years ago | (#33168836)

Lucky we have copyright to promote the creation of more books for you.

Re:Whew....almost done! (1)

XSpud (801834) | about 4 years ago | (#33171148)

I'm almost done reading them all!

That's my next challenge - once I've finished reading the web.

Re:Whew....almost done! (1)

SQLGuru (980662) | about 4 years ago | (#33171528)

I can just ruin the ending for you....
http://www.wwwdotcom.com/ [wwwdotcom.com]

Foreign books? (0)

Anonymous Coward | about 4 years ago | (#33164790)

how about all the books printed in china, the rest of asia, middle east etc that don't have ISBN's?

Re:Foreign books? (1)

jeffmeden (135043) | about 4 years ago | (#33164886)

If they don't have the will to obtain an International Standard Book Number for their Internationally published book, then why bother counting it at all? After all, I wrote a book in first grade, consisting of 16 pages of poorly drawn pictures and brutal (if accurate) grammar... Should this be counted too?

Re:Foreign books? (0)

Anonymous Coward | about 4 years ago | (#33165082)

After all, I wrote a book in first grade, consisting of 16 pages of poorly drawn pictures and brutal (if accurate) grammar... Should this be counted too?

Absolutely!

Your book could still be a boon to anthropologists studying pre Mayan calendar destruction societies.

Re:Foreign books? (1)

cablepuller (1683824) | about 4 years ago | (#33168980)

I'd like to look at that. I have seen many unique Sketchbooks. Given that all humans who are able to write, sooner or later scribble into their calendars, etc., I would estimate roughly: there are as many written pages as there were produced sheets of paper (minus the amount of drawings, test-pages, and official documents). Go get your ISBN, your first grade book may be the missing puzzle-piece in the evolution of mankind ;)

Stupid estimate (1)

xemc (530300) | about 4 years ago | (#33164966)

That's a stupid estimate. Since they admitted there is so much uncertainty, they should have just said 130 million. (Or better, 0.13 billion to retain the significant digits)

Re:Stupid estimate (1)

hvm2hvm (1208954) | about 4 years ago | (#33165740)

no, 1.3E+8

Re:Stupid estimate (1)

Flea of Pain (1577213) | about 4 years ago | (#33166400)

How many Libraries of Congress is that?

Re:Stupid estimate (1)

maxwell demon (590494) | about 4 years ago | (#33166158)

0.13 Gigabooks.

adasd (0)

Anonymous Coward | about 4 years ago | (#33165118)

http://rlslog.in/wallpapers/3909-widescreen_40.html

That's an ESTIMATE? (3, Interesting)

wealthychef (584778) | about 4 years ago | (#33165192)

I'm very suspicious about their numerical precision. IF it's an estimate, then they are saying it's 129,864,880 +/- 10. That is, they are pretty sure there aren't 129,864,980 books. I think they should make their estimate something like "we think there are about 130,000,000" or whatever accuracy they actually believe.

Re:That's an ESTIMATE? (1)

NixieBunny (859050) | about 4 years ago | (#33165252)

For sure. Even gravity can't be specified to that many significant digits, and it's a bit more knowable than the number of books in the world.

Re:That's an ESTIMATE? (1)

Caledfwlch (1434813) | about 4 years ago | (#33165394)

Also, what is the date and time of this estimate? How many books are published a day around the world?

Re:That's an ESTIMATE? (1)

demonbug (309515) | about 4 years ago | (#33165520)

If you RTFA (blasphemy, I'm sure), Google doesn't say that 129,864,880 is an estimate - they say that is the number of books, total (at least until Sunday).

The only estimate mentioned is "16 million bound serial and government document volumes".

Surprise surprise, subby is the culprit that turned such an exact number into an "estimate".

Re:That's an ESTIMATE? (1)

Lunix Nutcase (1092239) | about 4 years ago | (#33165758)

That's the point. There is no way in hell that their accuracy is that great.

Re:That's an ESTIMATE? (1)

wealthychef (584778) | about 4 years ago | (#33166394)

Google might not say in TFA, but the number they came up with includes approximations and estimates. The precision is not as given.

Re:That's an ESTIMATE? (1)

city (1189205) | about 4 years ago | (#33165688)

I'm suspicious about the accuracy of numbers in general, I use 'some' for a few things and 'many' for more. I estimate there are many books in the world.

Re:That's an ESTIMATE? (0)

Anonymous Coward | about 4 years ago | (#33167162)

One, two, some, many... somemanysometwomanymanysome?

Re:That's an ESTIMATE? (1)

MattskEE (925706) | about 4 years ago | (#33168440)

You lose accuracy by representing error bounds simply by the significant digits of the number. It is convention-dependent that the last sig fig is assumed to be +/- 1 (zero being assumed non-significant unless followed by a decimal point, unless the zero is already after a decimal point). That's what I remember from high school chem. And it's a convention that makes sense for, say, reading a temperature off of a thermometer. You don't know if the actual value was rounded up or down to give the instrument readout. Of course assuming that a thermometer which has 0.1 degree celsius resolution also has accuracy to 0.1 degree is not necessarily valid, but that's another topic.

But this is an estimate of the number of books, there is no instrument being read here. This is simply their estimate with error bounds that are obviously much greater than the last significant digit. If they had said 130. million then the value would be assumed to be between 129 million and 131 million, based on the aforementioned convention which some people use. But if their error bound is +/- 1 million for a given percent certainty then they are more accurate by saying 129,864,880 +/- 1 million than by stating 130. million, even though the two are very close.

By doing a detailed analysis of how accurately their algorithm determines the number of records based on a random sampling of records they could perhaps come up with a way to determine their error bounds. But such an analysis would probably take a great deal of effort, and I think that they just want to give us their best guess at this time.
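For anyone who wants to see what the figure looks like at different precisions, here is a small rounding helper. `round_sig` is a hypothetical name for this sketch; it only rounds, and says nothing about what the real error bounds are.

```python
from math import floor, log10


def round_sig(x, sig):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0
    # Number of decimal places to keep; negative means rounding left of the point.
    digits = sig - 1 - floor(log10(abs(x)))
    return round(x, digits)


count = 129_864_880
three_figures = round_sig(count, 3)   # 130000000
four_figures = round_sig(count, 4)    # 129900000
```

Rounding to three figures moves the value by 135,120 books, about 0.1% of the total, which is far less than any plausible uncertainty in the count itself.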

As a Data Collector... (0)

Anonymous Coward | about 4 years ago | (#33165406)

> Various cataloging systems are fraught with duplicates and input errors, and only encompass a fraction of the total distinct titles.

You callin' me a liar?

Wow (2, Insightful)

demonbug (309515) | about 4 years ago | (#33165438)

They should write a book!

Re:Wow (1)

Suki I (1546431) | about 4 years ago | (#33165690)

I became your fan for that :)

I considered that, but there's a problem... (1)

N0Man74 (1620447) | about 4 years ago | (#33170780)

So if I wrote a book about this, should I call it "The 129,864,880 Books That You Must Read Before You Die", or "The 129,864,881 Books That You Must Read Before You Die"?

Seriously... (1)

clickclickdrone (964164) | about 4 years ago | (#33165444)

Who cares? Does it matter?

Re:Seriously... (3, Insightful)

SomeJoel (1061138) | about 4 years ago | (#33165504)

Who cares? Does it matter?

Does anything?

Re:Seriously... (1)

Flea of Pain (1577213) | about 4 years ago | (#33166446)

Mod parent up...+1 emo.

Re:Seriously... (1)

Smauler (915644) | about 4 years ago | (#33167408)

No... don't be an asshole, GP won't like that. Mod GP down -1 Emo.

Helluva encryption ratio (0)

Anonymous Coward | about 4 years ago | (#33165448)

OK so now we can represent every text in the world with a 32 bit key. We just need the world's fanciest decryption algorithm to recover the texts...
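As a quick check on the key size: if every one of the 129,864,880 books simply gets a serial number, 32 bits is more than enough, and 27 bits would already do. This is an index rather than compression, since the "decryption" step still needs the whole library to look the number up in.

```python
BOOKS = 129_864_880

# Smallest number of bits that can give each book a distinct ID (0 .. BOOKS-1).
bits_needed = (BOOKS - 1).bit_length()   # 27

# How many collections of this size a 32-bit key space could index.
collections_in_32_bits = 2**32 // BOOKS  # 33
```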

Re:Helluva encryption ratio (1)

dgatwood (11270) | about 4 years ago | (#33166722)

Ooh. I've got it. We'll call it the Library of Congress crypto scheme. We could use it for encrypting other stuff, too. Any arbitrary word could be encoded as an LOC identifier, a page number, and an offset in bytes or words. Man, wouldn't that suck to decrypt?

Re:Helluva encryption ratio (0)

Anonymous Coward | about 4 years ago | (#33169364)

Yeah, hmm..... evil grin

1 in 50 people wrote a book (1)

doconnor (134648) | about 4 years ago | (#33165478)

If you divide the number of books by the current world population, you get that there is one unique book for every 50 people, or on average one in 50 people wrote a book, including the many who are poor, illiterate, or children.

Of course, some book writers have died and many have written more than one book, but I suspect that most books have been written recently and their writers are still alive.

If you only include adults who live a comfortable western lifestyle, it may be as high as one in 10.
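The division behind the "one in 50" figure, as a sketch. The world population value is an assumption (roughly 6.9 billion around 2010) and does not come from the post.

```python
BOOKS = 129_864_880
WORLD_POPULATION = 6_900_000_000  # assumed, approximate figure for 2010

people_per_book = WORLD_POPULATION / BOOKS  # a little over 53
```

That is close to the rounded "one in 50". It counts books per living person, not authors, so it only bounds how common book authors can be.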

Re:1 in 50 people wrote a book (2, Insightful)

SomeJoel (1061138) | about 4 years ago | (#33165546)

I suspect that most books have been written recently and their writers are still alive.

And I suspect that you are full of crap.

Re:1 in 50 people wrote a book (1)

longhairedgnome (610579) | about 4 years ago | (#33166188)

I wish I had mod points for you sir.

Re:1 in 50 people wrote a book (1)

doconnor (134648) | about 4 years ago | (#33166378)

90% of all scientists who ever lived are alive today and many of the books have been written by scientists.

While the percentage may not be as high for all authors, I think it would be close.

Re:1 in 50 people wrote a book (1)

SlippyToad (240532) | about 4 years ago | (#33167018)

Given the enormous explosion in literacy and printing press technology over the last 100 years, I would say he's probably closer than you think. Also, it's estimated that human knowledge doubles every 7 years -- that would mean a doubling of the number of things written down or published.

What would resolve this is to discover how many books existed 100 years ago, and 50 years ago.

Re:1 in 50 people wrote a book (1)

Smauler (915644) | about 4 years ago | (#33167552)

A surprisingly large proportion of the humans who have ever lived are actually alive now (most people estimate it at about 10%). It is _way_ easier now to publish a book than it was even 100 years ago.

I'm not saying you're wrong about GP's assumptions made, but personally I'd guess he's right. That's just a guess though ;).

Re:1 in 50 people wrote a book (1)

maxwell demon (590494) | about 4 years ago | (#33166486)

I suspect that most books have been written recently and their writers are still alive.

Indeed, just yesterday I met Shakespeare. He was talking with Lewis Carroll and Douglas Adams. Unfortunately I couldn't talk to them, because Plato was just coming around the corner, arguing with Aristotle and Kant about some philosophical problem, and I would have been in their way. On the other side of the room, Mao was arguing with the evangelists about who has written the better Bible. Karl Marx didn't help Mao, because he was too busy talking to Adam Smith about whether the invisible hand was good or evil. Dante and Kafka were talking about whether hell was absurd, while Agatha Christie was arguing with Arthur Conan Doyle and Edgar Allan Poe about how to write good criminal stories.

Re:1 in 50 people wrote a book (1)

doconnor (134648) | about 4 years ago | (#33166836)

Faulty generalization [wikipedia.org]

Re:1 in 50 people wrote a book (1)

mattack2 (1165421) | about 4 years ago | (#33170858)

So you're dead, and talking to us from Riverworld, right?

Re:1 in 50 people wrote a book (1)

mcgrew (92797) | about 4 years ago | (#33166490)

Isaac Asimov wrote over 500 books. I don't know how many Terry Pratchett has written but the number is in the dozens. There's Clarke, Heinlein, Niven... and those are just a few science fiction writers (yes, Asimov also wrote nonfiction and Pratchett is known mainly for fantasy). Serious authors write more than one book each.

So your average is a little meaningless.

Re:1 in 50 people wrote a book (3, Informative)

pz (113803) | about 4 years ago | (#33167772)

Isaac Asimov wrote over 500 books. I don't know how many Terry Pratchett has written but the number is in the dozens. There's Clarke, Heinlein, Niven... and those are just a few science fiction writers (yes, Asimov also wrote nonfiction and Pratchett is known mainly for fantasy). Serious authors write more than one book each.

So your average is a little meaningless.

No, averages are very meaningful. Extremely meaningful. They are the AVERAGE (usually the mean), which means that some values will be above, and some values will be below. The idiocy comes in when people mistakenly jump to the conclusion that just because an average exists, it means that every value must be exactly the same as the average. Or, just because you can find extreme values far away from the average that again the average is not meaningful.

If the average states that 1 in 50 people have written a book, then, by gum, it will be easy to find plenty of people who have written zero books, somewhat fewer who have written exactly one (something below 1 in 50), much fewer who have written exactly two, even fewer who have written exactly three, etc. That does not mean that example authors with hundreds of books cannot exist, it only bounds how frequent they can be.

Of the myriad ideas that the academic community has utterly failed to teach the general public, the biggest is the relationship between averages and distributions. One more time: just because an average exists, it does not mean that every datum has the same value as the average. As an example, just because the average male in the US is 5' 9", it does not mean that every single male is that tall, nor that you will not find ones that are shorter, taller, or even much shorter or much taller. The tallest man (according to my 20 seconds of research through Google) was 8' 11", and the shortest was 1' 10" ... does that lessen the meaningfulness or utility of the average male height? Rather the contrary: it provides important information as to the extent of the distribution of heights.

Now, I suspect that the parent poster is trying to say that because -- by loosely founded speculation -- most authors are professional authors ("serious authors") and therefore will have more than one book to their name, the classification of people into authors and non-authors will be skewed against 1:50. I would not argue against that (in fact, I indirectly argued for it above). Nevertheless, using the utterly non-scientific sample of the books above my desk, most authors have only one book to their name, so the number isn't going to be much worse than 1:50, perhaps 1:55 or 1:60. That kind of pure, unadulterated speculation is exactly the sort I would love to see proved wrong with hard data.
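To make the average-versus-distribution point concrete, here is a minimal sketch in Python. The population below is entirely made up for illustration (it is not data from the article or from anyone's bookshelf); it is chosen only so that the mean works out to one book per 50 people while a few people have written several books.

```python
from fractions import Fraction

# Hypothetical toy population (NOT real data): books written -> number of people.
histogram = {0: 9_819, 1: 170, 2: 10, 10: 1}

people = sum(histogram.values())
books = sum(n_books * count for n_books, count in histogram.items())
authors = sum(count for n_books, count in histogram.items() if n_books > 0)

# The mean is one number; the histogram is the whole distribution behind it.
mean_books_per_person = Fraction(books, people)
people_per_author = people / authors
mean_books_per_author = books / authors

print(f"mean books per person: {mean_books_per_person}")
print(f"one author per {people_per_author:.1f} people")
print(f"mean books per author: {mean_books_per_author:.2f}")
```

In this toy case the mean stays at 1 in 50 even though one person wrote ten books, and the share of people who are authors only drifts to roughly 1 in 55, because prolific authors are rare. How rare they are in the real world is exactly the part that needs hard data.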

Units (0)

Anonymous Coward | about 4 years ago | (#33165592)

I'm not sure I follow.... How much is that in Libraries of Congress?

Old News (1)

rssrss (686344) | about 4 years ago | (#33165598)

Qoh.12 [12] ... Of making many books there is no end,

Ph D Thesis ? (0)

Anonymous Coward | about 4 years ago | (#33165814)

Do Ph.D. thesis manuscripts (and other academic writings) count as books? If so, I bet there are many more than "only" 130e6...

They could just use (1)

kilodelta (843627) | about 4 years ago | (#33165816)

A simplified version of the checksum they use for UPC codes. Sum up the 10 significant digits. Then take that sum (S) and round it up to the next multiple of ten (T). The check digit is T - S.

E.g. UPC code 54556 39824. Sum is 51. Next tens is 60. 60-51=9 so the check digit is 9. The same basic formula could work for ISBN numbers too.
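For comparison, here is a rough Python sketch of the scheme described above next to two standard ones. The function names are made up for illustration. Two assumptions to flag: the comment does not say what happens when the sum is already a multiple of ten (this sketch returns 0), and the weighted versions follow the usual published rules for UPC-A (11 data digits, odd positions weighted by 3) and ISBN-10 (weights 10 down to 1, total divisible by 11). ISBNs already carry a check digit of this kind; it is the OCLC and LCCN numbers that the summary says lack one.

```python
def simple_check_digit(digits: str) -> int:
    """Unweighted scheme from the comment: digit sum rounded up to the next multiple of 10."""
    total = sum(int(ch) for ch in digits if ch.isdigit())
    return (10 - total % 10) % 10  # assumption: a sum ending in 0 gives check digit 0


def upc_a_check_digit(digits: str) -> int:
    """Standard UPC-A check digit for the 11 data digits (odd positions weigh 3)."""
    ds = [int(ch) for ch in digits if ch.isdigit()]
    if len(ds) != 11:
        raise ValueError("UPC-A expects 11 data digits before the check digit")
    total = 3 * sum(ds[0::2]) + sum(ds[1::2])
    return (10 - total % 10) % 10


def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10 rule: weights 10..1, total divisible by 11, 'X' means 10 in the last place."""
    chars = [ch for ch in isbn.upper() if ch.isdigit() or ch == "X"]
    if len(chars) != 10 or "X" in chars[:9]:
        return False
    values = [10 if ch == "X" else int(ch) for ch in chars]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


print(simple_check_digit("54556 39824"))  # the comment's example
print(simple_check_digit("45556 39824"))  # same digits, first two swapped
print(upc_a_check_digit("03600029145"))
print(isbn10_is_valid("0-306-40615-2"))
```

The two `simple_check_digit` calls return the same value: a plain digit sum cannot notice two digits being swapped, which is one of the more common typing mistakes. Position-dependent weights are what let the standard schemes catch many of those errors.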

can't grok the numbers... (1, Funny)

Anonymous Coward | about 4 years ago | (#33165848)

129,864,880 different books? What is that in Libraries of Congress?

Re:can't grok the numbers... (1)

bannable (1605677) | about 4 years ago | (#33166190)

Just about one and a half.

Re:can't grok the numbers... (0)

Anonymous Coward | about 4 years ago | (#33166254)

129,864,880 published books, that is. (4, Insightful)

andrewagill (700624) | about 4 years ago | (#33166920)

How about the books that people write and spread around to friends or books published by small in-house printshops, often as promotional material? Books written before ISBN that are still in libraries but no longer published (Bodoni's type specimens come to mind, though it looks like some of these are indeed catalogued by WorldCat)? Books that were printed years ago that we know we lost to the ages (the lost Gospel of Barnabas--not the forged Gospel of Barnabas--comes to mind). What about the books that we never knew existed?

This estimate isn't bad for published works, but it does not adequately answer the question posed, ``Just how many books are out there?''

Stately homes (0)

Anonymous Coward | about 4 years ago | (#33167126)

Visit some of the stately homes of England and it will be obvious that there are lots and lots of books that are unlikely to be in very many libraries but which would contain lots of fascinating historical and geographical info. Things like "the history of our county" or "memoirs of my service as a priest in this parish". Many of these homes are operated by the National Trust, but often the home and contents are still privately owned. It would take a lot of work to get access to scan this stuff, but I would love to see it done. There are thousands of small local museums and libraries throughout the world with lots of regional information, garnered from the estates of prominent citizens who died. Google has only scratched the surface with their scanning to date.

All You Need is ONE Book (0)

Anonymous Coward | about 4 years ago | (#33167724)

Dianetics by L. Ron Hubbard...

(Bet you thought I was going to say the Bible. Wrong, I'm crazier than that!)

129,864,880 different books (0)

Anonymous Coward | about 4 years ago | (#33168468)

there are 129,864,880 different books in the world

So how many library of congresses is that?

ISBN sucks for digital books (3, Insightful)

bcrowell (177657) | about 4 years ago | (#33168536)

ISBNs suck as identifiers for digital books, especially digital books that are free. There are two problems.

Problem number one is that they cost money. Let's say someone writes up a really nice manual documenting some open-source software. He wants the manual to be free, just like the software. But now if he wants an ISBN, he has to pay money to get the ISBN, which means expending dollars on a book that is not going to be bringing in any dollars. The fact that ISBNs cost money is out of step with the fact that we have this thing called the World Wide Web, which is basically a huge machine for letting people do publishing without the per-copy costs that are associated with print publishing.

The other problem is that ISBNs are supposed to uniquely identify an edition of the book. This makes sense for traditional print publishing, where the economics of production forced people to make discrete editions widely spaced in time. It makes no sense for print on demand or for pure digital publishing. I've written some CC-licensed textbooks. When someone emails me to let me know about a typo or a factual error, I fix it right away in the digital version, and I usually update the print-on-demand version within about 6 months. No way am I going to assign a different ISBN every 6 months.

We can say that ISBNs are for printed books, not for ephemeral web pages, but that doesn't really work. The two overlap. My textbooks exist simultaneously as web pages, PDF files, and printed books. Amazon sells a book for the Kindle using one ISBN, assigning a different ISBN to the printed version. Print-on-demand books share some characteristics with printed books (e.g., they're physical objects) and some with the web (they can be updated continuously).

By the way, why do you think library catalogs don't show ISBNs? It's because ISBNs are meant as commercial tools, like the barcode on a box of cereal. If Google finds ISBNs useful for purposes other than selling copies of books, it's probably because Google is trying to deal with a massive number of books using a minimum amount of human labor.

what about pre-20th century works? (1)

morethanapapercert (749527) | about 4 years ago | (#33169094)

OK, I'm a bad little slashdotter, I actually RTFA. I noticed a few things:

1) TFA actually acknowledges that the ISBN is very North America-centric, but the other cataloging types are also either N.A.-centric or at least western world-centric.
2) The entire article is based on efforts to simply compile a list of books by aggregating and loosely filtering/sorting several other lists. The lists mentioned are, as far as I know, all heavily biased toward 19th and 20th century works. (The article explicitly mentions that one problem is that it doesn't include numerous works not intended for commercial consumption, such as doctoral theses and so on.)

I would argue that the most important works to digitize first are not the low-hanging fruit of works already cataloged and, in most cases, existing in multiple copies in multiple locations. (We are at little risk of losing the works of Dan Brown (cited in the article) to the depredations of time during the scope of this project.) To me, the most important works to get digitized are those where there are only one or two copies, which are possibly hundreds of years old and are moldering away forgotten on the back shelves of some monastery or filed and forgotten in the bowels of some museum.
What I'd like to see is Google and a few other digital data industry leaders get together and create a bounty system for old books. Simply put: The Global Translation Movement will pay, say, a buck a page multiplied by the confirmed age of the book in years. (Similar pay scales would have to be worked out for those really old "books" that consist of wood tablets, bamboo or papyrus strips and so on.) The project would need to go out of its way to contact old monasteries, nunneries, temples, museums and so forth. A 200 page folio that is 250 years old nets $50,000 for the monastery that scans it and shares the digital copy with the world. My inspiration for this came from the Islamic Translation Movement of medieval times.
You could do similar bounties for translations as well into four or five of the world's most widespread languages. (Chinese, English and Arabic come to mind.)
If I were some kind of intellectual or academic authority, this is something that I'd seriously pitch at the next TED Talk...

129,864,880 different books in the world (1)

briniel (916290) | about 4 years ago | (#33169524)

and there are even more in Lucien's library in the dreaming.

I Like 'Em (Books) (0)

Anonymous Coward | about 4 years ago | (#33171084)

I am curious about the characterization of ancient texts. Does the ISBN system take account of books written before the ISBN was created? After all, books have been around for a very long time. The printing press made books inexpensive and pervasive, but books existed long before.

Take a famous example, the Gutenberg Bible. Does it have an ISBN number? Now a much more difficult one: How about the Code of Hammurabi, which was "published" on clay tablets? How about the Dead Sea Scrolls, at least the intact ones? And what about some of the Mayan books, which are incredibly rare? How about some of the Egyptian texts, written on papyrus?

It would be interesting to know what qualifies as a "book".

I'm pretty sure... (1)

johosaphats (1082929) | about 4 years ago | (#33171162)

Stephen King has written at least that many.