Google To Offer Free Database Storage for Scientists

Zonk posted more than 6 years ago | from the are-they-supporting-science-or-science! dept.

An anonymous reader writes "Google has revealed a new project aimed at the scientific community. Called Palimpsest, the site research.google.com will play host to 'terabytes of open-source scientific datasets'. It was originally previewed for scientists last August. 'Building on the company's acquisition of the data visualization technology, Trendalyzer, from the oft-lauded, TED-presenting Gapminder team, Google will also be offering algorithms for the examination and probing of the information. The new site will have YouTube-style annotating and commenting features.'"

107 comments

mining for ads (5, Funny)

spud603 (832173) | more than 6 years ago | (#22112652)

So will they be mining the data for contextual ads?
I'd be curious what their algorithms think my data says I want to buy...

Re:mining for ads (-1, Troll)

Anonymous Coward | more than 6 years ago | (#22112898)

I keep a databse of all the guys I've bottomed for (so I know wear to go to find 12 inch uncut black dick on thursday nights). Is that considred science?

-- Rob Malda

Re:mining for ads (3, Interesting)

Seto89 (986727) | more than 6 years ago | (#22112902)

It managed to pick ads accurately even when I viewed GPG-encrypted [wikipedia.org] emails through the web interface - it gave links to proprietary PGP, some Fedora-related sites and a page about encryption - all that from a standard header and encrypted text...

Re:mining for ads (2, Insightful)

Anonymous Coward | more than 6 years ago | (#22113170)

This is more than likely "tweaked" by a savvy google employee. Think of it as the way "750 ml in shots" gives you the right answer. It's clever, but "it" didn't manage to do it; it was just some Google engineer's Friday project which made it to release, because google isn't entirely soulless yet.

Re:mining for ads (-1, Flamebait)

Anonymous Coward | more than 6 years ago | (#22113118)

AHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHAH FUCK OFF YOU FUCKING JIZZQUEEN.

Do you really think people are interested in what you have to say, or what? Honestly, 118 comments and every single one of them could have been made by a retarded black toddler with rabies. Why don't you just die, Jizzqueen? Just die.

Re:mining for ads (0)

mrvan (973822) | more than 6 years ago | (#22113246)

The scary part here is not the privacy of the scientist. But many social or behavioral scientists (social networks, psychology, ...) have datasets about real people, either from surveys or from observations. Often, the same group of people (eg college students) that participate in these surveys also have a lot of info online. Consider the marketing opportunities / privacy implications there, if google or some other source were able to match them...

Re:mining for ads (4, Informative)

mikael (484) | more than 6 years ago | (#22113640)

These are data sets that have already been placed in the public domain by the scientists. These could be astronomy images, multi-spectral image photography, remote satellite imagery, seismology recordings, MRI/NMR/CAT scans and many other types of volume, image and signal data.

Re:mining for ads (1)

rtb61 (674572) | more than 6 years ago | (#22114140)

More likely contextual patents based upon data mining scientific research. It also gives them valuable data to sell to three-letter government agencies about what particular scientists are working on and, more importantly, which other people are interested in that research.

Got to be careful: a passing interest in bacteriological research might land you on the extreme better-safe-than-sorry terror watch list, where they systematically dismantle your household in search of dangerous substances. For untidy and scruffy computer geeks, that very likely guarantees an immediate prosecution for harbouring and supporting microbes of mass destruction.

Just another googlite publicity stunt gone wrong. What is the likelihood that scientific researchers will trust the googlites with their data? Perhaps a scientist can do research on the subject, publish it with google, and we can all search it ;D.

Re:mining for ads (0)

Anonymous Coward | more than 6 years ago | (#22114672)

Let's play some other things back.
    1. Google cooperates with Chinese search censors and turns in searchers so that they can have their brains washed/ or skinned to sell the collagen in their skin.
    2. Now Google wants to store the scientific data of major scientific researchers in this country.
    3. It is known that the Chinese are after all the scientific data that they can mine no matter how supposedly insignificant.
    Therefore: 4. Google is acting as a not-so-covert enemy agent for the Chinese.

And in Redmond.... (1)

seeker_1us (1203072) | more than 6 years ago | (#22112656)

From TFA, to get massive amounts of data to Google:

(Google people) are providing a 3TB drive array (Linux RAID5). The array is provided in "suitcase" and shipped to anyone who wants to send they data to Google.

Google doing this. And they use Linux "suitcases" for transport.

Hide the chairs.

Re:And in Redmond.... (0, Troll)

calebt3 (1098475) | more than 6 years ago | (#22112706)

Hide the chairs./quote. Nah. Programmers might write smaller code if they can't sit until they are done. Easy way to prevent those features nobody wants!

Re:And in Redmond.... (0)

Anonymous Coward | more than 6 years ago | (#22112812)

You fail at quoting.

Re:And in Redmond.... (3, Informative)

tomhudson (43916) | more than 6 years ago | (#22113854)

3 terabytes isn't that much any more. You can get 750 GByte hard drives for $160 - 5 drives ($800) gives you your 3TB.

Or 4 x 1TB hard drives ($180 ea) gives you $720, so throw in $10 to boot the os off a usb key.

Cheap linux box? Well, you don't need to supply a monitor, keyboard, mouse, speakers, or even much ram - you do the math.
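
A quick sanity check of the arithmetic above, as a minimal sketch. It assumes the array is RAID5 as in the TFA quote (one drive's worth of capacity goes to parity) and decimal units (1 TB = 1000 GB). The function names are made up for illustration, and the prices are the ones quoted in this comment.

```python
def raid5_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of a RAID5 array: one drive's worth is spent on parity."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (drive_count - 1) * drive_size_gb


def array_cost(drive_count, price_per_drive):
    """Total cost of the drives alone, in dollars."""
    return drive_count * price_per_drive


# (number of drives, size per drive in GB, price per drive in dollars)
options = {
    "5 x 750 GB": (5, 750, 160),
    "4 x 1 TB": (4, 1000, 180),
}

for name, (count, size_gb, price) in options.items():
    usable = raid5_usable_gb(count, size_gb)
    cost = array_cost(count, price)
    print(f"{name}: {usable} GB usable, ${cost} in drives")
```

Under those assumptions both configurations land at 3 TB usable, which is why either drive count matches the 3TB array from TFA.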

OMG WTF THIS SUX (5, Funny)

User 956 (568564) | more than 6 years ago | (#22112690)

The new site will have YouTube-style annotating and commenting features.

And hopefully the commentary will be just as insightful and poignant!

S.C.U.M. (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#22112696)

Life in this society being, at best, an utter bore and no aspect of society being at all relevant to women, there remains to civic-minded, responsible, thrill-seeking females only to overthrow the government, eliminate the money system, institute complete automation and destroy the male sex.

It is now technically feasible to reproduce without the aid of males (or, for that matter, females) and to produce only females. We must begin immediately to do so. Retaining the male has not even the dubious purpose of reproduction. The male is a biological accident: the Y (male) gene is an incomplete X (female) gene, that is, it has an incomplete set of chromosomes. In other words, the male is an incomplete female, a walking abortion, aborted at the gene stage. To be male is to be deficient, emotionally limited; maleness is a deficiency disease and males are emotional cripples.

The male is completely egocentric, trapped inside himself, incapable of empathizing or identifying with others, or love, friendship, affection of tenderness. He is a completely isolated unit, incapable of rapport with anyone. His responses are entirely visceral, not cerebral; his intelligence is a mere tool in the services of his drives and needs; he is incapable of mental passion, mental interaction; he can't relate to anything other than his own physical sensations. He is a half-dead, unresponsive lump, incapable of giving or receiving pleasure or happiness; consequently, he is at best an utter bore, an inoffensive blob, since only those capable of absorption in others can be charming. He is trapped in a twilight zone halfway between humans and apes, and is far worse off than the apes because, unlike the apes, he is capable of a large array of negative feelings -- hate, jealousy, contempt, disgust, guilt, shame, doubt -- and moreover, he is aware of what he is and what he isn't.

Although completely physical, the male is unfit even for stud service. Even assuming mechanical proficiency, which few men have, he is, first of all, incapable of zestfully, lustfully, tearing off a piece, but instead is eaten up with guilt, shame, fear and insecurity, feelings rooted in male nature, which the most enlightened training can only minimize; second, the physical feeling he attains is next to nothing; and third, he is not empathizing with his partner, but is obsessed with how he's doing, turning in an A performance, doing a good plumbing job. To call a man an animal is to flatter him; he's a machine, a walking dildo. It's often said that men use women. Use them for what? Surely not pleasure.

Re:S.C.U.M. (0)

Anonymous Coward | more than 6 years ago | (#22113162)

Get lost, Valerie. Men are talking.

Scientific Research (2, Funny)

Anonymous Coward | more than 6 years ago | (#22112768)

This should come in handy for my research on normal variants of the female mammary glands.

Are they insane? (5, Funny)

Hognoxious (631665) | more than 6 years ago | (#22112778)

Why would you want to store a scientist in a database?

Re:Are they insane? (3, Funny)

jd (1658) | more than 6 years ago | (#22113378)

Because you can then replicate the really good ones. I would have thought that obvious.

Re:Are they insane? (3, Funny)

Malevolent Tester (1201209) | more than 6 years ago | (#22113574)

Might be a way to get them to join a union.

Re:Are they insane? (1, Flamebait)

CastrTroy (595695) | more than 6 years ago | (#22113754)

That'll never happen. Scientists know that unions are for people who hate their job, and don't actually want to do any work. Scientists, at least most that I've met, love their jobs, and love to actually work while at their job.

Re:Are they insane? (2, Interesting)

jd (1658) | more than 6 years ago | (#22114136)

The whooshing sound you heard was the set logic joke flying overhead.

Even so, though, unions only have a bad rep in America. Interestingly, America is also the country with the greatest number of stress-related illnesses in the western world (more than twice as many heart attacks from stress as in England), and that is tied to their self-destructive yet amazingly narcissistic "work ethic" which simultaneously creates unbearable stresses on the human frame whilst producing only minimal extra productivity. Trade unions were founded as a form of cooperative, providing health benefits, life insurance, education and training, back in the days of King James II. Remind me, when precisely did Americans provide these to their workforce? Oh, you mean 50% of them still don't have them? How quaint.

That unions are seen as a political, rather than a socialist, entity is partly because many in America also hate all forms of socialism. This explains why the rest of the world regards them as anti-social. So much time and effort has gone into linking socialism with communism, communism with Communism, and Communism with Stalinism (even though none of those are even remotely connected) that all you have left is a bunch of paranoid spoiled rich kids and a bunch of equally paranoid serfs. This is a violently unstable system which must either correct itself or risk the fate of other violently unstable civilizations. Oh, the US won't vanish overnight, no matter what. Even the Roman Empire survived in some form or other for a millennium after it imploded. There will likely be an identifiable United States of America in 3000 AD for that reason alone. The question is, will it be a stagnating copy of how it is now, or something that has learned from its mistakes and corrected them?

Re:Are they insane? (1)

kamapuaa (555446) | more than 6 years ago | (#22115620)

Uh, wow, as a reply to "scientists don't like Unions" you state "The United States won't last as long as the Roman Empire" and continue on with a lengthy, somewhat nonsensical anti-US rant. It really had very little to do with what the poster stated, or with Google offering free database storage - obviously you're looking for the slightest provocation to rave on against the US, whether or not it has anything to do with the subject at hand.

I realize Slashdot attracts anti-social nerds who often have weird agendas to promote, I just wish it wasn't getting modded up to a level where I had to read it.

Re:Are they insane? (2, Informative)

ryanov (193048) | more than 6 years ago | (#22115090)

I like my job, I'm a sysadmin, I'm on call right now, and I'm a committee chair for my union. Guess you don't know everything.

Re:Are they insane? (1)

jd (1658) | more than 6 years ago | (#22114038)

It also means you can subtract differences. Creationist scientists won't like it, as it's possible to have alternative views.

Re:Are they insane? (5, Funny)

jma05 (897351) | more than 6 years ago | (#22113670)

> Why would you want to store a scientist in a database?

So that these geeks can have normal relationships.

Re:Are they insane? (2, Funny)

Amitz Sekali (891064) | more than 6 years ago | (#22115764)

> Why would you want to store a scientist in a database?

So that these geeks can have normal relationships.
But they probably won't perform as well as before normalization. After all, there will be a performance hit in joining tables.

Yes. (0)

Anonymous Coward | more than 6 years ago | (#22113894)

Their insanity is proven by the following statement: The new site will have YouTube-style annotating and commenting features. Anyone who would use *youtube comments* as an inspiration for their site is obviously in need of mental help.

Ob. Simpsons (-1)

Anonymous Coward | more than 6 years ago | (#22112788)

Homer: Batman's a scientist!

Fantastic for Students and New Researchers (5, Informative)

cheesethegreat (132893) | more than 6 years ago | (#22112840)

If this actually happens, and researchers are willing to make their data-sets open source, it would be a huge boon for budding researchers. It would allow students to do more than just work with a sample dataset out of a textbook. Graduate students learning how to do advanced modeling would be able to work with real datasets, vastly improving their skillset and employability. Just consider these two lines on a CV, and ask yourself which one jumps out at you.

"Designed a model for the dataset on the CD-ROM included with the Modeling Organic Systems textbook"

"Designed a model for the WISK-III heart output dataset published in 2006."

New entrants to a field would have instant access to enormous amounts of data very quickly and easily. Although the big kudos comes when you can do totally original work (new data, new analysis), a researcher who could come up with a new critique of older papers and studies would definitely get themselves noticed.

Overall, this is a really positive step for everyone on the lower rungs of the scientific ladder, and especially positive for those with limited resources.

Re:Fantastic for Students and New Researchers (4, Insightful)

ushering05401 (1086795) | more than 6 years ago | (#22112904)

I feel your optimism, and support this idea, but the cynical side of me must speak out.

Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?

Yes, noobs would have enormous amounts of raw material at their disposal, but wouldn't they find applications derived from this data already covered by patents that were distilled from the data sets through analysis performed by labs full of trained corporate monkeys before they can get their own foot in the door of innovation?

I would love to awaken one day and find that I am just being a jaded fool, but I believe developments like this will help the commercialized overlords more than anyone else as they are the ones with sufficient resources to throw at privatizing the results of scientific research.

Re:Fantastic for Students and New Researchers (3, Insightful)

xenocide2 (231786) | more than 6 years ago | (#22113032)

Isn't this information more likely to be capitalized upon by those who already dominate the commercialization of research?
Can't it be both? It's not like by subscribing you're depriving others. And the data uploaded will be made freely available.

You cannot patent mere data, or interpretations of data. Patents are for machines, processes, and the like. Of course, the publication of data doesn't preclude people from patenting a chemical process that results in a specific gene, but this is already happening elsewhere.

In fact, I suspect the entire point of this is for Google to take over maintenance of the Genomic Databases and create new such databases. Many times the academic databases are.. poorly maintained, and certainly not compatible, despite the very similar contents. There's already efforts to make them more compatible, but Google appears to be able to offer some very neat stuff on top of it all. The silliness about shipping RAID arrays mostly seems to be for unis not already hooked up to I2.

Re:Fantastic for Students and New Researchers (0)

Anonymous Coward | more than 6 years ago | (#22113600)

You seriously underestimate just how screwed up the USPTO is right now.

They already patented half the human genome. And are allowing it to be enforced by the pharmaceuticals. What the patent law says, and what the reality is, have little to nothing in common.

Re:Fantastic for Students and New Researchers (0, Redundant)

gotzero (1177159) | more than 6 years ago | (#22112934)

Making this data available can be very valuable, both as a vetting process and as a way to compound knowledge. I think it is nice to at least offer.

Re:Fantastic for Students and New Researchers (5, Insightful)

cortex (168860) | more than 6 years ago | (#22113042)

As a neural engineering researcher who routinely generates terabyte-size datasets, I have to say that I both like this idea and think it is unlikely to succeed. I would love to have a place to store large datasets and access them from wherever I am. However, since these datasets will be open sourced, I will be extremely unlikely to put any dataset on google until I am certain I have extracted all of the publishable findings from it. I think that most researchers, after putting years of effort and a lot of money into acquiring a dataset, will also think twice about open sourcing their data. If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.

Re:Fantastic for Students and New Researchers (3, Insightful)

JanneM (7445) | more than 6 years ago | (#22113414)

If the TOS were to include some means for controlling publications which resulted from analysis of the data, then it might be more likely to succeed.

But in that case, would you want to go anywhere close to someone else's data, for the risk of "contaminating" your research and perhaps end up in a protracted brawl over discovery rights?

I mostly agree with everybody else: it's a neat idea but for a lot of people it's not going to fly.

The one area I think it could be good is for datasets that are already open and that are meant to be shared. In vision research, for instance, or in various fields in machine learning there's quite a lot of sort-of-standard test data sets created by various groups that can make it easier to compare models directly. Having all of those collected in one place would certainly make it easier to find and actually use them rather than reinventing the wheel once again.

Re:Fantastic for Students and New Researchers (3, Insightful)

CastrTroy (595695) | more than 6 years ago | (#22113836)

That's really weird that this appeared on Slashdot tonite, just as I was downloading the historical weather data [ec.gc.ca] for Canada. Still waiting for it to download. I was thinking that it would be a nice data set that would be interesting to work with. It's not a huge dataset by any means, only 200 MB zipped, but it's still bigger and more real than any of the stuff I got to use in university. And a lot larger than any real data set I could generate on my own. Does anybody else have any links to interesting open data sets?

Re:Fantastic for Students and New Researchers (1)

Dr. Zowie (109983) | more than 6 years ago | (#22115274)

I will be extremely unlikely to put any dataset on google until I am certain I have extracted all of the publishable findings from it


That's so twentieth century. The scarce resource these days is not data, it is mindshare in the science community. In the 1990s, many of the SOHO [nasa.gov] instruments experimented with opening up their data sets to all comers immediately, and those instrument teams have generated about an order of magnitude more publications than their less-forward-thinking cohorts.

You should be so lucky that someone tries to get into your data and publish stuff from it.

Re:Fantastic for Students and New Researchers (3, Interesting)

cortex (168860) | more than 6 years ago | (#22115356)

20th century or not, the fact is that if I don't publish papers with my name as first or last author I don't get tenure. I'd be happy to have people publish papers using my data as long as I have already gotten a few first author papers out of it. Of course, that would only apply to my data that is several years old. Also, what is to stop someone from publishing using my data and not having me as an author at all? The TOS to access the data are going to be very important.

Re:Fantastic for Students and New Researchers (1)

Dr. Zowie (109983) | more than 6 years ago | (#22115422)

...what is to stop someone from publishing using my data and not having me as an author at all?


Nothing! On the other hand, it would be a pretty foolish person who tried to do that -- if you made the data you're likely the only one who truly understands it. Other threads in this discussion talk about that problem in the context of elementary particles. For solar observations it is similar -- there are plenty of "gotchas" in every data set, and you'd better be working with the instrument team if you want to make a fool of yourself.

Another angle: if you really do deserve tenure, then your problem is probably the opposite: you've got too many interesting ideas to explore and data sets to analyze, and you're likely never to get around to doing some of the necessary-but-more-tedious analyses of your back data. If you hold on to the data, it will never get analyzed by anyone.

Re:Fantastic for Students and New Researchers (1)

Dr. Zowie (109983) | more than 6 years ago | (#22115428)

and you'd better be working with the instrument team if you want to make a fool of yourself.


Hmmm, I seem to have omitted an embarrassing "don't", as in "if you don't want to make a fool of yourself.".

Re:Fantastic for Students and New Researchers (2, Informative)

cortex (168860) | more than 6 years ago | (#22117276)

Nothing! On the other hand, it would be a pretty foolish person who tried to do that -- if you made the data you're likely the only one who truly understands it. Other threads in this discussion talk about that problem in the context of elementary particles. For solar observations it is similar -- there are plenty of "gotchas" in every data set, and you'd better be working with the instrument team if you want to make a fool of yourself.

This is exactly why this system is likely to fail. No scientist is going to spend millions of dollars and years of effort just to put their data on a server where someone else can analyze it, publish the results, and therefore get most of the credit and reward. The end result of this process is the person actually collecting the data doesn't get tenure and ends up shutting down their lab.

In terms of understanding the data and "gotchas", we always have meta-data to explain the details of the experiment and the data. Through collaborations with specific individuals in which publication authorship is discussed up front, I have allowed others to analyze my data.

We design and build our instrumentation ourselves, or have it built by an outside contractor. In either case we always validate every piece of experimental equipment. So I think it is safe to say that we are cognizant of the subtleties of our data.

Another angle: if you really do deserve tenure, then your problem is probably the opposite: you've got too many interesting ideas to explore and data sets to analyze, and you're likely never to get around to doing some of the necessary-but-more-tedious analyses of your back data. If you hold on to the data, it will never get analyzed by anyone.

It's not a case of deserving tenure or not. You need to have peer-reviewed documentation of scientific productivity and standing. This is why I have graduate students and postdocs. Typically, a senior graduate student or a postdoc ends up being first author on a paper, while I am last author. And this is what tenure review committees look at: how many first and last author papers do you have? Having a lot of papers with my students/postdocs as first author demonstrates that I am being a good mentor and advancing the careers of the people in my lab. Having a lot of last author publications demonstrates that my lab is in general being productive. They also factor in the quality and prestige of the journals where the work is published.

As I stated earlier, after my lab has gotten a few publications out of a data set, I would be OK with publishing in an open database. However, I would still insist on having some control over how future publications are credited.

Re:Fantastic for Students and New Researchers (1)

TheRaven64 (641858) | more than 6 years ago | (#22116316)

20th century or not, the fact is that if I don't publish papers with my name as first or last author I don't get tenure.
I'm intrigued. What is significant about last author in your field? For us, contributors are listed in alphabetical order when their contributions are equal and in order of their contributions when they are not. The last author is always the guy who did the least work (maybe just proof-read, but for various political reasons still gets his name on the paper) or the guy[1] with the surname that comes last alphabetically.

Re:Fantastic for Students and New Researchers (1)

maubp (303462) | more than 6 years ago | (#22116506)

In Biology and Chemistry at least, the supervisor or project leader is often named last, while the student/researcher who did the bulk of the grunt work is named first. Of course, it wasn't always this way.

What field are you talking about?

Re:Fantastic for Students and New Researchers (1)

cortex (168860) | more than 6 years ago | (#22117044)

In Neuroscience, Neural Engineering, Biomedical Engineering... The first author is the person who did the most work and wrote the paper. The last author is typically the Principal Investigator (PI), i.e. the lab/project supervisor who wrote the grant that funded the project. While the PI is usually the person who does the least of the day-to-day work, they are often the person who makes the most intellectual contribution in terms of experimental design and problem solving. Other authors are typically listed in the order in which they contributed, either intellectually or through actual work, to the project.

Re:Fantastic for Students and New Researchers (1)

mark-t (151149) | more than 6 years ago | (#22116398)

Also, what is to stop someone from publishing using my data and not having me as an author at all?

Where I come from, that's called plagiarism, and is not only a serious academic offense at every school I've ever heard of, but is also an infringement of copyright law.

Of course, if you can't be bothered trying to protect your copyrights because you're too busy doing other things, then you just have to have enough faith that the segment of the population that would be interested in your data isn't particularly likely to be inclined to plagiarize. If you feel you cannot trust your peers in this regard, one is compelled to wonder: if you felt that you yourself could get away with plagiarism occasionally, would you commit it? (Not that I'm asking you that question specifically, just offering something to think about.)

The TOS to access the data are going to be very important.

That would likely be the same TOS that applies to any copyrighted work... it may not be copied without consent of the author, with the standard exemptions to infringement also applicable.

Re:Fantastic for Students and New Researchers (5, Insightful)

Gromius (677157) | more than 6 years ago | (#22113286)

As a researcher myself (particle physics), I echo others' comments in this thread that a) it's a nice idea but b) it isn't going to happen. There are three main problems; the first two are solvable, the third isn't.

1) trivially, 3TB is nowhere near enough to store my data

Bit of a non issue for the overall concept but if google wants my data, they really are going to have to up the storage by a few orders of magnitude.

2) as others stated, we work really really hard to acquire our data; research is about 10% inspiration, 90% perspiration. We are not giving up our data till we have milked it for all it's worth.

This again is solvable: we release our data after we have all the publishable results we can think of and then let others have a crack. Somebody might find something useful, and if not, well, it's great for younger scientists as you say. At the very least, people can reconfirm results at a later date more easily. Main reason I like it.

3) The deal killer: for my field, and I suspect others, it is really really difficult to understand our data and it's really easy to misinterpret it.

New particles have been "discovered" so many times by grad students (and some professors who should know better) in particle physics data that I'm terrified of what somebody with no training outside the system might conclude from the data. At CDF (a Fermilab expt) it took us (800 physicists) about 2-3 years to understand the data from the experiment enough to get proper physics results out of it. Even now, it takes a newcomer about a year to get up to speed, and that's with help from all the experts. But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point, and so I'm sure we would be flooded by bogus results due to misinterpretations of the data if we released it.

Anyway this all comes from a particle physics view point but I suspect quite a few other fields will be similar.

Re:Fantastic for Students and New Researchers (0)

Anonymous Coward | more than 6 years ago | (#22113396)

I don't know if the third one sounds like such a problem. If you are afraid of bogus results, just ignore them the same way you ignore nutty proofs of the Riemann hypothesis. There is something a bit elitist about saying "the common people aren't smart enough to handle our data." Scattered among the common people are a few young geniuses who can handle our data better than any of us, and I'd like for them to have access to it.

Re:Fantastic for Students and New Researchers (4, Informative)

Gromius (677157) | more than 6 years ago | (#22116026)

Yes, I can see how it can appear elitist. And yes, it is elitist in a sense, because it's really hard. A PhD student typically has to do about 3 years of hard work to get out an analysis of sufficient quality, and that's with help from experts. Before that they have 4 years of advanced physics. I'm not saying the common man can't do it, just that it'll take them years of hard work to understand it well enough to analyse physics results which have already had 800 physicists pore over them to extract most things of value. However, as I said, it's really easy to think you've understood the data in a week or so and produce bogus results, which I suspect most people would do.

As for the few geniuses who can handle the data better than any of us, yes, it's a noble idea and it sounds nice in practice. However, these geniuses are still going to have to slog through the data, and it's still going to be hard, even for them, to do it by themselves. It's not something some whiz kid will pick up and by the afternoon have a Nobel prize. However, if they are really interested, they can stop by their local particle physics lab and talk to the people there. It's not as if we don't ever give out our data; lots of students (undergraduates and 6th formers (high schoolers for Yanks)) over the years have been given a copy and helped to understand it. If you want it badly enough you'll probably get some sort of access to old data. Sure, some may fall through the cracks, but that's unavoidable.

Also, incidentally, the bogus results I'm most afraid of are not from the general public but from our theoretical colleagues, who are actually the people we are most concerned about hiding the data from :) A lot (but not all) think that data analysis is easy and have a vested interest in proving a certain model, so subconsciously they might misinterpret the data or not rigorously check it when it looks like it's proving what they want it to prove. Then all of a sudden you have headlines like "Prof. X from Ivy League University Y has found new physics Z in Tevatron/LHC data", which if true would be the most significant discovery in physics in the last 30 years and so is splashed all over the media. The public and media just know this guy is an Ivy League professor but don't know that he is little more qualified to analyse the results than they are, so they believe him. Arguments would then ensue over the significance of the finding, and eventually a retraction is printed. But this would be in public and in the media, and I think this is damaging to science, as the general public starts thinking "these stupid scientists, always changing their mind, should we believe anything they say?". Plus you would get an increase in the usual crazy science results, but this time with data whose analysis most people can't tell is rubbish. Slashdot would be happy, as they tend to like crappy science :) but it's not something scientists would be happy with.

Re:Fantastic for Students and New Researchers (2, Interesting)

Unoti (731964) | more than 6 years ago | (#22113482)

But it's very easy to think you understand things after a few weeks when in fact you're missing some incredibly subtle point, so I'm sure we would be flooded by bogus results due to misinterpretations of the data if we released it.
You sound very intelligent and I'm sure you're correct. But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.

Re:Fantastic for Students and New Researchers (1)

mako1138 (837520) | more than 6 years ago | (#22114454)

Terabytes of scientific data don't purport to hold the answers to life, the universe, and everything. A limit on CP violation probably doesn't have anything to do with getting into heaven.

Re:Fantastic for Students and New Researchers (1)

Mad_Rain (674268) | more than 6 years ago | (#22115034)

But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.

Yeah, and look how that turned out! We end up with complete Christian loonies [wikipedia.org] instead of reasonable Catholics. (sarcasm intended)

Re:Fantastic for Students and New Researchers (0)

Anonymous Coward | more than 6 years ago | (#22115124)

Yeah, and now it's in English and what do you get? Huckabee for president.

E pluribus unum et al.

Re:Fantastic for Students and New Researchers (1)

BlueParrot (965239) | more than 6 years ago | (#22115908)

You sound very intelligent and I'm sure you're correct. But I couldn't help but think how much that sounds like the reasons why the Catholic Church conducted mass in Latin for so long, and why they were initially reluctant to have the Bible translated to English.


Nonsense, scientific experiments are supposed to be carried out in a reproducible way, meaning that if the guy who wrote a paper won't give you his data you should be able to just go do the experiment yourself. If the GP was arguing scientists shouldn't document their assumptions then you would have a point, but that was not what he was saying. The situation is more akin to everybody already having a bible, and somebody saying "There are more three letter words than four letter words in it" and then refusing to say how many there are of each. If you don't believe him you can just take your own bible, count the respective words and see if he is right. A situation analogous to what you described would be if scientists said "we did this experiment which proves there are no gravitons, but we won't tell you how we did it". This was not what the GP was suggesting. Any papers published from his lab would without doubt state his assumptions and describe the experiment; he just won't bother giving you all the "pm tube #5 triggered 300 times, pm tube #4 triggered 200 times ... etc...".

Re:Fantastic for Students and New Researchers (1)

TheRaven64 (641858) | more than 6 years ago | (#22116330)

Nonsense, scientific experiments are supposed to be carried out in a reproducible way, meaning that if the guy who wrote a paper won't give you his data you should be able to just go do the experiment yourself.
Except the grandparent was talking about particle physics. For any given experiment, there are likely to be at most two sites in the world where it can be reproduced and you need to book time years in advance to use them and often justify why your experiment is worth performing. If the reason for performing it is 'I don't trust this guy's results' you may well be denied. This means, unless you have a few billion dollars sitting around to build your own particle accelerator, you can't reproduce the experiments.

Re:Fantastic for Students and New Researchers (2, Insightful)

dogmod (702959) | more than 6 years ago | (#22115838)

Seems to me that each of these deal killers is a red herring.
1. My data set is too large = I have no idea what's essential.
2. I worked too hard to get this data, I'm not going to give it away = I'm a mediocre scientist competing against a lot of other mediocre scientists - this data might be my one chance to win the lottery. Oh, and just for the record, I don't really give a shit about the progress of my field - fuck 'em, fuck 'em all, me, me, me.
3. Newbies will misunderstand my data and pervert it = What the fuck am I posting on Slashdot for?
As one of the mathematicians involved in computation of the Kazhdan-Lusztig-Vogan polynomials for E8, I say - they now exist, they're marvelous, and you're welcome to them. (I suspect you'll pass though - I would if I were in closer proximity to some other marvel.)

Re:Fantastic for Students and New Researchers (2, Insightful)

tenco (773732) | more than 6 years ago | (#22118942)

1. My data set is too large = I have no idea what's essential.

Yes, that's exactly the point. I am a physics student and the first thing we were told before we began our first lab course was: "Don't throw away any data! Even if you think it's unimportant, equipment failure, ...". New discoveries have been postponed for years because someone simply threw away data which seemed to be unimportant at the time. There's simply no way of telling if some data set is essential or not. If you're thinking this way, you should be more than satisfied with results alone.

Re:Fantastic for Students and New Researchers (0)

Anonymous Coward | more than 6 years ago | (#22116410)

Seems to me that the more important reason it won't work for particle physics is that, after LHC, there aren't any significant new colliders lined up for the future. No bucks, no Buck Rogers.

Re:Fantastic for Students and New Researchers (1)

LaskoVortex (1153471) | more than 6 years ago | (#22113934)

Keep dreaming. It's hard enough to get the average researcher to make sure he or she includes accession numbers for mandatory deposition of data related to publication. Getting them to contribute to a big community database is sheer fantasy. Plenty of opportunities for this already exist. Centralizing it won't help matters much. Scientists are just like anyone else. They need to make a buck, they don't give away products (data) for free, and they certainly don't go out of their way to make it accessible. Now, if Google could buy all of the journals and force scientists to deposit data, publish accession information, and formalize metadata for said data, then we might be getting somewhere.

Main problem, publication rights (1)

Per Abrahamsen (1397) | more than 6 years ago | (#22115988)

I have been in a couple of large scientific projects, and the main problem with making the data public has been to ensure that the researchers who collect the data are getting "author credit" in scientific publications.

The scientists who collect the data are often other people than those who analyze the data, and fit them to the models. As long as everybody is working on the same project, it is possible to ensure that the people who collect the data will be listed as authors in the papers, even if they are usually written by the people who analyze the data.

Once the data has been published, all bets are off. People will analyze the data, and write articles about it, with themselves as authors and a proper acknowledgment to the project that collected the data.

As science works today, being listed as author is paramount. It is the only criterion used by the bean counters to judge whether a scientist is doing any active research. Zero publications means zero grants, and soon after, zero paycheck.

The way we have done it has been to delay publication of the raw data until the first batch of scientific papers has been accepted. After that, everyone has access.

Given the pace of Google evolution... (1)

unforkable (956731) | more than 6 years ago | (#22112842)

If a computer will ever be able to invent something, or make a scientific discovery, it'll certainly be (IMHO) a computer directly related to Google.

open source (0)

Anonymous Coward | more than 6 years ago | (#22112844)

so now anything that is publicly available fits under the heading open source? you guys are really trying to give yourselves too much credit.
 
source is code. if what you're offering isn't code then it can't be open source. public domain should be referenced as such so as to avoid further degradation of terminology.

just wait for the *clerical error* over at the DOJ (1)

MacarooMac (1222684) | more than 6 years ago | (#22112852)

and watch in ecstasy as one of Google's suitcase drives SLURPS up the FBI's *real* datasets on 9/11, Elvis... oh, and that schematic for a site-to-site transporter beam that I knocked up a while back, which they somehow stole off my google docs.

Is it limited? (1)

jonsmirl (114798) | more than 6 years ago | (#22112890)

Researchers I know would fill up a yottabyte if they were allowed to. I hope Google has plans for keeping growth of the datasets under control.

Re:Is it limited? (1)

jd (1658) | more than 6 years ago | (#22113802)

There are those who would argue that this is an open invitation by Google for scientists to try and DDoS their systems, and those who would argue it might not be a bad thing if they succeed. Personally, I disagree with the last part, but DO think that this could lead to Google developing vastly superior search technology. They can search gigantic data sets, sure, but the percentage of false hits is way too high. When you move into scientific data and multi-dimensional non-simply-connected non-linear search spaces, you need far better search algorithms than currently exist. AMS work on 10-20 MeV particle accelerators requires operators to largely guess the regions of interest, and atomic mass spectrometry is used in just about every field of science. It's incredibly well-understood and exact in comparison to many techniques you're likely to see used where help and compute cycles would likely be interesting to scientists.

I don't doubt Google's good intentions (or their desire to improve on their search technology by honing it on such gigantic, complex data sets) but I have severe doubts they have the knowledge or the skills to produce a search and analysis system of this level of complexity even in specific fields, hence their problems producing a high positive hit rate merely for web pages. Producing a generalized pattern finder that will work on any problem...

Re:Is it limited? (1)

seededfury (699094) | more than 6 years ago | (#22114186)

It is commonly abbreviated YB. As of 2007, no computer has achieved one yottabyte of storage. In fact, the combined space of all the computer hard drives in the world does not amount to one yottabyte. According to one study, the world's computers stored approximately 160 exabytes in 2006, with nearly 1,000 exabytes projected by 2010.[1] When used with byte multiples, the SI prefix may indicate a power of either 1,000 or 1,024, so the exact number may be either:

According to Wiki [wikipedia.org]
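For scale, the two readings of the prefix mentioned in that quote can be worked out directly. This is only a rough sketch, assuming the usual decimal (1000^8) and binary (1024^8) interpretations of "yotta" and treating the quoted 160 exabytes as decimal exabytes; the constant and function names are made up for illustration.

```python
# Size of one yottabyte under the two common prefix conventions.
YB_DECIMAL = 1000 ** 8   # SI reading: 10^24 bytes
YB_BINARY = 1024 ** 8    # binary reading: 2^80 bytes
EB_DECIMAL = 1000 ** 6   # one decimal exabyte: 10^18 bytes


def fraction_of_yottabyte(exabytes):
    """Return what fraction of a decimal yottabyte the given number of exabytes is."""
    return exabytes * EB_DECIMAL / YB_DECIMAL


if __name__ == "__main__":
    print("decimal yottabyte:", YB_DECIMAL, "bytes")
    print("binary yottabyte: ", YB_BINARY, "bytes")
    # The 2006 estimate quoted above, as a share of one decimal yottabyte.
    print("160 EB as a fraction of 1 YB:", fraction_of_yottabyte(160))
```

On those assumptions, the 160 exabytes quoted for 2006 come to well under a tenth of a percent of a single yottabyte, so filling one is not an immediate worry.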

It'll All End In Tears (4, Insightful)

turgid (580780) | more than 6 years ago | (#22112940)

This is a Bad Idea. Too much of the world now depends on Google. And people are running to Google, willing to give their data and identity.

/me shakes walking stick and creeps back into cave.

Re:It'll All End In Tears (4, Funny)

Anonymous Coward | more than 6 years ago | (#22113200)

Do you have any datasets to back up this claim?

Re:It'll All End In Tears (1)

turgid (580780) | more than 6 years ago | (#22116014)

Do you have any datasets to back up this claim?

I'm old and grumpy now. I don't need no stinking data. Get off my lawn!

Re:It'll All End In Tears (0)

Anonymous Coward | more than 6 years ago | (#22116366)

> Do you have any datasets to back up this claim?

Nope, but Google does.

Horrible Idea - What are the TOS? (4, Insightful)

teknopurge (199509) | more than 6 years ago | (#22112946)

Does google get ownership of anything that is uploaded? I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.

Re:Horrible Idea - What are the TOS? (5, Informative)

hostguy2004 (818334) | more than 6 years ago | (#22113272)

Google are offering this service to store PUBLIC DOMAIN data. If people don't want to release the data as public domain, then this aint the service for them. See http://en.wikipedia.org/wiki/Public_Domain [wikipedia.org]

Re:Horrible Idea - What are the TOS? (1)

Walt Dismal (534799) | more than 6 years ago | (#22113588)

Dear Scientists; please store your sensitive nuclear data with Google. We promise not to give it to the Chinese. Our company motto: Do No Evil You Can Get Caught Profiting From.

Re:Horrible Idea - What are the TOS? (1)

mark-t (151149) | more than 6 years ago | (#22116442)

Where do you see that it says public domain? I see nothing in the article or on google's research page that suggests they must surrender their copyright(s).

Re:Horrible Idea - What are the TOS? (1)

hey hey hey (659173) | more than 6 years ago | (#22113326)

Does google get ownership of anything that is uploaded? I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.

Doesn't matter how foolish the scientists are, as the contracts will have to be vetted by the various University legal departments. I'm quite confident that the lawyers will be very careful about their legal rights.

Re:Horrible Idea - What are the TOS? (1)

Elendil (11919) | more than 6 years ago | (#22116140)

> I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.

I assume you're not aware that they already do just that when they publish an article in most scientific journals? The publisher owns the copyright to the article, not the authors.

creators offering free survival for human 'race'/ (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#22113084)

man'kind'. let yOUR conscience be yOUR guide. you can be more helpful than you might have imagined. there are still some choices. if they do not suit you, consider the likely results of continuing to follow the corepirate nazi hypenosys story LIEn, whereas anything of relevance is replaced almost instantly with pr ?firm? scriptdead mindphuking propaganda or 'celebrity' trivia 'foam'. meanwhile; don't forget to get a little more oxygen on yOUR brain, & look up in the sky from time to time, starting early in the day. there's lots going on up there.

http://news.yahoo.com/s/ap/20071229/ap_on_sc/ye_climate_records;_ylt=A0WTcVgednZHP2gB9wms0NUE [yahoo.com]
http://news.yahoo.com/s/afp/20080108/ts_alt_afp/ushealthfrancemortality;_ylt=A9G_RngbRIVHsYAAfCas0NUE [yahoo.com]
http://www.nytimes.com/2007/12/31/opinion/31mon1.html?em&ex=1199336400&en=c4b5414371631707&ei=5087%0A [nytimes.com]

is it time to get real yet? A LOT of energy is being squandered in attempts to keep US in the dark. in the end (give or take a few 1000 years), the creators will prevail (world without end, etc...), as it has always been. the process of gaining yOUR release from the current hostage situation may not be what you might think it is. butt of course, most of US don't know, or care what a precarious/fatal situation we're in. for example; the insidious attempts by the felonious corepirate nazi execrable to block the suns' light, interfering with a requirement (sunlight) for us to stay healthy/alive. it's likely not good for yOUR health/memories 'else they'd be bragging about it? we're intending for the whoreabully deceptive (they'll do ANYTHING for a bit more monIE/power) felons to give up/fail even further, in attempting to control the 'weather', as well as a # of other things/events.

http://video.google.com/videosearch?hl=en&q=video+cloud+spraying [google.com]

dictator style micro management has never worked (for very long). it's an illness. tie that with life0cidal aggression & softwar gangster style bullying, & what do we have? a greed/fear/ego based recipe for disaster. meanwhile, you can help to stop the bleeding (loss of life & limb);

http://www.cnn.com/2007/POLITICS/12/28/vermont.banning.bush.ap/index.html [cnn.com]

the bleeding must be stopped before any healing can begin. jailing a couple of corepirate nazi hired goons would send a clear message to the rest of the world from US. any truthful look at the 'scorecard' would reveal that we are a society in decline/deep doo-doo, despite all of the scriptdead pr ?firm? generated drum beating & flag waving propaganda that we are constantly bombarded with. is it time to get real yet? please consider carefully ALL of yOUR other 'options'. the creators will prevail. as it has always been.

corepirate nazi execrable costs outweigh benefits
(Score:-)mynuts won, the king is a fink)
by ourselves on everyday 24/7

as there are no benefits, just more&more death/debt & disruption. fortunately there's an 'army' of light bringers, coming yOUR way. the little ones/innocents must/will be protected. after the big flash, ALL of yOUR imaginary 'borders' may blur a bit? for each of the creators' innocents harmed in any way, there is a debt that must/will be repaid by you/us, as the perpetrators/minions of unprecedented evile, will not be available. 'vote' with (what's left in) yOUR wallet, & by your behaviors. help bring an end to unprecedented evile's manifestation through yOUR owned felonious corepirate nazi glowbull warmongering execrable. some of US should consider ourselves somewhat fortunate to be among those scheduled to survive after the big flash/implementation of the creators' wwwildly popular planet/population rescue initiative/mandate. it's right in the manual, 'world without end', etc.... as we all ?know?, change is inevitable, & denying/ignoring gravity, logic, morality, etc..., is only possible, on a temporary basis. concern about the course of events that will occur should the life0cidal execrable fail to be intervened upon is in order. 'do not be dismayed' (also from the manual). however, it's ok/recommended, to not attempt to live under/accept, fauxking nazi felon greed/fear/ego based pr ?firm? scriptdead mindphuking hypenosys.

consult with/trust in yOUR creators. providing more than enough of everything for everyone (without any distracting/spiritdead personal gain motives), whilst badtolling unprecedented evile, using an unlimited supply of newclear power, since/until forever. see you there?

"If my people, which are called by my name, shall humble themselves, and pray, and seek my face, and turn from their wicked ways; then will I hear from heaven, and will forgive their sin, and will heal their land."

meanwhile, the life0cidal philistines continue on their path of death, debt, & disruption for most of US. gov. bush denies health care for the little ones;

http://www.cnn.com/2007/POLITICS/10/03/bush.veto/index.html [cnn.com]

whilst demanding/extorting billions to paint more targets on the bigger kids;

http://www.cnn.com/2007/POLITICS/12/12/bush.war.funding/index.html [cnn.com]

& pretending that it isn't happening here;

http://www.timesonline.co.uk/tol/news/world/us_and_americas/article3086937.ece [timesonline.co.uk]

From the times-ran-away-from-the-PR-department (0)

Anonymous Coward | more than 6 years ago | (#22113202)

Ohh.. terabytes of storage.... wow. I have terabytes of warez in my home, which makes this PR blitz look really petty. What, they're not gonna announce 10MiB of homepage and 20MiB email storage too?

Comment System? (2, Funny)

dysfunct (940221) | more than 6 years ago | (#22113420)

[...] YouTube-style annotating and commenting features.

I'm looking forward to "OMG, ur resrch is teh sux" comments and "CHEEP FUNDING M0RTG4GE" spam from elite universities around the world.

We forfeit IP anyway (0)

Anonymous Coward | more than 6 years ago | (#22113440)

> I wonder how foolish scientists will be as to unknowingly forfeit their copyrights, IP, etc.

Scientists generally forfeit IP and copyright to their host University anyway, although the mileage of that varies between institutions.

Google Everything (2, Interesting)

Dirtside (91468) | more than 6 years ago | (#22113462)

The other day my wife said she wants there to be Google Bank. They'd certainly get the online banking thing done right...

Re:Google Everything (1, Interesting)

ScrewMaster (602015) | more than 6 years ago | (#22113772)

The other day my wife said she wants there to be Google Bank. They'd certainly get the online banking thing done right...

Not necessarily ... nobody in their right mind would trust the Google File System to anything remotely mission critical (not even Google: last I heard they use Oracle for all their in-house data processing needs.) Banks actually do pretty well keeping track of financial data.

Now having said that, as I look at my credit card's online statement, I see several days of Avis car rental charges for a vehicle that was picked up in San Diego and returned somewhere in Virginia. The problem is I didn't rent the car. Okay, so maybe a Google Bank wouldn't be such a bad idea after all.

An alternative to (1)

ricebowl (999467) | more than 6 years ago | (#22113638)

The Storage@Home thing that was mentioned, albeit possibly in the comments, a while back. I'm not sure, at all, whether or not the Folding@Home data is meant to be public domain but, were it so, then it'd be a preferable solution in part to using a p2p style storage alternative.

Of course the three terabyte limit might cause problems there.

All your base pairs are belong to us (0)

Anonymous Coward | more than 6 years ago | (#22114992)

:-) .....

Oh Really? (1)

cyanyde (976442) | more than 6 years ago | (#22115198)

I'd say the most useful part would be to find correlative information from disparate fields. The nice thing about a single repository with a single interface is that you can find ALL the data you may need to investigate an interesting hypothesis. Take my current senior thesis on economic activity and its correlation with water usage. It's attempting to bring two spatial data sets into a single framework. All the information is out there, but it's rare to find any published papers about it, let alone any standardized set of data to go off of. So right now I'm sitting with a bare minimum of information on economic indicators, because all the other data out there doesn't seem to be easy to find, access or get to the bottom of. I'm sick of finding PDFs loaded with information but with no real way to get at it without a lot of heavy lifting. This, I imagine, is what Google's trying to fix: taking already available data, placing it somewhere easy to use, and formatting it the way you need it for your GIS/EXCEL/DATABASE/SCIENCEGRAPHER. Though one should always note the correlation between knowledge and power, and between absolute power and corruption.

Big deal ... terabytes are tiny these days. (1)

Dr. Zowie (109983) | more than 6 years ago | (#22115238)

Why, I made three terabytes in just 15 hours of solar observing last summer.

The Solar Dynamics Observatory, due to launch into geosynchronous orbit next summer, is a three petabyte mission.

how about (1)

towsonu2003 (928663) | more than 6 years ago | (#22118210)

Do you think social sciences could benefit from this as well? That is, if they can get over their fears of opening their data to others. And if yes, how?