
Freeing and Forgetting Data With Science Commons

Soulskill posted more than 5 years ago | from the bringing-it-all-together dept.


blackbearnh writes "Scientific data can be both hard to get and expensive, even if your tax dollars paid for it. And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher such papers. That's the argument that John Wilbanks makes in a recent interview on O'Reilly Radar, describing the problems that have led to the creation of the Science Commons project, which he heads. According to Wilbanks, scientific data should be easy to access, in common formats that make it easy to exchange, and free for use in research. He also wants to see standard licensing models for scientific patents, rather than the individually negotiated ones that now make research based on an existing patent so financially risky." Read on for the rest of blackbearnh's thoughts.

"Wilbanks also points out that as the volume of data grows from new projects like the LHC and new high-resolution cameras that may generate petabytes a day, we'll need to get better at determining what data to keep and what to throw away. 'We have to figure out how to deal with preservation and federation, because our libraries have been able to hold books for hundreds and hundreds and hundreds of years. But persistence on the web is trivial. Right? The assumption is, well, if it's meaningful, it'll be in the Google cache or the internet archives. But from a memory perspective, what do we need to keep in science? What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data? Is it the normalized data? Is it the software used to normalize the data? Is it the interpretation of the normalized data? Is it the software we use to interpret the normalization of the data? Is it the operating systems on which all of those ran? What about genome data?'"


Again with the IP (1, Insightful)

Anonymous Coward | more than 5 years ago | (#26938033)

Einstein said "If I have seen farther than most it is because I have stood on the shoulders of giants."
Where does that begin to apply in a society of lawyers, profiteers, and billion dollar industries based on exploiting shortsighted IP management?

Does it really matter? (0, Flamebait)

Anonymous Coward | more than 5 years ago | (#26938071)

After all, we are a nation of cowards.

Re:Again with the IP (4, Informative)

arctanx (1187415) | more than 5 years ago | (#26938111)

Actually, that was Sir Isaac Newton [wikipedia.org] having a dig at one of his enemies.

Re:Again with the IP (1, Informative)

Anonymous Coward | more than 5 years ago | (#26938513)

Your faith in Wikipedia is misplaced; it was both, actually.
Perhaps Sir I.N. was the first, so you do earn the proverbial "first quote".

Re:Again with the IP (3, Insightful)

wisty (1335733) | more than 5 years ago | (#26938767)

There is a rumor that Newton meant it as an insult to Hooke. Newton had refined Descartes' wave theory, while Hooke had backed the corpuscular theory. Also, Hooke was a short man.

Re:Again with the IP (1)

HadouKen24 (989446) | more than 5 years ago | (#26939375)

Both Newton and Descartes were corpuscularians, actually.

Re:Again with the IP (1)

Hognoxious (631665) | more than 5 years ago | (#26940569)

Also, Hooke was a short man.

Now I see how he discovered his law - hanging weights on his feet to try and get taller.

I don't know! (2, Insightful)

blue l0g1c (1007517) | more than 5 years ago | (#26938047)

I was reading through the summary quickly and almost had a panic attack at the deluge of questions at the end. We get the point already!

Re:I don't know! (0)

TapeCutter (624760) | more than 5 years ago | (#26938345)

While I sympathise with any effort to make scientific data [ipcc-data.org] more accessible, the deluge of questions was long ago answered by the philosophy and method of science [wikipedia.org], i.e.: train oneself to think critically.

1000 (0)

Anonymous Coward | more than 5 years ago | (#26938055)

Exactly 1000 more bytes? Wow!

Re:1000 (0)

Anonymous Coward | more than 5 years ago | (#26938729)

What's so amazing about 0.9765625 kilobytes?

Re:1000 (0)

Anonymous Coward | more than 5 years ago | (#26941117)

ITYM kibibyte.

What's most important to keep. (2, Insightful)

MoellerPlesset2 (1419023) | more than 5 years ago | (#26938085)

What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.

It's an important and distinctive feature of Science that results are reproducible.

Re:What's most important to keep. (2, Insightful)

Anonymous Coward | more than 5 years ago | (#26938097)

How can the results be reproducible if you don't keep the original data?

Re:What's most important to keep. (4, Insightful)

MoellerPlesset2 (1419023) | more than 5 years ago | (#26938185)

How can the results be reproducible if you don't keep the original data?

The relevant results are supposed to be included in the paper, as well as the information necessary to reproduce the work. Most data doesn't fall into that category.

To make an analogy the computer geeks here can relate to: all you need to reproduce the output of a program is the source code and parameters. You don't need the executable, the program's debug log, the compiler's object files, etc., etc.

The point is you want to reproduce the general result. You don't usually want to reproduce the exact same experiment with the exact same conditions. Supposedly you already know what happens then.
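To make that analogy concrete, here's a minimal sketch (illustrative Python, not any real experiment): given the source and the parameters, anyone can regenerate the output, so the intermediate artifacts don't need archiving.

import random

def run_experiment(seed: int, n: int) -> float:
    """Toy 'experiment': mean of n pseudo-random draws."""
    rng = random.Random(seed)  # fixed seed = fixed experimental conditions
    return sum(rng.random() for _ in range(n)) / n

# Anyone with the code and the parameters reproduces the identical output;
# there is no need to keep the executable, logs, or intermediate files.
assert run_experiment(seed=42, n=1000) == run_experiment(seed=42, n=1000)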

Re:What's most important to keep. (5, Insightful)

mako1138 (837520) | more than 5 years ago | (#26938275)

Let's say the LHC publishes its analysis, and then throws away the data. What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?

For a billion-dollar experiment like the LHC, that dataset is the prize. The dataset is the whole reason the LHC was built. Physicists will be combing the data for rare events and odd occurrences, many years down the road.

Re:What's most important to keep. (0, Offtopic)

Sanat (702) | more than 5 years ago | (#26938391)

Mod up this important position please.

Re:What's most important to keep. (2, Insightful)

MoellerPlesset2 (1419023) | more than 5 years ago | (#26938395)

Let's say the LHC publishes its analysis [..]

Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
Second: Primary data, actual measurement results, are already kept, as a rule.
Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.

What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?

Truth is, you'd still have to rebuild the LHC then, because you didn't test your 'corrected' assumption against the actual machine to show that your 'corrected' results are valid. Until the actual experiment is re-done it'll remain an unanswered question.

Re:What's most important to keep. (4, Insightful)

oneiros27 (46144) | more than 5 years ago | (#26938547)

Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule. First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.

There are two types of science. What you're referring to is called 'Little Science' (not to be derogatory), but it's the type of thing that a small lab can do, with a reasonable amount of funding. And then there's what we call "Big Science" like the LHC, Hubble Space Telescope, Arecibo Observatory, Large Synoptic Survey Telescope, etc.

Second: Primary data, actual measurement results, are already kept, as a rule.

I wish. Well, okay, it might be kept, but the question is by who, and have they put it somewhere that people can analyze it?

I was at the AGU last year, and there was someone from a solar observatory that I wasn't familiar with. As I do work for the Virtual Solar Observatory, I asked them if we could put up a web service to connect their repository to our federated search. They told me there was no repository for the observatory -- the data walks out the door with whoever the observer was.

Then there's the issue of trying to tell from the published research exactly what the original data was. But then, I've been harping on the need for data citation for years now ... it's an issue that's starting to get noticed.

Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.

For the type of data that I deal with, none of it is technically reproducible, because it's observations, not experiments. And that's precisely why it's important to save the data.

Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.

In your field, maybe. But we have folks who try to design systems to predict when events are going to happen and need training data. Others do long-term statistical analysis with years or decades of data at a time. Still others find a strange feature that hadn't previously been identified as important (eg, coronal dimmings) and want to go back through all of the data to try to identify other occurrences.

Re:What's most important to keep. (5, Interesting)

Mr Z (6791) | more than 5 years ago | (#26938445)

With a large and expensive dataset that can be mined many ways, yes, it makes sense to keep the raw data. This is actually pretty similar to the raw datasets that various online providers have published over the years for researchers to datamine. (AOL and Netflix come to mind.) Those data sets are large and hard to reproduce, and lend themselves to multiple experiments.

But, there are other experiments where the experiment is larger than the data, and so keeping the raw data isn't quite so important as documenting the technique and conclusions. The Michelson-Morley interferometer experiments (to detect the 'ether'), the Millikan oil-drop experiment (which demonstrated quantized charges)... for both of these the experiment and technique were larger than the data, so the data collected doesn't matter so much.

Thus, there's no simple "one size fits all" answer.

When it comes to these ginormous data sets that were collected in the absence of any particular experiment or as the side effect of some experiment, their continued existence and maintenance is predicated on future parties constructing and executing experiments against the data. This is where your LHC comment fits.

Re:What's most important to keep. (0)

Anonymous Coward | more than 5 years ago | (#26940087)

Given that the LHC records on the order of terabytes per second, there's no way it can retain all of the data.

Re:What's most important to keep. (2, Interesting)

Patch86 (1465427) | more than 5 years ago | (#26944089)

5 Insightful?

Seriously, read the OP again.

"What's most important to keep is quite simple and obvious really: The results. The published papers, etc."

He never suggested you throw out the results. No-one is going to throw out the results. Why would anybody throw out the results? Whichever body owns the equipment is bound to keep the results indefinitely, any papers they publish will include the results data (and be kept by the publishers), and copies will end up in all manner of libraries and file servers, duplicated all over the place.

The most important things to keep from any experiment are 1) the results (no point in doing it if you don't keep the results) and 2) the methodology (if they don't know how you got the data, it's worthless). What you could throw away without too much harm is the analysis and interpretations, since you can always reanalyze and reinterpret (and any interpretations made now may prove wrong in the future anyhow). Even then, anything interesting is likely to be kept in the grand scheme of things anyway.

What TFA is actually talking about is less dramatic, lower-budget science. It's still important (it's the bread and butter of science and technology), but it will be found in the vaults of far fewer publishers, libraries and web servers. And it's lower-budget science where it's far easier to reproduce results, as in GP.

Re:What's most important to keep. (2, Insightful)

mako1138 (837520) | more than 5 years ago | (#26944505)

You seem to be using "results" in a wider sense than "published papers". Yes, nobody is going to throw out papers. But the raw data from instruments? It is not clear whether those will be kept.

You say that the analysis and interpretations can be thrown out, but those portions are precisely what go into published papers. And for small-scale science, it makes little sense to throw away anything at all.

Re:What's most important to keep. (2, Interesting)

TapeCutter (624760) | more than 5 years ago | (#26938429)

"You don't usually want to reproduce the exact same experiment with the exact same conditions."

That's right, I want an independent "someone else" to do that in order to make my original result more robust. If I were an academic I would rely on post-grads to take up that challenge; if they find a discrepancy, all the better, since you now have another question! To continue your software development analogy: you don't want the developer to be the ONLY tester.

Re:What's most important to keep. (3, Interesting)

repepo (1098227) | more than 5 years ago | (#26938191)

It is a basic assumption in science that given some set of conditions (or causes) you get the same effect. For this to happen it is important to properly record how to set up the conditions. This is the kind of thing that scientific papers describe (in principle, at least!).

Re:What's most important to keep. (1)

digitalunity (19107) | more than 5 years ago | (#26943129)

Maybe you haven't noticed, but quantum mechanics seems to indicate there is not always one outcome for one set of conditions. This works on the macro scale, but not necessarily always on the subatomic level.

Re:What's most important to keep. (2, Informative)

Rockoon (1252108) | more than 5 years ago | (#26940227)

On the subject of reproducibility, I am reminded of a situation with Wei-Chyung Wang, a climate scientist.

He was involved in the paper Jones et al (1990), which is where the situation begins.

After *17 YEARS* of requests, Jones FINALLY released some of the data used in Jones 1990 through demands under the terms of the U.K. Freedom of Information policy on publicly funded research.

Wang himself is free from FOI requests because Wang is an American and operates in America, where FOI requests regarding publicly funded studies have no legal weight.

The result of Jones's eventual disclosure is that several researchers have concluded that Wang fabricated research steps: that some of the steps could not have been performed, then or even now, and that for many of the climate stations used in his work the existing station histories directly contradict Wang's stated assessments of his data set.

Specifically, he claimed that only a few of these recording stations had been moved during the time-frame significant to the research, and that they were free from significant urbanization changes (the research was to measure the "Urban Heat Island" (UHI) effect). In short, Wang claimed that the station histories showed that they were largely "homogeneous."

According to the DOE CAS study, in regards to the quality of Wang's other station data, "details regarding instrumentation, collection methods, changes in station location or observing times are not known." The CAS bills itself as the most comprehensive history of Chinese climate available to date. Note that Wang actually cited the CAS as one of the sources for his data.

Essentially, both Wang et al 1990 and Jones et al 1990 were fraudulent pieces of work that were never independently verified, and could not have been verified given both the outright fraud and the failure to disclose the data set used.

(Jones denies knowledge of Wang's fabrication of data.)

Sparked by this controversy, new research specifically addressing the UHI based on the Chinese climate record paints an entirely different picture with regards to China: that the effect is in fact much more significant than concluded by Jones et al 1990.

FULL DATA DISCLOSURE IS NEEDED.

This is especially true in some areas of science, where all the big players not only know each other, BUT WORK, PUBLISH, AND PEER REVIEW TOGETHER.

One specific small group of people is directly influencing global policies regarding climate change through their direct involvement with the IPCC, all the while hiding their own work and obstructing validation of their work.

Re:What's most important to keep. (2, Informative)

jschen (1249578) | more than 5 years ago | (#26942073)

How can the results be reproducible if you don't keep the original data?

As others noted, there are cases where raw data is king, and others where raw data is virtually useless. LHC raw data will be invaluable. Raw data from genetic sequencing is a waste of time to keep. Why store huge graphics files when the only thing we will ever want from them is the sequence of a few letters? One must be able to distinguish between these two possibilities (and more subtle, less black and white cases, too), and there is no one size fits all solution.
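To put rough numbers on that point, here's a back-of-the-envelope sketch; the read counts and image dimensions are made up purely for illustration:

# Hypothetical run: a million reads, 500 called bases each.
reads = 1_000_000
bases_per_read = 500
sequence_bytes = reads * bases_per_read   # ~0.5 GB of letters

# Versus keeping a hypothetical raw 1000x1000 16-bit image per read:
image_bytes = reads * 1000 * 1000 * 2     # ~2 TB of raw images

print(f"called sequence: {sequence_bytes / 1e9:.1f} GB; "
      f"raw images: {image_bytes / 1e12:.1f} TB")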

That said, you may be surprised how well really valuable data is stored by good principal investigators. I recently helped my PI re-digitize a prized result from 1988 (showing the first example of a synthetic enediyne compound cleaving DNA). The journal did not do a good job of scanning it, and it therefore was hard to interpret in the printed journal. So we dug up the original raw data (the original UV photograph of the DNA gel showing this result), which had been carefully filed away in our offsite storage location all these years, and re-digitized the image for a recent review article.

Re:What's most important to keep. (1)

Al Kossow (460144) | more than 5 years ago | (#26938125)

What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.

It's an important and distinctive feature of Science that results are reproducible.

At what cost? Would you suggest discarding the data sets of nuclear bomb detonations since they are easily reproduced? How about other data sets that may need to be reinterpreted because of errors in the original processing?

Re:What's most important to keep. (2, Interesting)

MoellerPlesset2 (1419023) | more than 5 years ago | (#26938239)

At what cost? Would you suggest discarding the data sets of nuclear bomb detonations since they are easily reproduced?

Nobody said results are easily reproduced. But a-bomb tests are hardly representative of the vast majority of scientific results out there.

How about other data sets that may need to be reinterpreted because of errors in the original processing?

That's a scenario that only applies when the test is difficult to reproduce, and the results are limited by processing power rather than measurement accuracy. That's a relatively unusual scenario, since, first, most experiments are easier to reproduce than that, and second, methods and measurements improve over time. The much more common scenario is that it's more efficient to simply re-do the experiment with modern equipment and get both more accurate measurements and better processing.

not results- grant dollars (1, Insightful)

SuperBanana (662181) | more than 5 years ago | (#26938669)

The results. The published papers, etc. It's an important and distinctive feature of Science that results are reproducible.

Having worked around academic groups that do medical research for three years now, I can tell you that is absolutely not what drives research.

Researchers will love to tell you about how it is the quest for knowledge and other pie-in-the-sky ideals, but when it comes down to it- it's mostly about making a living (or more than a living), and fame/prestige.

See, journals have what's called an "impact factor," which reflects how often articles in a particular journal end up being cited by other papers. In one lab I worked at, it was closely tracked who was published where, and how many times.
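For reference, the standard two-year impact factor is just a ratio; this is the generic textbook definition, not anything specific to that lab's internal scoring:

$$\mathrm{IF}_y = \frac{C_y(y-1) + C_y(y-2)}{N_{y-1} + N_{y-2}}$$

where $C_y(k)$ is the number of citations received in year $y$ to items the journal published in year $k$, and $N_k$ is the number of citable items it published in year $k$.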

At the end of the year, when it came time to decide who went and who stayed, the scores were lined up and however many people needed to go came from the bottom. The top ones get a little closer to becoming a PI (Principal Investigator, aka someone who has postdocs and grad students working for them.)

PIs, all the people you read about in the paper- they survived the process, but they're now nothing more than management. They don't do lab work, they don't do research. They solicit ideas from their postdocs, put the final polish on a grant proposal the postdoc slaved over, and get big fat checks from NIH for millions of dollars. The PIs then pass the work down to postdocs, who dole it out to grad students. The grad students do it because a PhD is dangled in front of them while they run on the treadmill of endless, monotonous, repetitive lab work and analysis work. The postdocs do it because faculty positions and PI slots are dangled in front of them.

The problem with "the system" is that nobody is rewarded for reaching that brass ring. Just like Ford has no incentive to build a very durable car (no service/parts sales after the vehicle hits the end of the warranty, and the market quickly becomes saturated) researchers have no incentive to completely solve issues facing us today; their incentive is to come close enough to say "aha, look, we did find SOMETHING, so your grant money wasn't wasted."

What incentive does a massive industry have to solve cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless- their training and knowledge is so focused, so narrow- they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions also would be useless. Billions of dollars of materials, equipment, supplies, chemicals- gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.

Re:not results- grant dollars (4, Insightful)

smallfries (601545) | more than 5 years ago | (#26939781)

What incentive does a massive industry have to solve cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless- their training and knowledge is so focused, so narrow- they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions also would be useless. Billions of dollars of materials, equipment, supplies, chemicals- gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.

That's an old argument and although it sounds reasonable it is completely unsound. An industry does not function as a single cohesive entity with wants and desires. It is composed of many different individuals with their own wants and desires.

I know enough academics to say for certain that if any one of those individuals could discover a cure that would put their entire employer out of business then they would leap at the chance. The fame that would follow would make another job easy enough to get, and the recognition is what they're really in it for anyway.

Re:not results- grant dollars (2, Informative)

Bowling Moses (591924) | more than 5 years ago | (#26942403)

I've been doing research in the biological sciences for 12 years now, including some work that was at least tangentially related to human health. I am not in it for the paycheck--if that's all I wanted, my friends and I joke that we'd go to the KFC School of Business Management and be assistant managers at fast food restaurants making more than we do in science.

I, and the majority of the people I know, don't want to be professors either. It's extremely rare for a professor to actually do any lab work themselves, but if you ask they'll tell you they miss it. Besides, there are 300 people applying for each professorship at a decent university. Then if you are unlucky enough to get the job, you have to successfully fight in a viciously competitive funding environment to get tenure and not lose your mind or your liver in the process.

It's actually hard enough to keep a job in academic science, period. My boss and I are applying for grants. Hers are in part to keep my position funded; she's got one out and is writing a second. I've got one out, and am applying for two or possibly three more. Contrary to what you wrote, my grants are largely my ideas and my writing, and should I get funded, it is my money, not the boss's. However, science funding is so obscenely bad (most grants have a ~5% success rate; the best one I'm applying for has ~25%) that I'm also going to look for a new job, with the boss's full knowledge and support, even though we'd both very much like me to stick around for another couple years and get our proposed butt-kicking science done.

So why do it if there's nothing but nonstop stress, Burger King assistant manager pay, and institutionalized job insecurity? I get to solve problems. I get to figure things out. I get to do things (sometimes, not often, but sometimes) that nobody has ever done before, see things nobody else has ever seen before. Work in a small way on projects that could impact millions of people's lives. I'll never be famous, which is fine with me. I'll never be rich, which, well, I can tolerate. I might not ever have job security... which, okay, I'll admit is seriously grinding down my enthusiasm and idealism.

But the things I've gotten to do--even paid a pittance to do--I wouldn't trade. Catching jellyfish off the docks in Oregon. Turned loose on a billion dollar synchrotron, unsupervised at 3 am, to understand how an enzyme known to be a virulence factor in several diseases functions at an atomic level. Making radioactively labeled mosquitoes to understand lipid trafficking. Working with cell culture (it's a cell from an insect's midgut... that under laboratory conditions can endlessly propagate itself. How cool! And here's what I'm going to do with it...), genetically engineering fluorescent organisms, using high-throughput screening to find new drug lead compounds. A lot of hard work, but sometimes that's damn good fun.

Plus along the way you get to understand phenomena on a level that most people don't even know exists. I'm of course not claiming god-king knowledge here, but I could spend a long time talking about the terrible beauty of host:pathogen and vector:pathogen relationships, for example, or protein structure, or anything else I've studied a while, just like any other scientist. That's fun too, although not cool in most of society. But my mom still thinks I'm cool. Ok, no, she doesn't.

If you expect to get rich and famous doing science, no wonder your post seems bitter. It isn't going to happen, and it isn't the right reason to do science in the first place. Those pie-in-the-sky ideals are.

eh (1, Informative)

Anonymous Coward | more than 5 years ago | (#26938161)

That's not true. Any tax-funded study requires more documentation and publication than a private one, as anyone who reads them knows.
All studies worth anything are aimed at an audience proficient in the subject; they are not meant for general audiences. And since studies are often proven wrong, you need repeatable results.

And the scientists goes mooo! (1)

Ostracus (1354233) | more than 5 years ago | (#26938221)

"And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them. "

I predict the dumbing down of science.

Re:And the scientists goes mooo! (2, Interesting)

Vectronic (1221470) | more than 5 years ago | (#26938265)

Although likely, not necessarily...

I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.

There are geniuses of all sorts, someone might be completely lost trying to understand it linguistically, but may find a fault in it instantly visually, or audibly.

However, that is somewhat redundant, as the original (as it is now) can be converted into that form by people; but a mandate saying it must contain X, Y and Z will open it up to more people, quicker.

Re:And the scientists goes mooo! (2, Interesting)

Fallingcow (213461) | more than 5 years ago | (#26938877)

I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.

Don't count on that being at all helpful.

Take the math articles on Wikipedia: I can read one about a topic I already understand and have no idea what the hell they're talking about in entire sections. It's 100% useless for learning new material in that field, even if it's not far beyond your current level of understanding. Good luck if you start on an article far down a branch of mathematics--assuming they bother to tell you the source of the notation in that article, it'll take you a half-dozen more articles to find anything that sort-of translates some of it for you.

Some sort of mouseover tool-tip hint thing or a simple glossary is all I ask, but I think the people writing that stuff don't even realize how opaque it is to people who majored in something other than math.

Re:And the scientists goes mooo! (2, Insightful)

wisty (1335733) | more than 5 years ago | (#26938775)

Why should science be more complex than necessary? For every String Theory area (where complexity is unavoidable) there are plenty of theories like economics, which just rely on weird jargon to fence out the interlopers.

Re:And the scientists goes mooo! (1)

Repossessed (1117929) | more than 5 years ago | (#26940181)

Or the scientists just stop writing in third person passive, and start writing in a manner people outside of the scientific community are used to. Though I think the summary refers more to trying to extract data you do understand from complicated papers that talk a lot about things you neither understand nor care about.

What? Nobody has ever read... (1)

NotQuiteReal (608241) | more than 5 years ago | (#26938243)

Has nobody ever read The Tragedy of the Commons? [wikipedia.org]

However, in the case of the non-physical, I guess no one can "waste" or "steal" it, only copy and use it.

Re:What? Nobody has ever read... (1, Funny)

Anonymous Coward | more than 5 years ago | (#26938331)

Has nobody ever read The tragedy of the commons?

Nope, can't afford the fees.

Re:What? Nobody has ever read... (1)

Fallingcow (213461) | more than 5 years ago | (#26938505)

I'm quite familiar with it, and I'm not seeing the connection.

Help?

Re:What? Nobody has ever read... (1)

TapeCutter (624760) | more than 5 years ago | (#26938647)

I'm not sure what your point is but I don't see libraries turning to dust because nobody cares.

Re:What? Nobody has ever read... (1)

drinkypoo (153816) | more than 5 years ago | (#26940695)

I'm not sure what your point is but I don't see libraries turning to dust because nobody cares.

Libraries are closing because nobody cares.

Re:What? Nobody has ever read... (1)

TapeCutter (624760) | more than 5 years ago | (#26941005)

Cite please.

Re:What? Nobody has ever read... (1)

drinkypoo (153816) | more than 5 years ago | (#26941189)

They saved Salinas libraries [latimes.com] , but look into the story: Since 2002, cuts in library funding have approached $100 million around the country, with more than 2,100 jobs eliminated and 31 libraries closed, according to the American Library Assn.

One format to govern them all (1)

ProfMobius (1313701) | more than 5 years ago | (#26938441)

I have been waiting all my life to see how my simulation data would look in Excel. And everyone is supposed to have it (damn you linux users ! damn you people with not enough money to buy it !).

On a more serious note, a common ground for data formats would be nice. You already have some generic formats, like HDF5 and others, but I must admit it is a bit of a jungle right now in the astrophysics department, and it is not going to change anytime soon (unless someone makes an awesome, generic, one-size-fits-all library in... Fortran77...).
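For what it's worth, HDF5 does make the self-describing part easy; here's a minimal sketch using the h5py Python bindings (the dataset name, units, and attributes are made up for illustration):

import numpy as np
import h5py

flux = np.random.rand(1024)  # stand-in for one run's simulated light curve

with h5py.File("simulation.h5", "w") as f:
    dset = f.create_dataset("run001/flux", data=flux, compression="gzip")
    dset.attrs["units"] = "erg s^-1 cm^-2"   # metadata travels with the data
    dset.attrs["instrument"] = "synthetic"
    f.attrs["code_version"] = "v1.2.3"       # provenance for reproducibility

The jungle part is agreeing on the names and units, not the container.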

Re:One format to govern them all (1)

Logic Worshiper (1480539) | more than 5 years ago | (#26938797)

Linux reads Excel files in OpenOffice, so .xls is pretty universal; one can also install CrossOver Office or use Wine to install Office (though I don't know how well that works).

The format is the least important issue (1)

EmbeddedJanitor (597831) | more than 5 years ago | (#26938891)

It is an almost trivial exercise to convert one format to another.

What is a lot harder is knowing how the data sets were measured and whether it is valid to combine them with data sets measured in other ways.

At least half the Global Warming bun-fight is about the validity of comparison between different data sets and the same goes for pretty much any non-trivial data sets.
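To illustrate the first point above, the mechanical conversion really is trivial; here's a sketch with a hypothetical CSV of station readings:

import csv
import json

# Hypothetical input: station readings as CSV rows.
with open("stations.csv", newline="") as src:
    rows = list(csv.DictReader(src))

# The mechanical conversion to JSON is a couple of lines...
with open("stations.json", "w") as dst:
    json.dump(rows, dst, indent=2)

# ...but no conversion can recover HOW each value was measured,
# which is what decides whether two data sets can be combined.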

What's the goal, really? (4, Insightful)

Rostin (691447) | more than 5 years ago | (#26938443)

I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?

It reminds me of the XKCD this morning...

Re:What's the goal, really? (2, Interesting)

onionlee (836083) | more than 5 years ago | (#26938557)

agreed. most sciences that have been around for a long time and have developed their own specializations, such as physics, have specific journals that target their "demographics" (such as the journal of applied physics a, b, c, d, letters). anything outside of those journals has most likely been rejected by them and is irrelevant. furthermore, the relatively young sciences such as linguistics use (what i personally think is lame) a system of keywords so that anyone can easily find articles they're interested in. truly, i have yet to find any researcher who has complained about this "problem".

Re:What's the goal, really? (1, Insightful)

TapeCutter (624760) | more than 5 years ago | (#26938589)

"I'm a working scientist (ok, PhD student), so I read journal articles pretty often."

And how would you read them if your institution did not foot the bill for subscriptions?

"In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists."

When you amalgamate "almost all cases" you end up with "almost all publications". The rest of your post smacks of elitism, trivializes scientific curiosity and completely ignores the social and scientific impact of radical improvements in communicating knowledge [google.com.au].

I would have thought working scientists [realclimate.org] would actually be proud of their work and want to disseminate it to the largest audience possible, but in your case I'm obviously mistaken.

Re:What's the goal, really? (1)

robbyjo (315601) | more than 5 years ago | (#26938725)

To be honest, if your institution does not foot the bill for subscriptions, try inter-library loans. That's easy. Most credible institutions in the US do have subscriptions to the more mainstream journals. Unless you're in a third-world country.

The problem with scientific publication is that you need to be terse. Papers are limited to 8-12 pages. If you were required to provide background knowledge for the uninitiated, you'd produce a 1000-page book instead. Moreover, the reviewers will think that you spend too much time on things that are assumed to be familiar to the intended audience.

Face it: the knowledge we have so far is the agglomeration of previous knowledge. Those who are at the cutting edge are expected to know the background already. Try explaining measure theory to high school students who have no idea what calculus is. If your research has anything to do with measure theory, your result is pretty much unreachable to those students. Let alone any data that corresponds to that research.

Scientists want to have their work known. But they don't have the patience or 5 years to explain it to aspiring noobs. Sorry. They have a lot more research to do. If you want to know the research, do your homework and study the subject carefully for a few years. Then you'll appreciate whatever data or papers the scientists are publishing.

Re:What's the goal, really? (1)

TapeCutter (624760) | more than 5 years ago | (#26939073)

"If you are required..."

I don't think anyone in TFA is seriously suggesting that hand-holding noobs be a requirement for publication, and this is probably where the confusion sets in. I also understand that you may want to keep your own data close to your chest until you have extracted a paper out of it (i.e., publish or perish).

"To be honest, if your institution does not foot the bill for subscription, try inter-library loans...[snip]...The problem with scientific publication is that you need to be terse. They're limited to 8-12 pages."

Einstein managed to get away with three elegant pages [fourmilab.ch] and zero references; chasing down the English translation in that link took a couple of minutes. I'm interested in quality, not quantity; I would be delighted with the 8-12 pages at my fingertips because, like most educated laymen, I do not have "too much time on my hands". The internet and the aforementioned lack of time are the reasons I have not set foot in a library for almost a decade, and the last time I studied/taught at a tertiary institution was quite possibly before you were born...

"If you want to know the research, do your homework and study the subject carefully for a few years. Then you'll appreciate whatever data or paper the scientists are publishing.

Precisely why I chose to use the folk at realclimate as an example, following the science for 25+yrs does not make me a climatologist but it has given me a deep understanding of what they are banging on about.

Re:What's the goal, really? (1)

robbyjo (315601) | more than 5 years ago | (#26942323)

Einstein managed to get away with three elegant pages and zero refrences

Science has evolved much since 1905. Even with his zero references, he's still implicitly citing the results of Lorentz. By today's standards, going without citations like that would be unacceptable.

Let me ask you this: can you honestly ask a high school student or a freshman to understand even that paper without grasping the concepts of differential equations (DEs)? They can't. Sure, you can understand the motivation and introduction of that paper, just like those of typical scientific papers. But when you start to delve into the formulas, i.e. what the "meat" is all about, you suddenly need to know everything involved, much beyond the words written by the scientists. I have no background in physics and I can't even follow the derivations of the formula in section I.3 of that paper, although I know DEs. In other words, I'm lost at section I.3 and I cannot see how Einstein arrived at his conclusions. Maybe a little knowledge of physics would help. There is some baseline knowledge you'd expect your audience to know. You can't explain everything.

Let's face it: English is an ambiguous medium of transfer for scientific knowledge. Mathematical formulas are far more succinct and far less ambiguous. If you think you can sidestep the formula part of the paper, you're dreaming. You might be better off reading popular science magazines.

The folks at RealClimate are just commenting on their results, not publishing real papers. This sort of writing is more of a popsci magazine style. They're glossing over way too much of how they arrive at their results. To some degree, it's useful. But to me, the gory details are of more importance, because only then can I know their assumptions and the theoretical limitations of the underlying assumptions or formulas, and how to further advance the knowledge or make the estimates more precise.

Scientists do not take other scientists' words at face value, and neither should laymen. Given the pro-and-contra climate debate, I think reading just the research commentary will add to the confusion in the public mind. Or worse, create camps. We don't want that to happen. So I think it's wise for the public to read far beyond pop-sci writings.

noobs? (0)

Anonymous Coward | more than 5 years ago | (#26939551)

But they don't have the patience and 5 years to explain to aspiring noobs.

I guess you don't have PhD students then? You should try one - mine make me think hard about things I thought I already knew.

Re:What's the goal, really? (0)

Anonymous Coward | more than 5 years ago | (#26939763)

I don't even know where to start... your comment is just ignorant and assumes that everyone who wants access to scientific publications/knowledge is in academia or is supported (in some way) by an institution. Of course, if you are not in academia (and in a first world country), you don't deserve knowledge. *rolls eyes*

You seem to be the prototypical "ivory tower scientist" who doesn't care/understand much of science outside your field and so you assume that others are the same.

Yes, knowledge is built upon previous knowledge and "most" scientific fields will seem esoteric to an outsider, but just because you "have no patience and 5 years to explain to aspiring noobs" doesn't mean that they can't understand it better than you do (given enough time and information).

It would be nice if information whose cost is often (almost always) charged to the taxpayer were freely accessible to those who want it (regardless of their status in society).

Oh, and of course, this is all hypothetical stuff.. we don't really want _YOUR_ publications in Science Commons. prick.

Re:What's the goal, really? (1)

Hognoxious (631665) | more than 5 years ago | (#26940673)

just because you "have no patience and 5 years to explain to aspiring noobs" doesn't mean that they can't understand it better than you do (given enough time and information).

So your average high school student can understand almost any science paper, if you just wait for him to get a degree, PhD and ten years postdoctoral experience in the relevant field?

Re:What's the goal, really? (1)

Midnight Thunder (17205) | more than 5 years ago | (#26942875)

To be honest, if your institution does not foot the bill for subscription, try inter-library loans. That's easy. Most credible institutions in the US do have some subscription for more mainstream journals. Unless you're in third world countries.

Anything that complicates the retrieval of knowledge ends up reducing access to that knowledge. Why should someone have to put up with a manual process when we have this thing called the internet? The internet is designed to facilitate access to knowledge, so it is the tool of choice.

While some readers of papers may not understand the content fully, it is sometimes enough to start the quest of understanding. Science suffers from a lack of people entering the field, so anything that can make it easier to access knowledge makes the idea of entering less daunting. In many ways this can be seen as part of the PR process.

The other way of approaching the issue is simply asking why journals should be the only ones allowed to publish the information. They aren't paying anyone for the content, yet they require a monopoly on the publishing of the given paper.

Journals have long had the role of being the only providers of papers to the community and see their position in jeopardy. I think this new source of competition is the chance for them to see where they can matter. In my opinion journals can matter by being the filter, where the 'best' papers are published.

Re:What's the goal, really? (3, Insightful)

Beetle B. (516615) | more than 5 years ago | (#26938837)

Typical comments from someone in the first world.

First, just on the side, I know lots of people who got PhD's but did not really stay in research and academia. They still want to read papers, though, as they still maintain an interest.

But the main benefit of opening up journal papers is for the rest of the world to benefit. Yes, if you have a very narrow perspective, you could just dismiss that as charity. If you're open-minded, you'll realize that shutting most of the world out of scientific output means much less science globally, and much less benefit to you as a result.

Imagine if all researchers in Japan published papers only in Japanese, and the journals had a copyright condition that prevented the content from ever being translated to another language, and you'll see what I mean. Whereas current journals require a lot of money for access, these ones also have a price: Just learn Japanese. It's not exactly promoting science.

Then again, of course, journals do need a base amount of money to operate. It's just that companies like Elsevier charge so much more than is needed to make a profit.

Re:What's the goal, really? (1)

Logic Worshiper (1480539) | more than 5 years ago | (#26938999)

I know people who'd have an easier time learning Japanese than C++; does that mean we should write computer code in English? The same thing applies to science. Non-technical science isn't science, so when scientists publish something for each other to read, they publish it in their own language. There are people who translate it back into English, such as teachers and writers, and those of us who don't have the background to compile science code in our minds need to find the binary version or learn the language of science, not complain that we don't understand the code.

Re:What's the goal, really? (1)

ceoyoyo (59147) | more than 5 years ago | (#26942705)

Sucks to live in the developing world and be told that if you want to publish your results it's $1000 a paper.

Re:What's the goal, really? (1)

martin-boundary (547041) | more than 5 years ago | (#26938975)

The real problem is hoarding knowledge, which over time leads to elitism, then guilds, and finally priesthoods. The fix is literally trivial: open access to electronic publications for everybody, i.e., bypass all the elaborate subscription checks. This isn't rocket science. The only thing stopping it from happening is the greedy publishing companies who like the status quo.

You're right that every single modern scientific publication has a very small intended readership, yet the argument for opening up everything is the same as the argument for having well stocked libraries.

Consider a library which monitors you whenever you enter it, and prevents you from reading or perusing any and all books, except for exactly five books in your immediate area of expertise which you are allowed to borrow or read, say. That is pretty much the current situation with electronic archives of scientific journals. If you don't see anything fundamentally wrong with that, then you've wasted your life as a student.

Re:What's the goal, really? (2, Interesting)

Grym (725290) | more than 5 years ago | (#26938979)

In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?

I'm a poor medical student, but a medical student with--quite frequently--interdisciplinary ideas. I can't tell you the number of times I have been interested in pursuing a subject for independent research and have been stymied or effectively stopped in my tracks because of my lack of ability to pay or lack of online access to experimental data and results. You might think that modern science is highly specialized and, for most bleeding-edge topics, you're probably right. In these cases, the affected researchers can all afford the one or two subscriptions they need to stay up to date. However, in overlapping areas, non-specialists (or specialists in other fields) might have a unique perspective and possibly insightful findings to add. What harm could be done by letting them take a look?

Take for example one of my hare-brained ideas. There is a disease called Pellagra [wikipedia.org], which is caused by diets deficient in certain amino acids. These amino acids are lacking in corn. In the United States, corn is, by far, the largest cash crop. Now, diets in the U.S. are varied enough that modern Americans do not get Pellagra, but this isn't the case in developing nations, where Pellagra can sadly be endemic. So, my idea is this: why not introduce conservative substitutions [wikipedia.org] into the genetic sequence of the gene encoding the major structural protein of corn (zein [wikipedia.org]) in such a way as to make corn a (more) complete amino acid food source? By doing this, you'd be turning one of the world's most abundant and cheap foodstuffs into an effective cure for a common, debilitating disease.

Now, to me, as an outsider to agriculture, this seems like a rather basic idea. I was convinced that someone had to have tried something similar. But you'd be surprised. I have yet to find a single paper that has ever attempted such a thing. Almost all of them focus on crop yields or the use of zein in commercial products. Now, maybe (for reasons unbeknownst to me) my idea is untenable, such that people in the field have never given it a thought. But what if that isn't the case? What if the leaders of the field (or at least the emergent behavior of the scientists and scientific institutions) are pushing so hard in one direction that an obvious area for research or advancement was overlooked? Let's hope it's not the latter...

Regardless, it's a travesty how petty scientific institutions are in this regard, considering how often they talk to the public about high-minded ideals when extolling the virtues of public funding of Science. This information should be available to all: specialists and non-specialists alike.

-Grym

P.S. Oh yeah, and in case, any of you were wondering. Somebody already patented the general idea [wipo.int] described in my post. So don't get any wild ideas about trying to use it to help the poor, now! (/facepalm)

Re:What's the goal, really? (0)

Anonymous Coward | more than 5 years ago | (#26939613)

Go and read about what is happening in the bio + semantic web communities.

IE: http://www.slideshare.net/fbelleau/bio2rdf-presentation-at-www2007-hcls-workshop2

It is a real problem (too much data in too many formats), and the foundation technologies to solve it already exist - there is just a truckload of modelling and tool-building to do now.

This is just like open source - I personally don't know a thing about the hardware driver for my network card; but if the code is open, someone else can come along and look at it from the outside; maybe submit a patch or two; or point out interesting things I did not notice about it.

Re:What's the goal, really? (1)

lbbros (900904) | more than 5 years ago | (#26939759)

An example would be if the data could be re-analyzed or reviewed when new methods for looking at it come along, or simply to try out new stuff. I work with high-throughput data (DNA microarrays), and about half of my work is applying my ideas to data that others have published, to validate an approach on an independent data set.

Some fields require access to the data more than others. In the case I'm talking about, you should take a look at the MIAME (Minimum Information About a Microarray Experiment) checklist [mged.org] published by the MGED society, and at the letter MGED sent to Science ("Standards for microarray data" by Ball et al.; no link provided as you need a subscription to read it...) to urge adoption of such a modus operandi.

Re:What's the goal, really? (3, Informative)

smallfries (601545) | more than 5 years ago | (#26939827)

Trickle-down. Dissemination of knowledge.

You don't know it yet (not meant as a jibe but it is something that clicks in after your PhD) but your primary function as a scientist is not to make discoveries. It is spreading knowledge. Sometimes that dissemination will occur in a narrow pool, through journal papers between specialists in that narrow pool of talent.

This is not the primary goal of science, although it can seem like it when you are slogging away at learning your first specialisation well enough to get your doctorate. Occasionally a wave from that little pool will splash over the side - maybe someone will write a literature review that is read by a specialist in another field. A new idea will be found - after all sometimes we know the result before we know the context that it will be applied to.

The pools get bigger as you move further downstream. Journal articles pass into conference publications, then into workshops. Less detail but carried through a wider audience. Then after a time, when the surface seems to have become still textbooks are written and the knowledge is passed on to another generation. We tend to stick around and help them find the experience to use it as well. This is why all PhD students have an advisor to point out the best swimming areas.

That was the long detailed answer to your question. The simple version is that you don't know who your target audience is yet. And limiting it to people in institutions that pay enormous access fees every year is not science. As a data-point - a lot of European institutes don't bother with IEEE fees. They run to about £50k/year which simply isn't worth it. As a consequence results published in IEEE venues are cited less in Europe. So even amongst the elite access walls have an effect.

Science, Truth. (0)

Anonymous Coward | more than 5 years ago | (#26941337)

We all benefit if policy is based on reality, rather than bad science or bad data. We all lose if our money is wasted based on bad science. And the policy should make everything public, as you don't know which data will affect you (and you might not be able to get the data you need for your project).

Recently, outsiders have spotted bad data from Antarctic stations and mistakes in Arctic sea-ice measurements.

Re:What's the goal, really? (1)

radtea (464814) | more than 5 years ago | (#26942697)

Could someone explain to me why this is a real problem

I'm a physicist who runs a business that amongst other things does data analysis in the life sciences, mostly genomics [predictivepatterns.com] . In this area data collection is relatively expensive (hundreds or thousands of dollars per sample) and disease states are relatively generic--follicular lymphoma is pretty much the same regardless of whether you are in Kansas or Karachi.

I recently invented a new algorithm for combing gene expression data for patterns of expression that distinguish sample classes. There are two ways to get this algorithm applied to more data: one is to develop an application and hope that a few thousand researchers will bother downloading it and using it on their data. The other is for me to go out and find published datasets and apply the algorithm to them myself.

I'm pursuing both paths, but I've seen how hard it is to get people to adopt new software, even stuff that's dead easy to use (which my application is). Furthermore, even with really good software, having an expert look at the data carefully is highly desirable. We have yet to find a way to fully automate good judgement, and a certain amount of judgement is required in any hard analysis problem.

So I'm figuring that the most use I'll get out of this algorithm is applying it to other people's data myself. I've already done this on a public schizophrenia dataset with some success, although I'm still trying to figure out what to do with the results (as a scientist I'd like to see them used for the betterment of humanity; as a businessperson I'd like to somehow get paid at least a little for my contribution).

The widespread publication of well-curated datasets is absolutely vital to getting the most value out of the considerable amount of money spent on collecting data of this kind. How we deal with rewarding the various contributors to any commercially useful discoveries that result is an ongoing problem that can be dealt with on a case-by-case basis for now.
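Not the poster's algorithm, to be clear, but for readers outside the field, a minimal sketch of the generic task such methods address: scoring genes by how well their expression separates two sample classes. Here a plain Welch t-statistic on synthetic data stands in for the real thing.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic expression matrix: 1000 genes x 20 samples, first 10 = class A.
    expr = rng.normal(size=(1000, 20))
    expr[:5, 10:] += 2.0   # plant 5 genes that differ between the classes
    a, b = expr[:, :10], expr[:, 10:]

    # Welch t-statistic per gene: large |t| means expression separates classes.
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(
        a.var(axis=1, ddof=1) / a.shape[1] + b.var(axis=1, ddof=1) / b.shape[1]
    )
    print("top discriminating genes:", np.argsort(-np.abs(t))[:5])  # genes 0-4

Any published expression matrix with class labels could be dropped into the same loop - which is exactly why curated public datasets matter.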

Re:What's the goal, really? (1)

Scrameustache (459504) | more than 5 years ago | (#26943607)

I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?

Replace "scientific data" with "satellite imagery".
There's nothing to gain by letting anyone look at it? Only highly trained experts can decipher it?

People have found hidden forests, ancient ruins, and a few meteor impacts. You don't know what's to find in the data until you let people look.

The value of Data (1)

a-zA-Z0-9$_.+!*'(),x (1468865) | more than 5 years ago | (#26938455)

  1. It would be good to share on the web, for a paper, all the data as well as the conclusions. Then the reasoning is easier to check. More importantly, other people can query the data differently to produce other conclusions without special requests (see the sketch after this list).
  2. "Reproducibility" doesn't mean using the same data; it means using the same procedures on different data and seeing if the same conclusion is reached.
  3. If data were commonly published as experiments are done, this would be of value to others even if the experimenters' paper(s) came later. In other words, alongside the market for articles, a market for data!
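A minimal sketch of point 1, assuming a paper shipped its per-sample measurements as a CSV; the file and column names here are hypothetical.

    import pandas as pd

    # Hypothetical supplementary table published alongside a paper.
    df = pd.read_csv("paper_supplementary_data.csv")  # columns: sample, group, value

    # Re-check the paper's headline comparison...
    print(df.groupby("group")["value"].mean())

    # ...then ask a different question of the same data, no special request needed.
    print(df.groupby("group")["value"].median())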

Re:The value of Data (1)

Logic Worshiper (1480539) | more than 5 years ago | (#26938967)

The data is reviewed by people qualified to review the data; that's what peer-reviewed journals are for.

Is storage an issue? (2, Interesting)

blue l0g1c (1007517) | more than 5 years ago | (#26938489)

Data storage is something we've gotten very good at, and we've made it very cheap. A petabyte a day is not as staggering as it was even five years ago.

Re:Is storage an issue? (1, Interesting)

DerekLyons (302214) | more than 5 years ago | (#26938633)

Not as staggering as it was five years ago only means it is not as staggering as it was five years ago - not that it isn't still staggering. Especially when you consider a petabyte a day means 36.5 exabytes a year.

Re:Is storage an issue? (0)

Anonymous Coward | more than 5 years ago | (#26941075)

Do you mean 365 petabytes a year?
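For what it's worth, the arithmetic supports the correction (using decimal units, where 1 EB = 1000 PB):

    # A petabyte a day, accumulated over a year.
    pb_per_year = 1 * 365
    print(pb_per_year, "PB/year")          # 365 PB/year
    print(pb_per_year / 1000, "EB/year")   # 0.365 EB/year, not 36.5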

Re:Is storage an issue? (1)

blue l0g1c (1007517) | more than 5 years ago | (#26943991)

36.5 exabytes should be more than enough for anybody.

Re:Is storage an issue? (1)

dkf (304284) | more than 5 years ago | (#26939353)

Data storage is something we've gotten very good at, and we've made it very cheap. A petabyte a day is not as staggering as it was even five years ago.

It still has to be paid for. It still has to be actually stored. It still has to be backed up. It still has to be kept in formats that we can actually read. It still has to have knowledge about what it all means maintained. In short, it still has to be curated, kept in an online museum collection if you will. And this all costs, both in money and effort by knowledgeable people.

The problem doesn't stop with copying the data to a disk array.
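One small, concrete piece of that curation work is fixity checking: keeping a checksum manifest so that years later you can tell whether the bits have rotted. A minimal sketch, assuming a (hypothetical) archive directory:

    import hashlib
    import os

    def sha256_of(path, chunk=1 << 20):
        """Stream a file through SHA-256 so huge files need not fit in RAM."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    # Record a checksum for every file under the archive directory.
    with open("MANIFEST.sha256", "w") as out:
        for root, _dirs, files in os.walk("dataset_archive"):
            for name in files:
                path = os.path.join(root, name)
                out.write(f"{sha256_of(path)}  {path}\n")

Of course, the manifest is the easy part; knowing what the files mean, and keeping that knowledge alive, is the expensive one.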

Hello? Is there a fucking editor in the house ... (0)

Potor (658520) | more than 5 years ago | (#26938683)

Wilbanks also points of that as the volume of data grows from new projects...

I'm sorry, but that makes no sense. 'Points of'???? Come on.

Science is hard - news at 11 (2, Insightful)

jstott (212041) | more than 5 years ago | (#26938705)

And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them.

I know that this is a real shock to you humanities majors, but science is hard. And yes, for the record, I do have degrees in both [physics and philosophy, or will as of this May — and the physics was by far the harder of the two].

Here's another shocker. If you think the papers are hard to read, you should see the amount of work that goes into processing the data until it's ready to be written up in an academic journal. Ol' Tom Edison wasn't joking when he said it's "1% inspiration and 99% perspiration." If you think seeing the raw data is going to magically make everything clear, well, I'm sorry, the real world just doesn't work that way. Finally, if you think professional scientists are going to trust random data of unknown provenance they downloaded off the web, well, I'm sorry, but that isn't going to happen either. I spend enough time fixing my own problems; I certainly don't have time to waste fixing other people's data for them.

-JS

Re:Science is hard - news at 11 (0)

Anonymous Coward | more than 5 years ago | (#26938939)

I agree. Scientists can't, and shouldn't be expected to, dumb down science so Joe Schmoe can understand it. Science is inherently technical; if that bothers people, they just need to accept that the data analysis used to draw those conclusions is over their head, and trust the people who did it. Most people trust the makers of computer software while understanding that they don't know how to program; the same applies to science.

You don't have to understand the source code to use OpenOffice, but the source code being available not only helps with the program's verifiability and allows advanced users to change it, it can also be a teaching tool for beginners. It's not the programmer's fault that Joe Schmoe, who can't plug in a flash drive, doesn't understand the source code. The same is true of an open database for scientific research: it could be used to teach students and interested members of the public, as well as to share data more easily. Of course it would have to be verifiable, not just any old website.

Re:Science is hard - news at 11 (1, Insightful)

Anonymous Coward | more than 5 years ago | (#26939049)

I fully agree.

Furthermore, I've read the entire, long interview and get the feeling this is a person looking for a problem. Yes, taxpayer-funded research should be freely available. Yes, we could all benefit from more freely available data. But he builds up a massive and poorly defined manifesto with very little meat around a few good points.

I'd love to have access to various data sets that I know exist, because others have published their results and described the data collection. But they likely invested multiple years of experimental work (and grant writing) to generate said data - so I see why they may be reluctant to hand it out to others. The solution must be based on giving credit where credit is due, yet this is precisely the problem: if I use someone else's experimental data, even old data, for a new analysis, how do I ensure they receive credit? My sense is that even if my paper clearly stated the generous source of the data, and cited the publications describing the original research, many readers would read past these bits and focus on my results only.

I write this from the point of view of someone who a) has data and would like to share it, b) spends most of his time painstakingly generating more experimental data, c) is a bit conflicted, because I'd feel bad to see someone else get credit for an analysis of my hard-earned data without some of that rubbing off on me.

As for the storage problem: there is none, in general. Certain special disciplines, yes, but those call for specific, appropriate solutions. The cost of storing most experimental data that I am aware of is completely dwarfed by, say, the cost of a single experiment's reagents, or the monthly health insurance of a single graduate student.

Finally, the suggestion given in the interview that scientists must come up with standard "ontologies", etc. is misguided at best. Every area of specialization already maintains, and long has maintained, an ongoing, iterative process whereby new terms are introduced, debated, used, and accepted. And yet, as methods and ideas change, sometimes new things don't fit into the established vocabulary: but it would be wrong to suppose (or require) a consensus procedure for such cases. It's science, it's original, it's creative, and every scientist feels some ownership of their ideas and their specific use of terminology. So let us sort it out; we're not bad at that stuff. Besides, "standard ontologies" usually go out of date the moment they are "ratified"... better to be flexible and unstructured. That's how Google sees the world's information, and I'm fine with that.

Re:Science is hard - news at 11 (0)

Anonymous Coward | more than 5 years ago | (#26940135)

"I spend enough time fixing my own problems; I certainly don't have time to waste fixing other peoples' data for them."

I'm sure encyclopedia editors said that as well: "I spend enough time doing this at work; who'd want to spend time actually making it for free?" Then came Wikipedia.

Re:Science is hard - news at 11 (1)

ceoyoyo (59147) | more than 5 years ago | (#26942759)

Bravo.

There ARE big shared datasets, when it makes sense, from trustworthy sources. They tend to cost a lot to assemble, make available, and maintain. I'm starting a post doc at a new lab and they showed me one they're working on: the price tag was $40 million.

We also have a mechanism by which anybody can read scientific papers, for free, if they choose to put in a little effort. They're called libraries.

Yes, the journal publishers probably need to cut their prices now that nobody actually wants the printed version anymore. Yes, open access journals are an interesting idea; I'm currently submitting a paper to one, and writing a chapter for an open access textbook. BUT they can't completely take over, or you shut out poorer research groups who can't afford to pay to have their papers published.

Is it? (0)

Anonymous Coward | more than 5 years ago | (#26938735)

Is it that I'm posting on slashdot? Is it that we all read this article? Is it that you read this comment? Is it that you read this comment on slashdot? Is it that you read slashdot, which had this article, which had this comment? Or is it?

Re:Is it? (0)

Anonymous Coward | more than 5 years ago | (#26939541)

"Gucci by Gucci - Pour Homme"

Euclid... (1)

gmuslera (3436) | more than 5 years ago | (#26938737)

said once to a king, "there is no royal road to geometry". The nature of some things is in fact complex, and there is no way to represent them that is easy and accurate at the same time.

Is it a goal of science, or of religion, that the universe be made in such a way that it is easy to explain to humans?

Re:Euclid... (1)

metageek (466836) | more than 5 years ago | (#26939805)

Very good point.

I never bought into Occam's razor. I think it shaves off important stuff all too often.

Doesn't work yet: (0)

Anonymous Coward | more than 5 years ago | (#26938755)

Tried to use it in a recent paper, and got this reply:

"IEEE have advised that they are unable to accept
the Science Commons license at this time.

If you want your paper to be published, you will
need to sign off a plain IEEE copyright form
and scan/email it to me."

Re:Doesn't work yet: (2, Informative)

janwedekind (778872) | more than 5 years ago | (#26940153)

Actually, IEEE allows you to make your paper available [ieee.org] on the internet at *one* location. However, the material must not be reprinted/republished without permission from the IEEE. They also don't allow making your work part of another world-wide indexed collection. That's still far from perfect, but at least it allows you to make your work accessible on your homepage or your university's Digital Commons [bepress.com] repository. I don't know what the future plans of IEEE are.

Cumbersome... (1)

going_the_2Rpi_way (818355) | more than 5 years ago | (#26938771)

This is just as likely to add burden as to remove it.

I can't count the number of times I've seen attempts to 'standardize' data, or even just notation, in a given field. It all works very well for the data up to that point, but then the field expands or changes, or new assumptions become important, and the whole thing becomes either unwieldy or obsolete. This is one reason why every field, it seems, has its own standards in its literature.

Speaking of the literature, most of these proposals are quickly followed by a 'let's just ask authors to conform to this now' approach to adopting these things. Papers get rewritten (or rejected), key points get lost, and the community gets weaker, all so that some standard with a half-life of 12 months can be implemented.

This might be different. I applaud people trying to solve hard problems, and this is certainly one. I do think that more of the burden should be on demonstrating that the standardization remains applicable for 12 months or more AFTER final development in a given field, never mind several years.

Generally, though, we shouldn't fear context. We should embrace it.

Well Science excluding Maths and the Hard Sciences (1)

Secret Rabbit (914973) | more than 5 years ago | (#26939235)

Excluding experimental data, those fields don't really have the problem that this guy is talking about. Perhaps someone should give him/her a lesson in the Scientific Method. Then maybe his/her words would reflect some rigour. Well, that and a link to the arXiv (http://arxiv.org/).

Why is this so? Because these communities are so small that just about everyone knows, or knows of, everyone else('s work). Of course, that's a slight hyperbole. BUT /just/ a *slight* one.

This sort of project only really applies to the non-fundamental sciences. Not that it's not useful. Of course it'd be a good thing to get this going. But, we just have to be honest about its true scope. And of course it'd be nice if this guy would tone down the rhetoric. Coming off that naively idealistic only works against things.

Re:Well Science excluding Maths and the Hard Scien (1)

metageek (466836) | more than 5 years ago | (#26939787)

This sort of project only really applies to the non-fundamental sciences.

And what are the fundamental sciences?
I keep hearing this type of argument: (some) physicists think biology is not a fundamental science; (some) biologists think sociology is not a fundamental science... each science is fundamental to those who want to understand the phenomena it deals with.

arXiv!!!!! (1)

anonymShit (1415181) | more than 5 years ago | (#26940659)

Finally somebody mentioned the arXiv.
By the way, it's quite funny to see all these guys telling somebody how to do his job better, especially when they have absolutely no idea what they're talking about.
Some nice sentences from the article:
-"It's taken me some time to learn how to read them"... what!!??
-"Because you're trying to present what happened in the lab one day as some fundamental truth", hahaaha, that one is good.
-"So what we need to do is both think about the way that we write those papers, and the words and the tone and how that really keeps people out of science. It really reduces the number of scientists". Yeah. From under which rock on another planet did they pull this guy? Keeps people out of science... yes, they see equations they don't understand and don't want to make the needed effort. I suppose the solution is that we throw science away and begin writing "easy papers" that any illiterate can understand. That would be progress!

Now seriously: change the patent system to reward theoreticians, not only experimentalists. And make the population less ignorant!

The purpose of papers... (0)

Anonymous Coward | more than 5 years ago | (#26940063)

IAAP (I am a physicist), and I agree that the model of charging researchers to access their own papers is ridiculous and broken - I submit preprints to arxiv.org in addition to print journals (everyone needs citations).

Any researcher will tell you that writing papers is a giant pain - it takes a long time, which we would rather spend running experiments/simulations.
Whether they are published in open or closed journals, papers do have a useful function: they summarise the important results and (should) clearly explain the caveats and errors.

What this guy seems to be advocating is that the raw data from experiments be openly available. I work on large experiments (tokamaks), where diagnostics are one-offs, built specifically for that experiment. The data is full of errors and subtleties which only those most familiar with the experiment can assess properly. For this reason any papers to be published externally are first thoroughly reviewed internally to ensure that the data has not been misinterpreted.

Whilst freely publishing the resulting papers is a Good Thing (TM), freely allowing access to the raw data is not.

like, ummm (1)

Hognoxious (631665) | more than 5 years ago | (#26940431)

Does anyone, -- I mean there's me obviously -- think that the way the structure of the articles doesn't, in the sense that it's sort of an exact word for word -- transcription of someone *speaking* -- is extremely jarring when you see it -- by that I mean in the written form?

Re:like, ummm (1)

ceoyoyo (59147) | more than 5 years ago | (#26942789)

Yes. Hey, maybe we should all write our scientific papers that way!

Scientific data is neither free nor cheap... (2, Insightful)

w0mprat (1317953) | more than 5 years ago | (#26940447)

Research data is typically large. In the mid-to-late 90s, I recall a researcher planning to move 10 TB of data internationally. It wasn't exactly unprecedented, either. The internet was simply not capable of such a transfer. Eventually they had to ship it on many disks.

The problem with such raw data, e.g. from a radio telescope, is that you need all of it; you can't really cut any out before it's even processed.

This is a lot less of an issue today, with research networks all hooked into multi-gigabit pipes. But there are still very large datasets researchers are attempting to work with that are simply not cheap to handle.

I think this is a great idea - it's nice being able to share data - but for the really sexy big research going on these days, I don't see it being much of a point-click-download service!
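To put rough numbers on why shipping disks won back then, a quick sketch; the link speeds are illustrative, not the actual ones involved:

    # How long does 10 TB take to move? Decimal units; protocol overhead ignored.
    data_bits = 10 * 8 * 10**12  # 10 TB expressed in bits

    links = {
        "10 Mbit/s (90s-era international link)": 10e6,
        "1 Gbit/s": 1e9,
        "10 Gbit/s (modern research network)": 10e9,
    }
    for label, bps in links.items():
        print(f"{label}: {data_bits / bps / 86400:.1f} days")

At 10 Mbit/s that is about 93 days of sustained transfer, which is why the disks went by courier.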

Re:Scientific data is neither free nor cheap... (1)

ceoyoyo (59147) | more than 5 years ago | (#26942797)

Grant applications in my field typically have at least one line item for "storage." It's not cheap.

All research paid for by taxes should be free (0)

Anonymous Coward | more than 5 years ago | (#26940727)

Very simple:
Make a law that forces any tax-funded research to end up in the public domain.
Problem solved.

Scientific papers are not "compressed" (1)

habbakuk (112920) | more than 5 years ago | (#26942119)

From the article, regarding scientific literature: "Because you're trying to present what happened in the lab one day as some fundamental truth. And the reality is much more ambiguous. It's much more vague. But this is an artifact of the pre-network world. There was no other way to communicate this kind of knowledge other than to compress it."

A statement like this suggests that the speaker is either unfamiliar with the way scientific data is actually turned into papers, or inappropriately optimistic about the utility of making the data "available." It is true that scientific data can be voluminous, but the overwhelming majority of papers do not "compress" data. To stretch an inadequate analogy, scientific literature is much more akin to metadata. Imagine scientific data as a large set of digitized recordings of music, all jumbled about. The paper would represent the list of song titles, artists, etc. that someone had to put together. The metadata is not so much a compression as a re-representation and categorization of the data.

As a neuroscientist responsible for sharing my results with the world, I've taken reasonable steps to ensure that all of the data used in my papers is freely available (under the Science Commons license, which I'm quite grateful to Wilbanks & co. for). Similarly, the code I wrote to extract meaningful parameters from the data and present them in an aesthetically pleasing way is also freely available. I maintain no illusions as to the utility of the database: nobody is really interested in recreating the figures in the paper from the original data, nor in reanalyzing the data. However, I do know that some of the insights I've presented have influenced those (few) that have read my papers and struggled to understand the ideas presented within.

There is nothing wrong with the idea that scientific data and biological materials ought to be readily available to those who would use them. But the notion that the hard-won insights that come to those who spend years collecting and thinking about the data will somehow follow is fanciful at best. Peer-reviewed, editor-selected papers are not compressed versions that are easier to transmit, but rather the collected insights and interpretations that allow us confidence in the work we've done. So by all means, if Mr. Wilbanks can find people to pay for it, make it easy to disseminate data. Just don't be surprised to find that "decompressing" papers doesn't do all that much to advance knowledge.
