
Bringing Open Source To Biomedicine

Soulskill posted more than 3 years ago | from the given-enough-eyeballs,-all-diseases-are-shallow dept.

Medicine 60

waderoush writes "'Facebook and Twitter may have proven that humans have a deep-seated desire for sharing, [but] this impulse is still widely suppressed in biomedicine,' biotech reporter Luke Timmerman observes in this column on Sage Bionetworks founder Stephen Friend. Friend is working to convince drugmakers and academic researchers to pool their experimental genomic data in a shared database called the Sage Commons. The database could be used to track adverse drug events, or to 'visually display network models of disease that connect the dots between genes, proteins, and clinical manifestations of disease in ways that [scientific] journals are not equipped to handle,' Timmerman says. Researchers from Stanford, Columbia, UCSF, and UCSD are already contributing to the Sage Commons, and Friend is now calling for a community effort by drugmakers, academic scientists, doctors, regulators, insurers, and patients to 'grab this platform and run with it on their own.'"


60 comments


So long as there is money to be made (1)

KBentley57 (2017780) | more than 3 years ago | (#35787034)

there will be no "openness" of any kind. There is just too much financial gain at stake (not that it is a good thing).

Re:So long as there is money to be made (1)

TaoPhoenix (980487) | more than 3 years ago | (#35787084)

We can fight to have pockets of openness. Hopefully about 1000 academic articles and 1000 drugs and 1000 genomes to study.
(Starting small)

Re:So long as there is money to be made (1)

arun84h (1454607) | more than 3 years ago | (#35787098)

You mean "no openness of any kind...besides the ones listed in TFA who are already sharing" right?

Re:So long as there is money to be made (1)

KBentley57 (2017780) | more than 3 years ago | (#35787118)

I think it was apparent what I meant, but I'll spell it out: big companies will not be willing to open source everything while they are raking in profits. The comment was meant to set aside those already mentioned.

Re:So long as there is money to be made (1)

rtb61 (674572) | more than 3 years ago | (#35790814)

They will also not open up anything that remotely hints at greed-driven culpability. In fact, that will be the driving factor not only for pharmaceutical companies withholding data, but also for attacking Sage Commons with claims that any negative data is false, and for suing to cripple the database.

The only counter will be foreign governments with universal health care, which are directly financially affected by poorly performing drugs and will fight to protect the billions at stake.

Re:So long as there is money to be made (0)

Anonymous Coward | more than 3 years ago | (#35787202)

Got news for you: those listed are only sharing things for which they hold patents on the other pieces. Sure, we'll share part A, but we own part B -- to make A worth anything you'll need to buy B -- but we wholeheartedly think you should investigate wonderful things to do with part A and share that back with us too!

Re:So long as there is money to be made (2)

sexconker (1179573) | more than 3 years ago | (#35787112)

I can tell you didn't RTFA, or even RTFS.
TFH is fucking misleading.

There is very little to do with open source, or openness in general.

Some guy is simply trying to get various players to buy into his system, with money and data, so he can then go back and run a few queries, maybe make a little graph, etc., and sell that data to others (for the price of money and more data).

It's basically stone soup [wikipedia.org] , but he demands money as well as all the work. (And if he's not demanding money now, just wait until the date draws nearer.)

But this will never happen. The reason these companies are so tight lipped with their data is not because they don't see a benefit in sharing and accessing data, but because they don't dare let others see their dirty laundry, lest they expose themselves as liable for their fuckups.

Re:So long as there is money to be made (0)

Anonymous Coward | more than 3 years ago | (#35787188)

there will be no "openness" of any kind. There is just too much financial gain at stake (not that it is a good thing).

Well, from my perspective working in a commercial informatics company that relies heavily on open source software, yeah, we're not opening ours. We take a lot of precautions to make sure we don't violate the GPL, while at the same time taking advantage of free software.

In the end, we are a business; we need to pay our employees, and giving it away for free just does not fit that model. Maybe when the government writes us checks for doing things that benefit society, we won't have to operate like a business. Until then... thanks, OSS, but don't expect anything exciting back (we might release a driver or something, but you'll never get the interesting stuff).

Re:So long as there is money to be made (1)

tqk (413719) | more than 3 years ago | (#35789230)

I fail to see why this is a problem for anyone. My idea? Every individual gets a unique alphanumeric ID that matches a tuple in a nationally maintained database containing that anonymous citizen's data. You don't need to know the individual's name. You just want the raw data to aggregate with the rest of the population.

What's wrong with this? Simple: make sure the link is secure and keep the lawyers out. :-|
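Something like a keyed hash would do for generating those IDs -- a minimal sketch, assuming the registry holds one secret key (all names and values below are made up):

    import hmac
    import hashlib

    # Hypothetical secret held only by the national registry; whoever holds it
    # can re-link pseudonyms to people, so it never leaves the registry.
    REGISTRY_KEY = b"replace-with-a-real-secret"

    def pseudonym(national_id: str) -> str:
        """Derive a stable, non-reversible alphanumeric ID from a citizen identifier."""
        digest = hmac.new(REGISTRY_KEY, national_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]  # short handle, still plenty unique for aggregation

    # Researchers only ever see the pseudonym plus the clinical attributes.
    record = {"id": pseudonym("1980-01-01-1234"), "drug": "example-statin", "adverse_event": None}
    print(record)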

Re:So long as there is money to be made (1)

ldobehardcore (1738858) | more than 3 years ago | (#35790092)

I really wish the world could work that way... I don't see how any company with a national database of "anonymous" user IDs could resist doing a cross-correlation and using that for direct medical advertising.

The thing is, what you're suggesting is that rights to medical privacy will be revoked from the patients, and their information will be commercialized at the highest rate possible.

I understand the value of having a dataset like this, but it gives me chills to think of the consequences of its implementation.

Re:So long as there is money to be made (1)

tqk (413719) | more than 3 years ago | (#35790212)

The thing is, what you're suggesting is that rights to medical privacy will be revoked from the patients

No. It's anonymized.

You're arguing against early warning systems.

Icelandic citizens DNA was anonymous (0)

Anonymous Coward | more than 3 years ago | (#35790644)

They tried this anonymous model when setting up the genetic database in Iceland. They failed. Commercial interests used lawyers to bypass all expectations of privacy and ended up being really dangerous to the public. Basically, the people who donated caught deCODE in the act of doing sinister things, and used the legal system to opt out of the program. After that, the legal advice anybody will get about taking a DNA test is not to take it. Anything you discover can and will be used to discriminate against you. And it's especially the insurance industry that will discriminate against people, because private companies are that way.

If the insurance industry did not exist and all health care were public, then sharing medical data would probably be okay. But we are simply not there yet. Also, the police sample all the databases they can get their hands on. In Norway, for example, if you took a paternity test, you would end up in the archive used by the people who investigate crimes. Knowing that some evidence gets planted, this makes travesties of justice too easy to commit. No, keep your medical data confidential for now. The world is a place where corporations and governments will go to any means necessary to steal private data.

Re:So long as there is money to be made (1)

ldobehardcore (1738858) | more than 3 years ago | (#35803212)

There's really no such thing as anonymized data when it comes to large aggregate databases. For example, your Facebook page can be matched to your Netflix account nine times out of ten because of the sheer volume of similarity in the data pertaining to your advertising profile.
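To illustrate the kind of linkage involved -- a toy sketch, with entirely made-up records that happen to share a couple of quasi-identifiers:

    # Two "anonymized" releases that still share quasi-identifiers.
    social = [
        {"name": "Alice", "zip": "98101", "birth_year": 1985},
        {"name": "Bob",   "zip": "98052", "birth_year": 1990},
    ]
    medical = [  # no names, but the same side information leaks through
        {"pseudonym": "a1f3", "zip": "98101", "birth_year": 1985, "condition": "asthma"},
        {"pseudonym": "9c2e", "zip": "98052", "birth_year": 1990, "condition": "migraine"},
    ]

    def link(social_rows, medical_rows):
        """Re-identify medical records by matching the overlapping quasi-identifiers."""
        for s in social_rows:
            for m in medical_rows:
                if (s["zip"], s["birth_year"]) == (m["zip"], m["birth_year"]):
                    yield s["name"], m["condition"]

    print(list(link(social, medical)))  # [('Alice', 'asthma'), ('Bob', 'migraine')]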

I'm all for early warning systems based on genetic markers, but I think it is a horrible idea to make a national system for it. I know a small company in the Seattle area, where I live, that does the testing on their own premises and doesn't keep any data about the samples they process. There is no database that my information is stored in. If I want to knock $50 off the charge for my genetic processing, I can let them do a little experimentation on the DNA, but they don't hold on to individual results.

I understand your point that an anonymized national database would make a good standard. But the thing is, any useful database for tracking a personal medical record can, and probably will, be exploited.

Re:So long as there is money to be made (1)

tqk (413719) | more than 3 years ago | (#35804516)

There's really no such thing as anonymized data when it comes to large aggregate databases.

Well, I have to admit that's true. Once you start aggregating, ...

Still, anonymize, anonymize, anonymize, anonymize, anonymize, ... Give it to the Secret Service, or NCIS. "We just want your data, we don't care who you are. Honest, it's all just going into this big pot. We're shooting lawyers on sight."

Re:So long as there is money to be made (1)

ldobehardcore (1738858) | more than 3 years ago | (#35813942)

Heh Heh.

Shooting lawyers on sight.

Just appreciating the irony that, of any group you might shoot with wanton abandon, lawyers would probably be the best equipped to ruin you financially and have you put in jail if you do.

Incorporating this "Standard" (0)

Anonymous Coward | more than 3 years ago | (#35787088)

There are so many "standards" in place that it's impossible to implement systems to handle them all. What benefit do I get as a software creator if I implement this system, which is still pretty much unestablished? Who is going to pay for the development to implement this? Contrary to popular belief, scientists don't just sit down and code programs on their off time for fun; there are teams of people working on these projects as full-time jobs.

Re:Incorporating this "Standard" (1)

blair1q (305137) | more than 3 years ago | (#35787140)

What standard are you talking about? And what software?

TFA is about sharing data that companies are keeping secret or are too lazy to publish.

Re:Incorporating this "Standard" (1)

TwistedPants (847858) | more than 3 years ago | (#35787152)

I really wish this wasn't an article about "Sage Commons" but one about Life Sciences & the Semantic Web - http://www.w3.org/blog/hcls [w3.org]

Re:Incorporating this "Standard" (3, Informative)

Samantha Wright (1324923) | more than 3 years ago | (#35787198)

Well [wikipedia.org] , although you're right, there is still something that I believe is usually called a "clusterfuck" when it comes to data transfer formats for biology and chemistry, and it's not helping the open-ification process any. (Note that this list seems to omit most of the proprietary formats, at least a dozen of which I can name off the top of my head.) It's symptomatic of the commercial land-grab that took place in biomedical computing (mostly) in the nineties.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35789452)

... "clusterfuck" when it comes to data transfer formats for biology and chemistry

Oh, come on. That's what computers do if you've someone with smarts on the keyboard. Filters and data conversion's simple stuff. Know your input, know what you want out, figure out what you need to do that.

I know, bleeding edge versions of some software can't even read data from their previous versions. Well, build a box with the old version installed and ... Sometimes it's both a data and system problem. You need a better geek. :-)

I used to work for Atomic Energy of Canada. You would not believe the bizarre, proprietary system they bought into for documenting the project we were in. Imagine Lotus Notes ca. '70.

I wrote *my* dox in LaTeX (which was refused), and bailed soon afterwards (into contracting :-).

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35789890)

As programmers, I would like to think we are positioned to criticize those who don't respect applicable standards. Simply because a brain-dead decision can be accommodated doesn't mean it deserves to live!

And these are simple things, very often—dozens of different metadata and header formats for wrapping and annotating DNA, for example. Totally bogus.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35789978)

... dozens of different metadata and header formats for wrapping and annotating DNA, for example

So, you need a DBA who understands data formats.

I've been building stuff like this ever since I got into computing. This is geek heaven. "file $Blah". "apropos $Blah". ...

I don't understand why this is so hard.

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35790332)

It isn't, but they're still jerks for doing it in the first place. Also, your assumptions about the organization sizes involved are a bit high—often we're talking about labs with two or three PhDs and a handful of masters students. Not a major resource of deep computer expertise, or large enough to have a DBA. That they have to export all of their old material and re-import it into a new format when they upgrade their software is an obstacle (albeit a defeatable one!) to getting things done, and before you know it, you're wasting time and company/grant money.

On top of that, you have the same format obsolescence problem we see with physical media: if DNAStar goes out of business, and everyone switches to MacVector, then Microsoft discontinues support for 32-bit executables in Windows 12, how do we interpret what header bytes 8-12 mean in their proprietary SBD format when we need to access Professor (Emeritus) Recently Deceased's early graduate work on a cancer cure, the programmers have been dead for fifty years, and no format documentation was released because people were expected to export to FASTA first? We may be able to recover the sequence from the file (it's stored in lower-case ASCII) but not the annotations. Laboratory work must be redone to confirm hypotheses about the precise format of the binary-encoded addresses, and this could cost months of work and tens of thousands of dollars (today) if Prof. Deceased was working in mammalian cells, which require very expensive techniques to transform with modified DNA.

In short, the hacker's approach fails here, and hard. Your technique is valid for sensible things like firewall scripts that are all well-commented, but the quantity of file formats in this world that are undocumented (and not self-explanatory) is far greater than that of those which are generally understood. This is the whole point of formats like FASTA and GenBank, and even the hacker's arch-nemesis XML, which are ASCII-encoded and easy to comprehend, but there are many programs that continue to store their material in obscure binary structures for convenience and legacy compatibility, and those companies have yet to cough up any scrap of documentation—in the aforementioned example, MacVector can't read DNAStar's native format [macvector.com] , and the manufacturer recommends exporting from the LaserGene suite into a more common format first. Again, hours of headache for semi-computer-literate experimentalists, and potentially months of headache for people digging into historical archives.

Do you understand now?

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35796850)

I don't understand why this is so hard.

It isn't, but they're still jerks for doing it in the first place.

Which is why I wrote my dox in LaTeX. I knew they'd reject it. I didn't care.

... labs with two or three PhDs and a handful of masters students. Not a major resource of deep computer expertise, or large enough to have a DBA.

See, this's the part that pees me off. This is complicated !@#$ for average mortals, and you sciency types, hell everyone in and out of research, have to learn to budget for this specialized computing expertise. My last big client couldn't wait to ship my position off to three guys in Brazil. If that's how much local expertise is appreciated, who'd want to be in this business?

We can do this, but you've got to fund us as much as your sponsors are funding you.

If you've got exotic data to deal with, do you hand it to an intern, or to someone who already knows how to handle it? That's your choice. How soon do you want it?

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35800688)

You're ignoring my point completely in order to make a stand for your own job security, like an assembly line worker insisting that car-building robots can't cope with the unexpected, and thus car-building robots should be banned. The entire problem can be eliminated by making the data more consistent in the first place. Also, in this case, it's possible that in a few years the people who "already know how to handle it" are dead because the format specs were never released.

I am actually working as a DBA right now supporting a very fucked up genealogy database that uses numbers for table and column names for deliberate obfuscation reasons. This job sucks, because the vendor's shit isn't remotely fucking extensible, and it's a huge amount of work to find the data in its back-end (an old version of Sybase) and manipulate it externally. But at the same time, this database platform provides features that are patent-encumbered and can't be reimplemented, even if we had the money for hiring the developers required. So we have to cope with it. My predecessor left me a book correlating column numbers, table numbers, and data that had to be reverse-engineered by probing over an SQL connection. We still don't know where significant portions of the input from the UI are stored. All of this was created to prevent customers from migrating away, but that doesn't matter because there's nothing to migrate to.

Your fantasy world that geeks + money = results ignores the amount of pain and suffering that these bad designs are creating in the first place. The whole point of computer technology is to simplify people's lives and work, and data standardization is critical to that, just like quality control of parts was critical to automating assembly lines. Yes, there's still a place for experts, and someone always needs to know how to keep the machines running, but would you personally rather be doing that, or programming the next generation of better automation tools?

Your argument is essentially that of the Luddite [wikipedia.org] . I remind you that there are still artisan textile workers in the world, and suggest that you start your own business pursuing your dreams.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35801744)

I'm sorry you've come to the conclusion that we're in hopeless disagreement with each other. I assure you, you're jumping to conclusions. I've always railed against the proliferation of proprietary, opaque file formats & etc. (remember, I used LaTeX against orders, ffs).

You're ignoring my point completely in order to make a stand for your own job security

I'll cop to the job security charge, but in my defense, you're the one with the vast, complex problem to solve. I'm someone who (theoretically :-) can solve it.

The entire problem can be eliminated by making the data more consistent in the first place.

I'm one of the loudest advocates for this.

What you fail to understand is this is as it is! This is IT. *We* didn't create this clusterfuck, but this is our reality! It is among the youngest sciences out there. We're going to have to go through a lot of !@#$ before it's as solid as other professions. COBOL programmers are still valuable, ffs. There's going to be a lot of deadwood to wade through on the way, and more's created as we speak. I've been fighting this crap since '75 or so. With respect, suck it up. This is IT. Idiots out there created a mess. For our own reasons, we choose to work within that mess. These are our dragons.

dbi_list_schema.pl [nucleus.com]

Cry me a river.

Your fantasy world that geeks + money = results ignores the amount of pain and suffering that these bad designs are creating in the first place.

You appear to be blaming this on me. Why? I'm well aware there's a vast amount of dumbth in IT. Not every geek is worth the air they breathe. I know of Sun Certified engineers who can't use ls to list a directory. Such is life.

Your argument is essentially that of the Luddite.

If that's really the impression you got, then I've obviously failed to express myself cogently. My apology.

All I'm asking is, is it really worth six months of a postdoc's time to bang their head on this to figure it out, or might a competent specialist manage to get that data for you in a week? When do you want it? How cheap are you? What's the postdoc really want to do? Bang their head on data conversion for six months, or do something with the data?

Wouldn't it be smarter to budget for data conversion specialists in the first place?

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35802020)

It may in theory be smarter to budget for data conversion specialists, yes, given their flexibility (we'll assume for simplicity that they're all worth their paychecks), but it's just not practical to do on the scale we're looking at: The average small university would need one per biology/biochemistry/life sciences department, and institutions of that kind of bureaucratic gravitas are hard to move. Rather than pushing for embedding my classmates in every biomedical sciences department in the world, (even though I personally feel that postdocs are staggeringly computer-illiterate sometimes and really need some technically-minded adults to supervise them) it's much more practical to lobby vendors to open their stuff up, so that we can move everything into future-proof formats once, and never have to deal with it ever again.

A lot of biotechnology companies are already appreciating the OSS movement and go so far as to document their file formats in the user's manual for their hardware (e.g. Applied Biosystems's DNA sequencing hardware) but in general, biomedical software companies, such as MacVector and DNAStar from my first example, are still in the "Let's emulate Microsoft!" mindset when it comes to data storage.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35802628)

The average small university would need one per biology/biochemistry/life sciences department ...

No. It would need a process implemented by a specialist. I'd do an inventory of all the data, all the file formats that need to be dealt with, then I'd start building tools/filters that handle those types of data. Once built, those tools can be used institution-wide. Soon, you would be batch processing data in the background automatically. Your postdocs would see the output in their email every morning.
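The inventory step itself is only a short script -- a rough sketch in Python, assuming a shared data mount at a made-up path:

    from collections import Counter
    from pathlib import Path

    def format_inventory(root):
        """Walk a lab's data share and tally file formats by extension."""
        return Counter(p.suffix.lower() or "<no extension>"
                       for p in Path(root).rglob("*") if p.is_file())

    # Hypothetical mount point for a department's shared data.
    for ext, count in format_inventory("/srv/lab-data").most_common():
        print(f"{count:8d}  {ext}")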

... it's much more practical to lobby vendors to open their stuff up ...

I very much doubt that! Since when has that sort of thing been in their interest?

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35802838)

The average small university would need one per biology/biochemistry/life sciences department ...

No. It would need a process implemented by a specialist.

That person ought to be employed by your computing centre. You shouldn't even need to budget for them. This's like having someone on hand to do backups, or configure the firewall. You need data conversion on a regular basis. It's an essential service, needed institution-wide. What's wrong with your IT dept?

Out-sourced to Brazil?

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35806112)

Deciphering some of these formats is, as I've said, non-trivial. Your "start building tools/filters" step is where I find fault, especially when some combinations of closed tools can produce files that aren't lossless, e.g. a Windows metafile of a graph embedded in a FileMaker Pro database. How do you get the data points back out of the graph?

It also doesn't stop the world from continuing to produce files in formats with non-open specifications, even if you've fixed the institutions that have hired you, because you're only treating the symptoms, not the root problem. It ultimately is in the best interest of vendors to be compatible and open, because it's far more convenient for users, and what they want. (And, many organizations and companies are already moving this way, so it's not like no one's ever thought about it.) Consider that this same situation has happened in a number of IT arenas: video encoding being a recent prominent example. When there are open alternatives close enough in quality to closed software—which use widely-supported formats—people tend to prefer them by default. There's no de facto closed standard here, unlike Microsoft Office documents, which is why we often shuffle DNA around in the very simple FASTA format (one line starting with a > for the title of the sequence, and then another line containing the nucleotides, which lacks many useful features.)
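For comparison, a complete FASTA reader is about as small as parsers get -- a sketch (the file name is made up, and multi-line sequences are handled for good measure):

    def read_fasta(path):
        """Yield (title, sequence) pairs from a FASTA file.

        A record is a '>' header line followed by one or more sequence lines;
        that is essentially the whole format, which is why it travels so well.
        """
        title, chunks = None, []
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    if title is not None:
                        yield title, "".join(chunks)
                    title, chunks = line[1:], []
                elif line:
                    chunks.append(line)
        if title is not None:
            yield title, "".join(chunks)

    # Hypothetical export produced by one of the closed tools discussed above.
    for name, seq in read_fasta("exported_constructs.fasta"):
        print(name, len(seq))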

As to your other comment: most university IT departments aren't well-prepared for application-specific material. They do things like make sure everyone has a network connection, that the computer labs all work, that every department has its own web-accessible site (which most departments write and maintain themselves), that course scheduling proceeds as normal, etc. Professors are too self-important—and university IT staff are too content to focus on their own material—for their paths to ever cross. The computer situation in most labs I've been to resembles a home LAN, and is generally completely under the control of the lab staff. They wouldn't generally tolerate externally-managed machines, as the time to resolve complications would mean a significant hit to productivity.

To make your batch idea work, you'd have to do the conversions as part of a nightly backup process, requiring no intervention on the part of the user to produce the record. You then have to hope to the gods that you get informed whenever a professor adds a new obscure format to his or her roster, and then personally know enough field-specific information to interpret the format involved. This is a great way to ensure you remain employed forever, but it's not a solution to the problem. And you can bet that running overnight wouldn't be good enough for their every-day conversion needs—many labs are open 24/7 so that staff can get exclusive access to equipment, just like the hackers of the seventies staying up to wait for mainframe access. We need to have a file format flag day [catb.org] , but there's too much mass to do so efficiently.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35810538)

To make your batch idea work, you'd have to do the conversions as part of a nightly backup process, requiring no intervention on the part of the user to produce the record. You then have to hope to the gods that you get informed whenever a professor adds a new obscure format to his or her roster, and then personally know enough field-specific information to interpret the format involved.

This is an old problem, one that we've been dealing with forever! At a shell prompt on any *nix box, type "apropos 2". On my Linux box, that spits out stuff like:

po2debconf
pod2html
pod2latex
pod2man
pod2text
pod2usage
ps2ascii
ps2epsi
ps2pdf

We've been building and using specialised data conversion tools since forever! Anyone with any shell/perl/python/... scripting fu can build a tool that'll loop over the contents of $INCOMING, detect what sort of file it is, pass it through the correct filter, or bail and scream "Exception!", and go on to the next.
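For instance, a rough Python sketch of that loop (the directory names, the .sbd format, and the sbd2fasta converter are all hypothetical placeholders):

    import shutil
    import subprocess
    from pathlib import Path

    INCOMING = Path("/srv/incoming")
    CONVERTED = Path("/srv/converted")
    REJECTED = Path("/srv/rejected")

    def detect(path: Path) -> str:
        """Cheap format sniffing: FASTA starts with '>', the made-up .sbd goes by extension."""
        if path.read_bytes()[:1] == b">":
            return "fasta"
        if path.suffix.lower() == ".sbd":
            return "sbd"
        return "unknown"

    FILTERS = {
        "fasta": lambda p: shutil.copy(p, CONVERTED / p.name),  # already portable, pass through
        "sbd": lambda p: subprocess.run(  # hypothetical site-written converter
            ["sbd2fasta", str(p), str(CONVERTED / (p.stem + ".fasta"))], check=True),
    }

    for item in sorted(INCOMING.iterdir()):
        handler = FILTERS.get(detect(item))
        if handler is None:
            print(f"Exception! No filter for {item.name}; parking it for a human.")
            shutil.move(str(item), REJECTED / item.name)
            continue
        handler(item)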

As for your "new obscure format", shouldn't you have policies in place to handle this? If $NEWFILEFORMAT is non-portable, submission refused, rework and resubmit, damnit!

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35811184)

1. There are lots of cases where the only currently-existing way to get data produced is in a patent-encumbered or indecipherably complex format. There's no reworking or resubmitting; there's just one vendor-specific program that does the magic, its storage format, and an export feature that only captures one viewpoint of the data. The only solution in such a case is getting this original storage format changed.

2. In general, I think you're out of touch with the culture at universities. Labs are self-managed and do not have anything remotely resembling a general IT department that they run software purchasing decisions by. There's a very good reason for this: maximum independence and self-direction enables maximum efficiency in producing worthwhile results. That's partially to blame for the problems TFA is about—people not sharing research with each other—but what you're proposing is impossible on many levels.

We're talking about people running experiments, here. They may be inventing new kinds of data (and needing to bring in new software tools) on a very regular basis. A university that encumbers this process with red tape about tools of choice is harming its ability to compete as a research institution, which is its ultimate goal. In practice, no university has policies regarding what labs can run on their computers (which, further, are almost always self-purchased by the labs) much less any restrictions on file formatting. The amount of work that would be required to track all of the tools and utilities used by hundreds of graduate students, postdocs and professors is far greater than you seem to assume, (particularly since it would require understanding of the research, in many cases) and would be extremely invasive and disruptive to productivity.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35812130)

There are lots of cases where the only currently existing way to get data produced is in a patent encumbered or indecipherably complex format. There's no reworking or resubmitting; there's just one vendor-specific program that does the magic, its storage format, and an export feature that only captures one viewpoint of the data.

I don't know what sort of IT people you have to put up with, but from my perspective this is not the rocket science you appear to believe it is. Any of my friends would be able to sit down and analyze those bizarre file formats and come up with *some* sort of process to handle them. Some will be finicky and demand hands-on treatment. However, you ought to be able to automate the majority of them.

At the very least, insist this cryptic data is submitted in as many forms as possible: raw binary, export, backup data files, screenshots, email attachments, fax, ... All I'm doing here is advocating for doing it smarter. IT is a young field, but we have learned a few basic laws. Redundancy's an important one.

I'm also suggesting that you're missing the value of divide and conquer. Leave the IT to the geeks. Leave the researchers to research. Don't try to teach a pig to sing; they're not good at it, and it annoys the pig.

In general, I think you're out of touch with the culture at universities.

Absolutely. I've never been to one (I'm primarily a self-taught hacker:-).

Computers can make life easier for everyone involved, but only if the sharp end's focussed upon. I enjoy implementing solutions that make problems disappear forever. Not all of your labs or projects should have to fight with every problem, ffs! Automate what you can institution-wide, and deal with the rest when you run into them. Iterate.

Not to belittle your burden, but this's what I've been doing for two decades. Some problems are intractable exceptions, however most *can* be handled if you know how. I still say you need a better geek. :-)

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35789554)

Q: How did the regular expression cross the road?
A: ^.*$

Admitting my ignorance, would you please explain your .sig? Pretty please?

Re:Incorporating this "Standard" (1)

Samantha Wright (1324923) | more than 3 years ago | (#35789930)

Regular expressions are a rudimentary programming language (not Turing-complete) most commonly used for matching strings based on patterns. This is similar to the * and ? wildcards used by Unix- and Windows-derived/inspired operating systems for filenames, but more powerful. The answer to the joke consists of a regular expression containing the following four symbols:

^ indicates the start of a line
. indicates any character other than a linebreak
* indicates "zero or more repetitions of the previous character"
$ indicates the end of a line

Thus, the regular expression is starting at one "side", "crossing" any characters it passes over, and stopping at the other "side". This particular road-crossing joke is unique in that it completely describes the method by which the subject crosses the road, instead of just a brief summary of the goal or, as in the few other jokes that use "how" instead of "why", a vague descriptor of the manner in which the road was crossed.
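If you want to see it in action, two lines of Python will do (the test string is arbitrary):

    import re

    # The whole joke: anchor at one kerb (^), cross every character in between (.*),
    # and stop at the other kerb ($).
    road = "chickens, jokes, and a painted median"
    print(re.fullmatch(r"^.*$", road).group())  # prints the whole line: road crossed end to end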

Now hand in your geek card, forever.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35790112)

... Thus, the regular expression is starting at one "side", "crossing" any characters it passes over, and stopping at the other "side".

That's when I whooped out laughing. Damn, this's well written and composed! Thanks.

I do know the values of all those special chars you mention above, but damn, you do put a brilliant spin on them.

No, you'll not get my geek card, except from my cold, dead hands ...! :-|

I'm still giggling. Fun meeting you. Carry on, thanks.

Re:Incorporating this "Standard" (1)

HiggsBison (678319) | more than 3 years ago | (#35787288)

Contrary to popular belief scientists don't just sit down and code programs on their off time for fun...

I am a counter-example to your assertion.

Re:Incorporating this "Standard" (1)

tqk (413719) | more than 3 years ago | (#35797124)

Contrary to popular belief scientists don't just sit down and code programs on their off time for fun...

I am a counter-example to your assertion.

Not to mention, there's a few geeks out here who love to dabble in sciency stuff. Show me an interesting problem, and you won't find it easy to get me off it. I live for the "three pipe problem."


Tons of money involved (1)

kvvbassboy (2010962) | more than 3 years ago | (#35787190)

Thing is, there is a lot of money involved in biomedicine. Research institutes can hope to gain a lot of funding by selling their results to pharmaceutical companies. It would be the equivalent of Microsoft open sourcing the datasets used for their multibillion dollar speech and language technologies.

I can see this happening at universities, though, with a "GPLv2-equivalent" license on the database.

Re:Tons of money involved (1)

oldhack (1037484) | more than 3 years ago | (#35787794)

The mixture of medicine with so much moneyed interest produces a noxious stench like no other. And no other field is propped up with so much public funding - look up NIH funding vs. any other research funding.

Soon enough, it will also implode our finances, both public and private.

The What Bionetworks? (0)

Anonymous Coward | more than 3 years ago | (#35787220)

Anyone else read that as "Sega Bionetworks"?

Twitter and Facebook .. sharing? (1)

ackthpt (218170) | more than 3 years ago | (#35787284)

I'd say it's more than sharing - it's about exposing one's thoughts in the desire for acknowledgement or acceptance .. I am not alone.

Also useful for slagging others to prop up one's own self-esteem, and for plugging one's own site/service/content or a film one has participated in.

As for medicine .. I think it would be great to get more people on-line with folk remedies, to see if any actually have merit, e.g. chewing willow bark helped relieve my headache (willow bark is the origin of aspirin, via salicylic acid). The only worries I have, aside from a flood of quackery, would be some billion dollar pharma concern buying up the tract of land where the useful plant or bug lives and trying to patent the heck out of the DNA or its synthesis to corner the market.

confused (0)

Anonymous Coward | more than 3 years ago | (#35787318)

I am a little confused as to how and whether you cynics think open source works at all. Does it? It does for me in my selfish little world.

If the model works for some copyrightable/patentable system, then it is likely that it can be made to work for another. What's the problem?
Nolan

Adverse drug events and duplicity (1)

rune.w (720113) | more than 3 years ago | (#35787442)

Drugmakers are already required to keep track of adverse drug events that arise during clinical testing. Much of this information is reported to regulatory agencies on an almost daily basis, and there's a lot of work going on behind the scenes to make sure the information is reliable, consistent, and protects patient privacy.

I can understand to some extent why drugmakers aren't too keen to jump into this. There is little use in adding yet another database into an already busy workflow. This new database is guaranteed to be different from many in-house solutions currently in use, so you will need to train people, get them used to the new process, etc. just to input the same data the regulator already receives. IMO this won't be worth the effort in the eyes of many drugmakers unless you get regulatory agencies involved.

I am not saying this is not a worthy cause in general. We currently have more data derived from genomics (and all the other -omics) than we can analyze. However, to be successful, these guys need to make sure they aren't duplicating the functionality of the myriad public databases already out there.

Re:Adverse drug events and duplicity (1)

Daniel Dvorkin (106857) | more than 3 years ago | (#35788402)

Adverse event reporting covers only a tiny fraction of the data gathered in any clinical trial. There's an enormous amount of information that would be useful for future research locked up in clinical trials databases, and as we move into the "genomic medicine" era, this will be ever more the case. Having gone back and forth between bioinformatics and clinical research, and being persistently annoyed at how difficult it is to access data in the latter field, I say that anything that can bring bioinformatics' generally more open approach to the clinical research world is a good thing. Obviously there are privacy concerns when working with human data that don't exist when working with model organisms, but most of what keeps clinical data out of public research databases is plain old inertia, and it sounds like Sage Commons is working to overcome that -- good for them.

Re:Adverse drug events and duplicity (0)

Anonymous Coward | more than 3 years ago | (#35807368)

But they don't. The failure to report adverse events is a major crime. The perverse incentive to keep these millions of episodes secret is obvious, but no one is enforcing this lynchpin of the drug release protocol. For example, Avastin causes paralysis in some significant percentage of people. We don't know how many because the nurses and doctors refuse to report it. My clinic admits people who were told by the nurse, "OK, now if you feel any tingling or problem moving your arm, tell me immediately" while infusing. One person had right-arm paralysis within 30 minutes. We looked up the FDA reporting, both from the trials and ongoing, and found nothing; we looked up support groups and found several cases. If a major adverse event isn't being reported, why isn't it? The implication is that the entire system is compromised, and that sickness and death are a greater consequence of new drugs than anyone can tell.

Two questions (1)

wrencherd (865833) | more than 3 years ago | (#35787744)

TFT&TFS are as misleading as others have noted; this is about "open-data", not really "open-source".

I am skeptical and have two questions:

(1) In terms of research, isn't this what peer review and publication are supposed to accomplish?

(2) How is "biomedicine" different from "medicine"?

Re:Two questions (1)

Daniel Dvorkin (106857) | more than 3 years ago | (#35788812)

(1) Peer review is a lot more powerful when you can review the data itself, not just what the paper says about the data. In bioinformatics, we've known this for years, which is why you absolutely can't publish a paper concerning a microarray experiment without making the raw data available in GEO [nih.gov] or a similar repository.

(1.5) Any high-throughput experiment generates enormous amounts of data (that's pretty much the definition of "high-throughput") and that data is very often useful for answering questions other than the specific one the experimenter was asking. Public availability of data has proven an enormous boon to basic biology, and an awful lot of people would like to see that carry over into medical research.

(2) Generally speaking, "medicine" refers to clinical practice, and "medical research" to research in that practice, while "biomedical research" refers to research in the biology underlying disease and the treatment of disease. "Biomedicine" is, more or less, best defined as "what biomedical researchers do." For example, if you have cancer, your oncologist may prescribe chemotherapy (medicine) but before that, a pharmacologist designed the drug you're now being administered (biomedicine) and a biostatistician analyzed the results of the clinical trials on the drug (medical research). Ideally, data sharing will help close the loop: more biostatisticians can analyze the results of your and other patients' treatments, and more pharmacologists can use the results of that analysis to design the next generation of treatments.

Re:Two questions (0)

Anonymous Coward | more than 3 years ago | (#35788898)

Depositing SNP / gene expression data is easy, but without clinical phenotypes, it's meaningless. Moreover, without the source code, we usually can't replicate the results mentioned in the paper. The only way to repeat the work is by redoing the experiment (which is bloody expensive). Biomed papers are usually scant on details; the details get swept under the rug. So it's very, very easy to fake results and/or display misleading results. Any favorite genes? Tack 'em in. We just need to word them carefully. As long as it's not too spectacular to warrant reexamination, there we get our publications.

Re:Two questions (1)

innerweb (721995) | more than 3 years ago | (#35788928)

Marketing.

Biomed is a very sad place (0)

Anonymous Coward | more than 3 years ago | (#35787844)

I was having a conversation with a friend big in biomedical research.
They are essentially trying to create a classifier for protein/drug docking properties (I won't even pretend I understand the geometry of these data; I just know the data is in a format usable by the common non-continuous-input AI algorithms). Apparently the possible gains from such a classifier are pretty substantial (hell, a lab of people were doing this for free), yet the training data for their classifier is 4 sets. Yes, that is 4, as in "I'll use 3 sets for training and 1 for validation". Apparently, that's the entire set of data available in the public domain. To his knowledge, there are more datasets, but the rest sit behind prohibitively expensive paywalls (prohibitive for a biomedical research lab in one of the wealthiest EU nations, that is, not just for you and me).
The amount of AI knowledge needed to pull off the research is easily within the grasp of a 3rd-year CS student, yet the permanent research staff cannot make a breakthrough because they don't have access to the data.
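For what it's worth, the modelling side really is that small. A sketch using scikit-learn, with random placeholder arrays standing in for the four real docking datasets:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Stand-ins for the four public datasets: (descriptor matrix, binds / doesn't-bind labels).
    rng = np.random.default_rng(0)
    datasets = [(rng.normal(size=(200, 30)), rng.integers(0, 2, size=200)) for _ in range(4)]

    # Three sets for training, the fourth held out for validation -- exactly the split above.
    X_train = np.vstack([X for X, _ in datasets[:3]])
    y_train = np.concatenate([y for _, y in datasets[:3]])
    X_val, y_val = datasets[3]

    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_val, model.predict(X_val)))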

Biomedical community and sharing (0)

Anonymous Coward | more than 3 years ago | (#35787940)

As a little background, I'm an unpaid intern at a university, in a lab where we are currently working on a cure for Parkinson's disease. This hits rather close to home, as we sort of hide our secrets (horribly so in some ways, and in other ways not at all) from outside sources. I also find it kind of interesting to ask: what if we were all sharing information? How would competition play into the development of research then?
Although, truth be told, we know we're slightly behind in some areas compared to a few other research groups working on this - but we're getting really close. So far we've yielded near-perfect "cure" results with DBS treatments.
Anywho, I babble on, but yeah. Open data could really help the medical community and drive prices down. But human greed always seems to get in the way.


Are you shitting me? (1)

Anonymous Coward | more than 3 years ago | (#35788854)

Sage is a spinoff of Merck, via Rosetta Inpharmatics. Rosetta died and Sage emerged from it. The spin was that Merck had deposited thousands of clinical mouse strains supposedly worth tens of millions of USD. I don't buy it.

I know Stephen Friend has been "promoting" his idea of pooling genetic data. The pitch is that by pooling, his company can offer "better" analysis. However, "pooling" means pooling for his company's (Sage's) use and NOT for the public good. This article is absolutely misleading! I'm speaking as someone who has dealt with Sage. Sage has been acting like a piggy bank: it's easy to deposit data, but it's really hard to get ANYTHING out. The reason is simple: they're ALWAYS citing NDAs and privacy concerns (HIPAA among others).

Their supposed algorithms are lousy and shrouded in mysticism. They ALWAYS cite patents and/or proprietary rights. No source code has been released so far. Read their papers / publications. They are FILLED with buzzwords and little detail. The important parts are handwaved really vigorously in the paper. It bothers me why people trust them so much. Yes, they can produce good results, no doubt. But are they for open science? Definitely NOT!

Cause and effect (1)

Anubis IV (1279820) | more than 3 years ago | (#35788910)

Facebook and Twitter may have proven that humans have a deep-seated desire for sharing

If anything, sharing is merely a byproduct of the actual desires and tendencies that drive those sites: showing off, egoism, seeking acceptance, seeking affirmation, or gathering information. And more so in the case of Facebook than Twitter, given the studies that repeatedly indicate that Twitter's graph is structured as a news graph rather than as a social graph (what was the last statistic? That the top 1% of Twitter users produce 98% of the tweets that get retweeted?).

To suggest that sharing is the driving desire behind those sites is to give humanity far more credit than it is due.


Why no sharing? (1)

Syberz (1170343) | more than 3 years ago | (#35791502)

Easy, this is why: $

Except for the companies providing the publishing service, nobody makes money off your tweets. On the other hand, complex drug interaction data and related biomedical info is a potential source of great returns, gathered at great risk to the company (they can spend millions on R&D and get nothing currently usable out of it). Biomed companies will not share info that a competitor could use.

Doesn't The National Institutes Of Health (0)

Anonymous Coward | more than 3 years ago | (#35792400)

have resources for this?

Ahhh yes open source in Bio med... (1)

Schmyz (1265182) | more than 3 years ago | (#35794600)

...soon to usher in the "lawsuit era of the bio-med hack". I can see it now: somewhere, somehow, some yo-yo is going to claim open source led to a hack into some company's personal bottom line (e.g. personnel info, pay scales, profit info, the secret formula for some new ED med, etc.).