Bringing Open Source To Biomedicine

Bringing Open Source To Biomedicine 60

Posted by Soulskill on Monday April 11, 2011 @06:49PM from the given-enough-eyeballs,-all-diseases-are-shallow dept.

waderoush writes "'Facebook and Twitter may have proven that humans have a deep-seated desire for sharing, [but] this impulse is still widely suppressed in biomedicine,' biotech reporter Luke Timmerman observes in this column on Sage Bionetworks founder Stephen Friend. Friend is working to convince drugmakers and academic researchers to pool their experimental genomic data in a shared database called the Sage Commons. The database could be used to track adverse drug events, or to 'visually display network models of disease that connect the dots between genes, proteins, and clinical manifestations of disease in ways that [scientific] journals are not equipped to handle,' Timmerman says. Researchers from Stanford, Columbia, UCSF, and UCSD are already contributing to the Sage Commons, and Friend is now calling for a community effort by drugmakers, academic scientists, doctors, regulators, insurers, and patients to 'grab this platform and run with it on their own."

Bringing Open Source To Biomedicine

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 60 Comments Log In/Create an Account

Comments Filter:

So long as there is money to be made (Score:1)

by KBentley57 ( 2017780 ) writes:

there will no be any "openness" of any kind. There is just too much financial gain at stake (not that it is a good thing).
- Re: (Score:2)
  
  by TaoPhoenix ( 980487 ) writes:
  
  We can fight to have pockets of open. Hopefully about 1000 academic articles and 1000 drugs and 1000 genomes to study.
  (Starting small)
- Re: (Score:1)
  
  by arun84h ( 1454607 ) writes:
  
  You mean "no openness of any kind...besides the ones listed in TFA who are already sharing" right?
  - Re: (Score:1)
    
    by KBentley57 ( 2017780 ) writes:
    
    I think it was apparent what I mean. But, I'll spell it out. Big companies will not be willing to open source everything while they are raking in profits. The comment was meant in disregard to those mentioned
    - Re: (Score:2)
      
      by rtb61 ( 674572 ) writes:
      
      They will also not open up anything that remotely hints of greed driven culpability. In fact that will be the driving factor for not only not loading up data by the pharmaceuticals but to also attack Sage Commons with claims that any negative data is false and it will sue to cripple the data base.
      The only counter will be foreign governments with universal health who are directly financially affected by poorly performing drugs and who will fight to protect the billions at stake.
- Re: (Score:2)
  
  by sexconker ( 1179573 ) writes:
  
  I can tell you didn't RTFA, or even RTFS.
  TFH is fucking misleading.
  There is very little to do with open source, or openness in general.
  Some guy is simply trying to get various players to buy into his system, with money and data, so he can then go back and run a few queries, maybe make a little graph, etc., and sell that data to others (for the price of money and more data).
  It's basically stone soup [wikipedia.org], but he demands money as well as all the work. (And if he's not demanding money now, just wait until the date
- - Re: (Score:1)
    
    by tqk ( 413719 ) writes:
    
    I fail to see why this is a problem for anyone. My idea? Every individual gets a unique alphanumeric ID, that matches a tuple in a nationally maintained database that contains that anonymous citizen's data. You don't need to know the individual's name. You just want his raw data to aggregate with the rest of the population.
    What's wrong with this? Smiple? Make sure the link is secure and keep the lawyers out. :-|
    - Re: (Score:1)
      
      by ldobehardcore ( 1738858 ) writes:
      
      I really wish the world could work that way... I don't see how any company with a national database of "Anonymous" user IDs could resist doing a cross correlation and using that for direct medical advertizing.
      The thing is, what you're suggesting is that rights to medical privacy will be revoked from the patients, and their information will be commercialized at the highest rate possible.
      I understand the value of having a dataset like this, but it gives me chills to think of the consequences of it's imple
      - Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        The thing is, what you're suggesting is that rights to medical privacy will be revoked from the patients
        No. It's anonymized.
        You're arguing against early warning systems.
        
        Re: (Score:1)
        
        by ldobehardcore ( 1738858 ) writes:
        
        There's really no such thing as anonymized data when it comes to large aggregate databases. For example, you're facebook page can be matched to your netflix account 9 out of 10 times because of the sheer volume of similarity in the data pertaining to your advertizing profile.
        I'm all for early warning systems based on genetic markers, but I think it is a horrible idea to make a national system for it. I know a small company in the Seattle area where I live that does the testing on their own premises, and
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        There's really no such thing as anonymized data when it comes to large aggregate databases.
        Well, I have to admit that's true. Once you start aggregating, ...
        Still, anonymize, anonymize, anonymize, anonymize, anonymize, ... Give it to the Secret Service, or NCIS. "We just want your data, we don't care who you are. Honest, it's all just going into this big pot. We're shooting lawyers on sight."
        
        Re: (Score:1)
        
        by ldobehardcore ( 1738858 ) writes:
        
        Heh Heh.
        Shooting lawyers on sight.
        Just appreciating the irony that of any group to shoot with wanton abandon, lawyers would probably be the best equipped to ruin you financially and have you put in jail if you do.
- Re: (Score:2)
  
  by blair1q ( 305137 ) writes:
  
  What standard are you talking about? And what software?
  TFA is about sharing data that companies are keeping secret or are too lazy to publish.
  - Re: (Score:1)
    
    by TwistedPants ( 847858 ) writes:
    
    I really wish this wasn't an article about "sage commons" but one for Life Sciences & the semantic web - http://www.w3.org/blog/hcls [w3.org]
  - Re:Incorporating this "Standard" (Score:4, Informative)
    
    by Samantha Wright ( 1324923 ) writes: on Monday April 11, 2011 @07:21PM (#35787198) Homepage Journal
    
    Well [wikipedia.org], although you're right, there is still something that I believe is usually called a "clusterfuck" when it comes to data transfer formats for biology and chemistry, and it's not helping the open-ification process any. (Note that this list seems to omit most of the proprietary formats, at least a dozen of which I can name off the top of my head.) It's symptomatic of the commercial land-grab that took place in biomedical computing (mostly) in the nineties.
    
    - Re: (Score:1)
      
      by tqk ( 413719 ) writes:
      
      ... "clusterfuck" when it comes to data transfer formats for biology and chemistry
      Oh, come on. That's what computers do if you've someone with smarts on the keyboard. Filters and data conversion's simple stuff. Know your input, know what you want out, figure out what you need to do that.
      I know, bleeding edge versions of some software can't even read data from their previous versions. Well, build a box with the old version installed and ... Sometimes it's both a data and system problem. You need a bett
      - Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        As programmers, I would like to think we are positioned to criticize those who don't respect applicable standards. Simply because a brain-dead decision can be accommodated doesn't mean it deserves to live!
        
        And these are simple things, very often—dozens of different metadata and header formats for wrapping and annotating DNA, for example. Totally bogus.
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        ... dozens of different metadata and header formats for wrapping and annotating DNA, for example
        So, you need a DBA who understands data formats.
        I've been building stuff like this ever since I got into computing. This is geek heaven. "file $Blah". "apropos $Blah". ...
        I don't understand why this is so hard.
        
        Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        It isn't, but they're still jerks for doing it in the first place. Also, your assumptions about the organization sizes involved are a bit high—often we're talking about labs with two or three PhDs and a handful of masters students. Not a major resource of deep computer expertise, or large enough to have a DBA. That they have to export all of their old material and re-import it into a new format when they upgrade their software is an obstacle (albeit a defeatable one!) to getting things done, and befor
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        I don't understand why this is so hard.
        It isn't, but they're still jerks for doing it in the first place.
        Which is why I wrote my dox in LaTeX. I knew they'd reject it. I didn't care.
        ... labs with two or three PhDs and a handful of masters students. Not a major resource of deep computer expertise, or large enough to have a DBA.
        See, this's the part that pees me off. This is complicated !@#$ for average mortals, and you sciency types, hell everyone in and out of research, have to learn to budget for this sp
        
        Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        You're ignoring my point completely in order to make a stand for your own job security, like an assembly line worker insisting that car-building robots can't cope with the unexpected, and thus car-building robots should be banned. The entire problem can be eliminated by making the data more consistent in the first place. Also, in this case, it's possible that in a few years the people who "already know how to handle it" are dead because the format specs were never released.
        
        I am actually working as a DBA
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        I'm sorry you've come to the conclusion that we're in hopeless disagreement with each other. I assure you, you're jumping to conclusions. I've always railed against the proliferation of proprietary, opaque file formats & etc. (remember, I used LaTeX against orders, ffs).
        You're ignoring my point completely in order to make a stand for your own job security
        I'll cop to the job security charge, but in my defense, you're the one with the vast, complex problem to solve. I'm someone who (theoretically :-) c
        
        Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        It may in theory be smarter to budget for data conversion specialists, yes, given their flexibility (we'll assume for simplicity that they're all worth their paychecks), but it's just not practical to do on the scale we're looking at: The average small university would need one per biology/biochemistry/life sciences department, and institutions of that kind of bureaucratic gravitas are hard to move. Rather than pushing for embedding my classmates in every biomedical sciences department in the world, (even t
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        The average small university would need one per biology/biochemistry/life sciences department ...
        No. It would need a process implemented by a specialist. I'd do an inventory of all the data, all the file formats that need to be dealt with, then I'd start building tools/filters that handle those types of data. Once built, those tools can be used institution-wide. Soon, you would be batch processing data in the background automatically. Your postdocs would see the output in their email every morning.
        ...
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        The average small university would need one per biology/biochemistry/life sciences department ...
        No. It would need a process implemented by a specialist.
        That person ought to be employed by your computing centre. You shouldn't even need to budget for them. This's like having someone on hand to do backups, or configure the firewall. You need data conversion on a regular basis. It's an essential service your entire institution needs, institution-wide. What's wrong with your IT dept?
        Out-sourced to Brazil?
        
        Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        Deciphering some of these formats is, as I've said, non-trivial. Your "start building tools/filters" step is where I take fault, especially when some combinations of closed tools can produce files that aren't lossless, e.g. a Windows metafile of a graph embedded in a FileMaker Pro database. How do you get the data points back out of the graph?
        
        It also doesn't stop the world from continuing to produce files in formats with non-open specifications, even if you've fixed the institutions that have hired you,
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        To make your batch idea work, you'd have to do the conversions as part of a nightly backup process, requiring no intervention on the part of the user to produce the record. You then have to hope to the gods that you get informed whenever a professor adds a new obscure format to his or her roster, and then personally know enough field-specific information to interpret the format involved.
        This is an old problem, one that we've been dealing with forever! At a shell prompt on any *nix box, type "apropos 2". O
        
        Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        1. There are lots of cases where the only currently-existing way to get data produced is in a patent-encumbered or indecipherably complex format. There's no reworking or resubmitting; there's just one vendor-specific program that does the magic, its storage format, and an export feature that only captures one viewpoint of the data. The only solution in such a case is getting this original storage format changed.
        
        2. In general, I think you're out of touch with the culture at universities. In general, labs
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        There are lots of cases where the only currently existing way to get data produced is in a patent encumbered or indecipherably complex format. There's no reworking or resubmitting; there's just one vendor-specific program that does the magic, its storage format, and an export feature that only captures one viewpoint of the data.
        I don't know what sort of IT people you have to put up with, but from my perspective this is not the rocket science you appear to believe it is. Any of my friends would be able to s
    - Re: (Score:1)
      
      by tqk ( 413719 ) writes:
      
      Q: How did the regular expression cross the road?
      A: ^.*$
      Admitting my ignorance, would you please explain your .sig? Pretty please?
      - Re: (Score:2)
        
        by Samantha Wright ( 1324923 ) writes:
        
        Regular expressions are a rudimentary programming language (not Turing-complete) most commonly used for matching strings based on patterns. This is similar to the * and ? wildcards used by Unix- and Windows-derived/inspired operating systems for filenames, but more powerful. The answer to the joke consists of a regular expression containing the following four symbols:
        
        ^ indicates the start of a line
        . indicates any character other than a linebreak
        * indicates "zero or more repetitions of the previous char
        
        Re: (Score:1)
        
        by tqk ( 413719 ) writes:
        
        ... Thus, the regular expression is starting at one "side", "crossing" any characters it passes over, and stopping at the other "side".
        That's when I whooped out laughing. Damn, this's well written and composed! Thanks.
        I do know the values of all those special chars you mention above, but damn, you do put a brilliant spin on them.
        No, you'll not get my geek card, except from my cold, dead hands ...! :-|
        I'm still giggling. Fun meeting you. Carry on, thanks.
- Re: (Score:2)
  
  by HiggsBison ( 678319 ) writes:
  
  Contrary to popular belief scientists don't just sit down and code programs on their off time for fun...
  I am a counter-example to your assertion.
  - Re: (Score:1)
    
    by tqk ( 413719 ) writes:
    
    Contrary to popular belief scientists don't just sit down and code programs on their off time for fun...
    I am a counter-example to your assertion.
    Not to mention, there's a few geeks out here who love to dabble in sciency stuff. Show me an interesting problem, and you won't find it easy to get me off it. I live for the "three pipe problem."
Tons of money involved (Score:1)

by kvvbassboy ( 2010962 ) writes:

Thing is, there is a lot of money involved Biomedicine. Research Institutes can hope to gain a lot of funding by selling their results to pharmaceutical companies. It would be the equivalent of Microsoft open sourcing the datasets used for their multibillion dollar speech and language technologies.
I can see this happening at universities though with a "GPLv2 equivalent" license on the database.
- Re: (Score:1)
  
  by oldhack ( 1037484 ) writes:
  
  The mixture of medicine with so much moneyed interest produces noxious stench like no other. And no other fields are propped up with so much pubic fund - look up NIH funding vs. any other research funding.
  Soon enough, it will also implode our finance, both public and private.
Twitter and Facebook .. sharing? (Score:2)

by ackthpt ( 218170 ) writes:

I'd say it's more than sharing - it's about exposing ones thoughts in the desire for acknowledgement or acceptance .. I am not alone.
Also, useful for slagging others to prop up own self esteem and plugging ones own site/service/content or film one has participated in.
As for medicine .. I think it would be great to get more people on-line with folk remedies, to see if any actually have merit, i.e. chewing willow bark helped relieve my headache (this is the origin of Aspirin as salicylic acid.) The only worr
Adverse drug events and duplicity (Score:2)

by rune.w ( 720113 ) writes:

Drugmakers are already required to keep track of adverse drug events that arise during clinical testing. Much of this information is reported to regulatory agencies on almost a daily basis and there's a lot of work going behind the scenes to make sure the information is reliable, consistent and keeps patient privacy.
I can understand to some extent why drugmakers aren't too keen to jump into this. There is little use in adding yet another database into an already busy workflow. This new database is guarante
- Re: (Score:2)
  
  by Daniel Dvorkin ( 106857 ) * writes:
  
  Adverse event reporting covers only a tiny fraction of the data gathered in any clinical trial. There's an enormous amount of information that would be useful for future research locked up in clinical trials databases, and as we move into the "genomic medicine" era, this will be ever more the case. Having gone back and forth between bioinformatics and clinical research, and being persistently annoyed at how difficult it is to access data in the latter field, I say that anything that can bring bioinformati
Two questions (Score:1)

by wrencherd ( 865833 ) writes:

TFT&TFS are as misleading as others have noted; this is about "open-data", not really "open-source".

I am skeptical and have two questions:

(1) In terms of research, isn't this what peer review and publication are supposed to accomplish?

(2) How is "biomedicine" different from "medicine"?
- Re: (Score:2)
  
  by Daniel Dvorkin ( 106857 ) * writes:
  
  (1) Peer review is a lot more powerful when you can review the data itself, not just what the paper says about the data. In bioinformatics, we've known this for years, which is why you absolutely can't publish a paper concerning a microarray experiment without making the raw data available in GEO [nih.gov] or a similar repository.
  (1.5) Any high-throughput experiment generates enormous amounts of data (that's pretty much the definition of "high-throughput") and that data is very often useful for answering questions
- Re: (Score:2)
  
  by innerweb ( 721995 ) writes:
  
  Marketing.
Are you shitting me? (Score:1)

by Anonymous Coward writes:

Sage is a spinoff of Merck as Rosetta Inpharmatics. Rosetta died and Sage emerged from it. The spin was that Merck has deposited thousands of clinical mouse strains that supposedly worth tens of millions USD. I don't buy it.
I know Stephen Friend has been "promoting" his idea of pooling genetic data. The pitch is that by pooling, his company can offer "better" analysis. However, by "pooling" means for his company's (Sage) use and NOT for public good. This article is absolutely misleading! I'm speaking as som
Cause and effect (Score:2)

by Anubis IV ( 1279820 ) writes:

Facebook and Twitter may have proven that humans have a deep-seated desire for sharing
If anything, sharing is merely a byproduct of the actual desires that drive those sites. Desires and tendencies such as showing off, egoism, seeking acceptance, seeking affirmation, or gathering information. And more so in the case of Facebook than Twitter, given the studies that repeatedly indicate that Twitter's graph is structured as a news graph rather than as a social graph (what was the last statistic? That the top 1% of Twitter users produce 98% of the tweets that get retweeted?).
To suggest that shar
Why no sharing? (Score:2)

by Syberz ( 1170343 ) writes:

Easy, this is why: $
Except for the companies providing the publishing service, nobody makes money with your tweets. On the other hand, complex drug interaction and related biomedical info is a potential source of great return which is gathered at great risk for the company (they can spend millions on R&D and get nothing currently usable out of it). Biomed companies will not share info that a competitor could use.
Ahhh yes open source in Bio med... (Score:1)

by Schmyz ( 1265182 ) writes:

...soon to usher in the "lawsuit era for the bio-med hack" I can see it now...somewhere some how some yo-yo is going to state open source lead to a hack into some company's personal bottom line. (i.e. personel info, pay scale, profits info, secret formula for some new ED med...etc)

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

So long as there is money to be made (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:1)

Re: (Score:1)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re:Incorporating this "Standard" (Score:4, Informative)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:1)

Tons of money involved (Score:1)

Re: (Score:1)

Twitter and Facebook .. sharing? (Score:2)

Adverse drug events and duplicity (Score:2)

Re: (Score:2)

Two questions (Score:1)

Re: (Score:2)

Re: (Score:2)

Are you shitting me? (Score:1)

Cause and effect (Score:2)

Why no sharing? (Score:2)

Ahhh yes open source in Bio med... (Score:1)

Related Links Top of the: day, week, month.

Slashdot Top Deals