Open-Source Bioinformatics Programs?

Cliff posted more than 9 years ago | from the niche-programs-on-a-different-platform dept.

Biotech 28

An anonymous reader asks: "This summer I have the opportunity to work in a bio research lab creating a web site for data about proteins. Part of my job is to do bioinformatic analysis of the proteins to determine what types of support there are for the preliminary gene predictions. I have been using DNA Strider (a Mac program) for sequence alignments plus translations from DNA sequences to protein sequences, and I was wondering if any of the Slashdot crowd knew of similar programs for Linux? I have looked into Bioperl, Biopython, EMBOSS, and BioConductor, but they seem to be more oriented towards servers and less towards stand-alone applications. What programs would you suggest, especially those that might be geared more towards biologists rather than computer scientists?"

Freshmeat (2, Informative)

abradsn (542213) | more than 9 years ago | (#13140304)

If you go to Freshmeat and have a look, you will see some interesting programs that meet your needs. I was looking for some biochem programs on that site the other day.

Read the O'Reilly book (3, Informative)

wayne606 (211893) | more than 9 years ago | (#13140452)

"Developing Bioinformatics Computer Skills"

This has lots of useful information and references and is a great starting point. It might be a bit dated, though.

Re:Read the O'Reilly book (2, Informative)

dmaduram (790744) | more than 9 years ago | (#13143064)

Regarding books by O'Reilly, I'd also recommend Beginning Perl for Bioinformatics and, to a lesser extent, Mastering Perl for Bioinformatics. Our lab has been using several custom-built sequencing tools, but I've found that Perl always gets the job done faster.

PS: I haven't personally checked this out, but you might want to take a gander at O'Reilly's Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases. From the publisher's description:

Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases pulls together all of the vital information about the most commonly used databases, analytical tools, and tables used in sequence analysis. The book contains details and examples of the common database formats (GenBank, EMBL, SWISS-PROT) and the GenBank/EMBL/DDBJ Feature Table Definitions. It also provides the command line syntax for popular analysis applications such as Readseq and MEME/MAST, BLAST, ClustalW, and the EMBOSS suite, as well as tables of nucleotide, genetic, and amino acid codes. Written in O'Reilly's enormously popular, straightforward "Nutshell" format, this book draws together essential information for bioinformaticians in industry and academia, as well as for students. If sequence analysis is part of your daily life, you'll want this easy-to-use book on your desk.

OS Bioinformatics software (3, Informative)

Neil Blender (555885) | more than 9 years ago | (#13140496)

Most useful open source bioinformatics software is going to be geared toward biologists with at least some programming and Unix skills. A lot of it was written by bioinformaticians, who tend to lean more toward the informatics than the bio. They get more caught up in the technical aspects of the field than in the biology of the problem being addressed. Unfortunately, the same can be said about most commercial bioinformatics software as well.

On the flip side, when people more interested in the biology than the technology write software, they tend to write just enough to get the job done and then stop. Software from this camp is often buggy and has a bad UI or no UI at all. It gets the job done, but only if you know exactly how to use it.

Anyway, you might want to take a look at R. It's more geared towards statistics, but it does have some protein modules.

Hold on (2, Interesting)

Laxitive (10360) | more than 9 years ago | (#13140656)

I'm working on something to do basic sequence analysis in my spare time right now.

I'm not a bio researcher, but I am a programmer and I work in the field. My father is a biology researcher and we've been talking about putting together a GUI app that interfaces with various tools to provide an easy interface to common tasks that bio researchers have: basecalling, vector masking, clustering, sequence alignment, along with a nice GUI that lets you play with the results (search the results, order them, associate them with different databases, relate them to gene ontologies - essentially a powerful set of data visualization tools).

It's all focused around EST management. Our goal is to get an app that a non-power-user can get up and running, out-of-the-box, for managing small sets of libraries.

It's pretty obvious that there is a very solid set of OSS base tools that implement the algorithms for doing analysis on ESTs, but in terms of glue apps that bring all of the tools together into a cohesive whole, there's not much out there.

What IS out there is far too complex to expect an average bio researcher with little time on his hands to get up to speed with ("Download and install mysql? wtf?").

The problem is that most of these tools are geared towards large institutions with dedicated bioinformatics departments. They have the resources to hire a couple sysadmins and programmers and set up a high-throughput management system. Most small-timers don't really have the resources to get these apps working. I want to write something targeted specifically to smaller labs.

I just started writing it last weekend.. so it's not like there's much there yet - OTOH I have programmed a high-throughput EST management system for my work, so I have a good idea of most of the design issues.

An OSS app in this area would rock. It's a great opportunity to add to the wealth of OSS tools in the field, and I think it would solve a real need.

If you want to talk about it, reply to this post and also send me a message kav062 at yahoo dot com.


Re:Hold on (1)

merdark (550117) | more than 9 years ago | (#13141527)

Did you do a literature search? There are many such applications that already exist or are in development. It would be wise to look up existing solutions so that you don't unintentionally repeat the same mistakes.

Re:Hold on (1)

Laxitive (10360) | more than 9 years ago | (#13141774)

Well, I actually implemented a custom webapp to do this for my work. That project is a bit more heavyweight - proper DBMS backend, robust web frontend, flexible access control over data - mostly stuff that big institutions care about. Before we started that project we went around looking to see if there was work that was already done that we could modify to make work for us.

We looked at various applications (magpie, emboss, gboss, others) and database schemas (BioSQL, others from application papers). We didn't really find anything that fit our needs.

What I'm working on now (in my spare time) is a scaled down, standalone version of that, without a DBMS backend, without a web frontend, and without a bunch of the management features (users/groups/permissions/etc.). I assumed that since we didn't find anything before when we were starting the webapp, that there wasn't anything much out there - otherwise we would have run across it.

On the other hand, it's entirely possible that our search was not thorough. If you know of apps that implement something similar to what I was talking about, please let me know - I'm curious what's out there. Links or names of papers would be greatly appreciated.

And you're very correct, I don't want to repeat work if it's already been done.


Re:Hold on (0)

Anonymous Coward | more than 9 years ago | (#13143855)

The reason I posted this question is because I haven't found anything ... what you're developing would fit perfectly for what I'm doing. I haven't found anything decent like Strider for Unix or Linux... and it seems that development has stopped on Strider...


Re:Hold on (1)

merdark (550117) | more than 9 years ago | (#13144934)

I can't think of anything off the top of my head, as GUIs are not my area of research. I have come across many a failed GUI though, which is why I asked if you'd done a good background check.

It's not entirely clear from your description what your goals are, but it sounds maybe something like the GCG web interface (Wisconsin Package). That program is not open source, and I think EMBOSS is essentially an open source version of the GCG command line programs. A quick search reveals at least a few projects attempting to put a front end on EMBOSS.

Also, I remember an old project at sourceforge that was attempting to do a bioinformatics workspace type GUI. I forget the name now, and it probably died as that was many years ago.

At any rate, good luck with your project. GUIs are fickle things. It's difficult to hit the nail on the head (probably because there are so many different types of nails!).

Perl is not optional (1)

rakarnik (180132) | more than 9 years ago | (#13140846)

If you have a real interest in bioinformatics, I cannot stress enough that you should learn Perl. Even if you are a biologist by background, Perl is not like Java or C: it puts more emphasis on getting things done than on abstract computer science concepts.

Once you learn Perl, using something like BioPerl will give you all you need to handle sequence data. For instance, you could build a data pipeline that you run on all of your sequences of interest, instead of using a graphical tool that pretty much forces you to do alignments and such one at a time.
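
To make the "pipeline" idea concrete, here is a minimal sketch of running the same steps over every sequence in a FASTA file. It is written in dependency-free Python rather than Perl purely for illustration; the function names (read_fasta, gc_content, summarize) are my own and are not part of BioPerl or any other toolkit, and the per-sequence step (length and GC content) is just a stand-in for whatever analysis you actually need.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(handle: TextIO) -> List[Tuple[str, int, float]]:
    """Apply the same per-sequence steps to every record in the stream."""
    return [(name, len(seq), gc_content(seq)) for name, seq in read_fasta(handle)]


if __name__ == "__main__":
    # An in-memory example; in practice you would pass open("yourfile.fasta").
    example = io.StringIO(">seq1 demo\nATGGCC\nTTAA\n>seq2\nGGGCCC\n")
    for name, length, gc in summarize(example):
        print(f"{name}\t{length}\t{gc:.2f}")
```

The point is only that once the per-sequence step is a function, adding a thousand more sequences costs nothing, which is hard to say of a point-and-click tool.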

Now there are some tasks that will require a graphical tool (editing alignments is an example), and one free tool you could use is JaMBW. There is also a list of open bioinformatics software for Linux (generally will be Java or Perl, occasionally C) hosted at

ApE plasmid editor (1)

euk (545387) | more than 9 years ago | (#13140940)

You should look into ApE. It has many of the same functions as Strider, plus some that Strider doesn't have.
It works on Windows, Linux, OS X, etc.

Try Chimera and BioKnoppix (2, Informative)

frenchs (42465) | more than 9 years ago | (#13141098)

Well, I'm 99.9999% sure that you can get BioPerl running on a Linux box. Also, for a fun project, grab a copy of your sequence databases of choice and try to install BLAST on the Linux box.

That said, take a look at Chimera, which is an app written at UC San Francisco. It is mostly useful for visualizing, but I know there is a sequence viewer, and some other tools in there too.

Now, for all the aspiring bio geeks I give you BioKnoppix. Go download and burn the ISO. Then use that CD to boot any x86 box into a full Linux install with many of the popular bioinformatics tools already installed.



My ex-job (1)

perrin5 (38802) | more than 9 years ago | (#13141126)

It seems to be a sad but true fact that the tools are fractured, and most definitely NOT user friendly. BioPerl is nothing more than a Perl module allowing you to plug into the NCBI BLAST tools to automate processes. It was truly the most useful module I ever found, but I was writing web scripts in Perl at the time, not attempting to plug ready-made systems in.

My ex-employer had produced a standalone/server webserver that integrated many of these tools, but market forces and a lack of VC forced them to shut down.

The problem will continue because every scientist has specific needs. Until there are defined protocols for identification of new genes, or processing of data, there will be no ready made products for them.

I encourage everyone interested to check out for ways to contribute to the cause in the meantime.

Re:My ex-job (1)

Neil Blender (555885) | more than 9 years ago | (#13141734)

Until there are defined protocols for identification of new genes, or processing of data, there will be no ready made products for them.

Unfortunately, this is what happens when you try to do this: You get a biologist, a bioinformatician, an object/data modeler and a DBA (and 20 other people of various disciplines) in a room and they spend years concocting the 'perfect' set of protocols/models/standards. They attempt (and fail repeatedly) to create a model that is all encompassing for whatever process they are dealing with. GEML, MIAME, MAGE, etc - they are all dismal failures. MIAME is a perfect example - the M stands for minimal but MIAME is a horribly complex object model that, in my mind, is worthless.

Re:My ex-job NOW: another data standards resource (0)

Anonymous Coward | more than 9 years ago | (#13166686)

Thanks for the pointer.
If you're interested in working on data standards for bioinformatics, particularly in the health arena, you might take a look at the tools and standardization efforts underway at NIH's caBIG™, the cancer bioinformatics "grid". It's aimed at linking the systems of various cancer centers and university hospitals so they can share findings and stats and ultimately help defeat cancer. Lots of good work in that area.

>>The problem will continue because every scientist has specific needs. Until there are defined protocols for identification of new genes, or processing of data, there will be no ready made products for them.

(I'm an HCI researcher and non-coding technical professional, so I'll remain an anonymouse for now. :D )

Subject Doesn't Match the Article (1)

Karma Farmer (595141) | more than 9 years ago | (#13142733)

You know, the subject seems to have nothing to do with the article. The submitter was looking for a program for linux. Cliff, however, wrote a completely misleading headline claiming the poster was looking for an open source program.

Here's a clue, you stupid twit -- not all linux programs are open source.

Oriented Towards Servers? (3, Informative)

jmt9581 (554192) | more than 9 years ago | (#13142843)

I don't quite know what you mean when you claim that Bioperl, Biopython, EMBOSS and BioConductor are more oriented towards servers than stand-alone applications. First of all, servers and stand-alone applications don't divide up the application world into mutually exclusive parts. Applications can be stand-alone and run on a server for example. I've built applications using Bioperl that have a GU interface (take that grammar nazis!), and people are extremely happy with them. So, if you have a Perl guy nearby, I highly recommend talking to them about your problems.

Secondly, translations? Database searches? Sounds like you're doing some very basic Bioinformatics work. Not to say that your research isn't meaningful, just that the problems you're approaching are easily solved by a computational biologist. For example, here's a snippet of Bioperl code that will read in a set of GenBank sequences, translate them and print the results to a new file:

use strict;
use Bio::SeqIO;
# Read each GenBank record, translate it, and write the result to a new file.
my $seqin  = Bio::SeqIO->new( -file => 'myseq.gbk',       -format => 'genbank' );
my $seqout = Bio::SeqIO->new( -file => '>translated.gbk', -format => 'genbank' );
while ( my $seq = $seqin->next_seq ) {
    my $translated_seq = $seq->translate;
    $seqout->write_seq( $translated_seq );
}

Seems pretty simple, right? There are similar, simple wrappers around BLAST, FASTA and some other common algorithms in computational biology. Check out the Beginners HOWTO on the Bioperl website; it explains Bioperl without requiring previous CS experience. I think it's a good intro, but I also wrote it so I'm slightly biased.
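
For anyone wondering what a translation step actually does, here is a rough, dependency-free Python sketch. It is not how Bioperl implements translate; the names CODON_TABLE and translate are made up for this example, and it assumes the standard genetic code (NCBI translation table 1), reading frame 1 only, with any codon it cannot resolve (for example one containing N) shown as X.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order:
# TTT, TTC, TTA, TTG, TCT, ... GGG. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = False) -> str:
    """Translate a DNA (or RNA) string in frame 1 into one-letter amino acids.

    Trailing bases that do not fill a whole codon are ignored. Codons not in
    the table become 'X'. If to_stop is True, translation ends before the
    first stop codon instead of emitting '*'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))                # MA*
    print(translate("ATGGCCTAA", to_stop=True))  # MA
```

A real tool also handles the other reading frames, the reverse strand, alternative genetic codes and ambiguity codes, which is a good reason to lean on Bioperl or EMBOSS rather than rolling your own.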

If programming is not your style, check out JEMBOSS. It's a Java-based GUI wrapper for EMBOSS.

Cheers and good luck.

Bioconductor... (1)

ByronEllis (22531) | more than 9 years ago | (#13142855)

Actually, Bioconductor's original intent was to provide a platform for statisticians doing research on analyzing high-throughput experiments (as opposed to BioPerl and friends, which AFAIK originally arose to deal with sequence analysis and the myriad of tools that go with it). To date, this has principally meant microarray experiments and such, though there is also a lot of support code for manipulating things common to bioinformatics (annotations, ontologies and so on). It's also used by a fair number of biologists these days to actually analyze their experiments, which has led to the development of some simple GUIs.

Useful bioinformatics programs (2, Informative)

axolotl_farmer (465996) | more than 9 years ago | (#13143043)

I have used Clustal for multiple sequence alignments. There is a GUI (ClustalX) and a scriptable command line version (ClustalW). Available for all platforms, and source is included with the download.

Also keep an eye on POY, which does direct optimization on sequences. Also available for all platforms, with a BSD-style licence.

For just viewing and manual editing of alignments there is BioEdit. Free, but not open source. Windows only.

For a general sequence assembly/analysis/kitchen sink approach try the Staden Project. Open source and available for Windows, Linux and OS X.

Hope this is useful. I have never worked with protein sequences, but I have done a lot of DNA sequencing and alignment!
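
For readers who have never looked inside an alignment program: progressive aligners such as ClustalW build their multiple alignment up from pairwise comparisons. The sketch below is not Clustal's code; it is a textbook Needleman-Wunsch global pairwise alignment in plain Python, with a made-up function name (global_alignment) and arbitrary illustrative scores (match +1, mismatch -1, linear gap penalty -2). Real tools use substitution matrices and affine gap penalties.

```python
from typing import Tuple


def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Needleman-Wunsch global alignment of two sequences.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        """Score for aligning a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    total, row_a, row_b = global_alignment("ACGT", "AGT")
    print(total)
    print(row_a)
    print(row_b)
```

The table grows with the product of the two sequence lengths, which is fine for a pair of genes and one reason why database searches rely on heuristics like BLAST instead.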

Shameless plug (2, Interesting)

Bjarne Knudsen (902081) | more than 9 years ago | (#13143872)

My company, CLC bio, just released a free bioinformatics application that works for Linux, Mac and Windows (not open source). It was designed with molecular biologists and biochemists in mind, rather than bioinformaticists. Thus, a lot of effort has been put into the user interface.

The program is a 0.9 beta and so far it only has basic functionality: GenBank searching, DNA/RNA to protein translation, alignment, tree reconstruction, graphical viewers, and a few other things. More will follow in the coming months.



Two pointers... (1)

gkoczyk (859261) | more than 9 years ago | (#13144982)

If you are running Fedora/SUSE, two good sites to browse are BioLinux and BIOrpms.

The sites have RPMs for most of the basic packages along with descriptions (some packages are a bit old, however).

Here's an easy one that works on Linux + Windows (1)

Rudi Cilibrasi (853775) | more than 9 years ago | (#13148293)

Check out []. I've used it for many different applications, including genomics and proteomics. It can be used easily by novices or experts, as it is parameter-free.

Genezzo as an OSS database for bioinformatics (1)

andrewzx1 (832134) | more than 9 years ago | (#13149522)

Genezzo is a unique and innovative open source database that may be useful to ambitious bioinformaticians. Its goal is to handle extremely large information stores and yet provide extremely flexible schemas. Its developers include some fairly seasoned database professionals.

Genetic Data Environment (GDE) for Linux and Mac (1)

Linuxathome (242573) | more than 9 years ago | (#13150660)

I've tried this out recently:

GDE

I haven't done intense work with it recently. It appears to be a GUI for a huge collection of software.

If you just need a program for molecular biology work (DNA sequence and protein sequence analysis, organization, and publication quality layouts) then I suggest you check out Clone Manager 7. It's very pricey, but if your lab can afford it, it's a good piece of software. I know it's a *dows only program, but I can confirm that it works well with the latest version of Wine. Some work is needed to get the drives mapped correctly in Wine, but it was well worth it for me. Imagine being able to use an NX client (like VNC) to get into your lab's machine and work on your sequences from home.

Canadian Bioinformatics Workshop lecture notes (0)

Anonymous Coward | more than 9 years ago | (#13162238)

A while back I took the Canadian Bioinformatics Workshop and we did a lot of installing and tinkering with Linux-based tools. They offer their lecture notes online. Contra-topic, but in case it should prove useful: Nucleic Acids Research's yearly roundup of web server based programs for 2005 has just come out.

where to start (1)

Dazed yes I am (885362) | more than 9 years ago | (#13181930)

There are many Linux tools. But you want to start by understanding the problem space. "Bioinformatics for Dummies" is a great place to start; although it is web-oriented, many of the programs are downloadable for Linux. But really, try to get a deeper understanding of the problem - Mount's text "Bioinformatics" is well worth the money, and the second edition looks like an improvement in many ways - and the reference everyone uses (but this will induce pain) is "Biological Sequence Analysis". These three books and being in a lab will make things move very fast. OK, if you must get a fast answer... download EMBOSS for its set of well-tested, robust utilities (warning: installation is a slight pain) and NCBI's BLAST for sequence alignment. Then write scripts to integrate. Fast, easy, robust.