Slashdot: News for Nerds

Gamers Outdo Computers At DNA Sequence Alignments

Soulskill posted more than 2 years ago | from the must-not-be-using-watson dept.

Biotech 61

ananyo writes "In another victory for crowdsourcing, gamers playing Phylo have beaten a state-of-the-art program at aligning regions of 521 disease-associated genes from different species. The 'multiple sequence alignment problem' refers to the difficulty of aligning roughly similar sequences of DNA in genes common to many species. DNA sequences that are conserved across species may play an important role in the ultimate function of that particular gene. But with thousands of genomes likely to be sequenced in the next few years, sequence alignment will only become more difficult in future. Researchers now report that players of Phylo have produced roughly 350,000 solutions to various multiple sequence alignment problems, beating the accuracy of alignments from a program in roughly 70% of the sequences they manipulated."

61 comments

would be interesting to mine their data (5, Interesting)

Trepidity (597) | more than 2 years ago | (#39332039)

I'm highly skeptical that these gamers are really using some un-automatable human-only deep skills, especially since they aren't exactly extensively trained in this game, not to the level of, say, good Go players. So the interesting question to me is not that they beat current algorithms, but whether data mining these hundreds of thousands of alignments can tell us something about how they're doing it. My guess is that there are some heuristics that can be mined from this data that would massively speed up search.

That's a more general point about how these stories are always pushed, though, sometimes by media, sometimes by the researchers themselves. Imo the most exciting thing about successful uses of "human computation" isn't that we can harness people to do things, but that we can gain some large data sets that will make it so we don't have to get people to do them anymore. Or at least, that should be the baseline, imo: that humans can beat some hand-crafted algorithm is one thing, but can they beat machine-learned algorithms trained on those humans' own gameplay logs?

Re:would be interesting to mine their data (5, Funny)

K. S. Kyosuke (729550) | more than 2 years ago | (#39332163)

Perhaps they could make it into yet another captcha: "If you want to download your porn movie, please align the following two DNA fragments." :) If people can be made to do OCR for others, why not DNA alignment?

Re:would be interesting to mine their data (3, Interesting)

tibit (1762298) | more than 2 years ago | (#39332291)

This is not as silly as you might think. If it weren't for generally fucked up academic politics, this would work wonders. Get a bunch of popular porn sites to accept phylo points as payment. My bet is that there'd be plenty teenagers and basement dwellers who can trade plenty of time for the money they don't have to pay for porn :)

Re:would be interesting to mine their data (5, Funny)

ottawanker (597020) | more than 2 years ago | (#39332921)

Worst case scenario is that the crackers write really good DNA sequencing programs to beat the captchas.

Re:would be interesting to mine their data (1)

tibit (1762298) | more than 2 years ago | (#39333075)

Wank and contribute to science in TDMA fashion, FTW!

Re:would be interesting to mine their data (1)

ooshna (1654125) | more than 2 years ago | (#39336099)

This would have worked 10 years ago but now it's easier to find porn about anything you want than it is to find pics of celebrities' babies.

Re:would be interesting to mine their data (1)

GrumpySteen (1250194) | more than 2 years ago | (#39343359)

Unless you're trying to find porn of celebrities' babies, that is.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39336957)

Might as well have them contribute their new DNA samples while you're at it.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332573)

Make it a DNAVille in facebook or something. Come to it, we should find a way to make a Riemann-Ville or something like that.

Re:would be interesting to mine their data (1)

vreinharz (1519541) | more than 2 years ago | (#39334661)

Seriously, you have a really good idea. I work for the prof. who did Phylo and I need to tell him now.

Re:would be interesting to mine their data (1)

rosieannemayers (2522910) | more than 2 years ago | (#39337049)

That's actually a good idea. You're getting off and helping science at the same time!

Re:would be interesting to mine their data (2)

rish87 (2460742) | more than 2 years ago | (#39332187)

I agree 100% with the sentiment of figuring out how the players make the decisions and use it as new heuristics. The MSA problem isn't that computers cannot get the optimal solution, the problem is doing it quickly. Given enough time, a computer will always outdo or match a human. What needs to be done is improve the existing computational algorithms with heuristics learned from these players. Then we have much better results at a much faster rate.

Re:would be interesting to mine their data (3, Interesting)

mug funky (910186) | more than 2 years ago | (#39334191)

wouldn't the problem at hand be NP-hard? maybe that's why gamers are beating the algos?

could this be a new way to "monetize" the internet? outsourcing hard problems for cash. with a cloud paradigm, it doesn't matter whether it's a cluster of computers or a crowd of aspies when the end result is the same.

Re:would be interesting to mine their data (1)

Anonymous Coward | more than 2 years ago | (#39334347)

Yes, the problem is NP-hard. Computers can solve NP-hard problems, just the algorithms to do so are often too slow to be useful so approximation algorithms are used instead. The humans are competing against results generated by an approximation algorithm. Having humans do the computation is more or less a different approximation algorithm. Given enough time, a computer could simply work out the full solution, but the amount of computation would be way too high.
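To put a rough number on "way too high": a minimal sketch, assuming the exact method is the textbook dynamic program that extends pairwise alignment to all sequences at once. The function name and the example sizes are made up for illustration; the point is only that the table grows exponentially with the number of sequences.

```python
def exact_dp_cells(num_seqs, length):
    """Cells in the dynamic-programming table for aligning `num_seqs`
    sequences of `length` bases simultaneously (one axis per sequence).
    Assumption: plain simultaneous DP, no pruning or heuristics."""
    return (length + 1) ** num_seqs


def predecessors_per_cell(num_seqs):
    """Each cell looks back at every non-empty subset of sequences
    that could have advanced by one position."""
    return 2 ** num_seqs - 1


if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(k, exact_dp_cells(k, 100), predecessors_per_cell(k))
```

Two sequences of 100 bases need about ten thousand cells; ten such sequences need on the order of 10^20, which is why approximation algorithms are used in practice.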

Paying people for better results would be an interesting model. There is the problem of the perverse incentive to keep the algorithm secret though: if you come up with a better algorithm, you want to get paid to run it on instances, not tell the researchers so they can run it themselves.

Re:would be interesting to mine their data (1)

biodata (1981610) | more than 2 years ago | (#39336993)

Is there another aspect to this, other than simply hardness? We can talk about exact solutions, and approximate solutions, but both are dependent on having some scoring metric that 'knows' what the correct solution is. In real alignment, we do not actually know what the answer is (assuming that the purpose of the alignment is to find the most likely evolutionary relationship between the bases in the sequences). When bases have been inserted/deleted and mutated, it is not necessarily possible to tell what happened, and in what order, so the best scoring method for alignments is in itself not necessarily known/knowable. In the end we are left trying to score alignments on the basis of parsimony. Maybe it makes sense that humans have evolved an ability to rapidly find moves likely to lead to parsimony (we have learnt how to optimally use resources), whereas automated solutions these days tend to focus on hill climbing and a variety of MC approaches?
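For readers who have not seen what a "scoring metric" for an alignment looks like, here is a minimal sketch of the classic sum-of-pairs score. It is not Phylo's scoring and not what any particular aligner uses; the match, mismatch and gap values are arbitrary assumptions chosen for illustration. The score is whatever the chosen numbers say it is, which is the point made above: the "best" alignment is only best relative to a metric someone picked.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Score a multiple alignment (equal-length rows, '-' for a gap)
    by summing a pairwise score over every pair of rows in every column.
    A gap paired with a gap scores 0."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-", "ACG", "A-G"]))
```

Changing the gap penalty changes which of two candidate alignments scores higher, without either being more "true" in the evolutionary sense.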

Re:would be interesting to mine their data (1)

mattack2 (1165421) | more than 2 years ago | (#39345127)

Paying people for better results would be an interesting model. There is the problem of the perverse incentive to keep the algorithm secret though: if you come up with a better algorithm, you want to get paid to run it on instances, not tell the researchers so they can run it themselves.

Though they could have a separate large payment for an algorithm. Sure, it wouldn't be as much as paying for the work over time forever, but the algorithm inventor/discoverer is betting that someone else doesn't come up with the algorithm in the future too. The payment for algorithm would be similar to the pulled Netflix prize.

Study the moves (1)

biodata (1981610) | more than 2 years ago | (#39337023)

My hypothesis is that humans may have learnt how to find a path to parsimony. We have evolved to use resources efficiently, so finding stepwise approaches that use resources most parsimoniously would have been important. MSA seems like mostly a parsimony problem - what arrangement of bases most parsimoniously explains the likely evolutionary relationships. Typical computational approaches to this involve MCMC and various more or less random moves to try to find the most parsimonious solution. Humans are clearly using far fewer moves than computers to solve this, so are much better than computers at seeing where the best hill is, and climbing that hill directly, rather than randomly exploring the likelihood landscape. We should find a way of classifying the moves people are making to discover whether they can see the big picture, or whether they are just very efficient at exploring the landscape.

Re:would be interesting to mine their data (1)

KDR_11k (778916) | more than 2 years ago | (#39340711)

The basic visual sense of a large animal includes an insane amount of brainpower for pattern recognition, interpretation and such things. There's a reason even very dumb animals can maneuver through the world much faster than our smartest robots. The heuristics used by humans in tasks like that are likely backed by the enormous processing power of the brain when it comes to analyzing pictures and patterns so they may not be terribly useful for computers.

Re:would be interesting to mine their data (1)

Forty Two Tenfold (1134125) | more than 2 years ago | (#39332219)

Cf. earlier summary about a similar achievement in protein folding [slashdot.org].

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332225)

If only we had a genetic algorithm to mine the data. Oh, wait...

Re:would be interesting to mine their data (1)

Anonymous Coward | more than 2 years ago | (#39332239)

I don't have the data to look through, but the general process of a human learning new rules can be described as a sort of 'brute cunning algorithm.' It starts as a brute force, but recognizes certain trends, assumes consistency, and then makes jumps, narrowing back until they find a peak. Each person will display a different balance of brute force and portion skipping, so with a large enough gamerbase, you will get a collection of results that includes local maximums and a good chance of the true maximum. Especially if 'top score' is a universally known value that can keep people competing after they find a local maximum.
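The "many players, many starting points" idea resembles what is usually called hill climbing with restarts. A toy sketch, assuming a one-dimensional score landscape stored as a list; the landscape values and function names are invented for illustration and have nothing to do with Phylo's actual puzzles.

```python
def hill_climb(landscape, start):
    """Walk uphill from `start` until no neighbour is strictly higher.
    Returns the index of the peak that was reached (possibly only local)."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(landscape)]
        if not neighbours:
            return pos
        best = max(neighbours, key=lambda i: landscape[i])
        if landscape[best] <= landscape[pos]:
            return pos
        pos = best


def crowd_search(landscape, starts):
    """Each 'player' climbs from a different start; keep the highest peak found."""
    peaks = [hill_climb(landscape, s) for s in starts]
    return max(peaks, key=lambda i: landscape[i])


if __name__ == "__main__":
    landscape = [1, 3, 2, 5, 9, 4, 6, 2]
    print(hill_climb(landscape, 0))            # one player, stuck on a local peak
    print(crowd_search(landscape, [0, 2, 7]))  # several players, best peak wins
```

A single climber starting at the left end stops on the first small bump; three climbers spread across the landscape find the highest peak between them, which is the effect the comment describes.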

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332255)

It could be an algorithm that only works on a massively parallel processor though, and that could make it difficult to comprehend. Implementing it on anything less than a processor with as many cores as neurons involved in the process might not be possible.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332259)

'"Imo the most exciting thing about successful uses of "human computation" isn't that we can harness people to do things,...'

"Nine times seven, thought Shuman with deep satisfaction, is sixty-three, and I don't need a computer to tell me so. The computer is in my own head.
And it was amazing the feeling of power that gave him. "

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332271)

I'm guessing that automating this is similar to automating voice recognition.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332277)

I'm highly skeptical that these gamers are really using some un-automatable human-only deep skills, especially since they aren't exactly extensively trained in this game...

Humans are amazingly good at some things--object recognition, for example. That is not totally unrelated to what is going on here. Had there been evolutionary pressure to solve computational problems quickly, there is no doubt in my mind that we would be able to do that far faster than any (contemporary) computer. Obviously, that would eventually change.

Re:would be interesting to mine their data (3, Insightful)

Trepidity (597) | more than 2 years ago | (#39332379)

That's true; a legitimate hypothesis is that this task involves very difficult skills that humans are naturally adept at, like object recognition in images does. My guess is that aligning DNA sequences is not as strong an example of one of those kinds of problems as object recognition, in particular because it doesn't involve the large amount of general knowledge about the world that we bring to bear when interpreting scenes; aligning sequences is more of a "formal" problem, than recognizing what constitutes a "chair". But I'll admit I could be wrong. One way to find out would be to try to see how much can be mined from the data. ;-)

Re:would be interesting to mine their data (1)

mug funky (910186) | more than 2 years ago | (#39334225)

brains don't recognise "chair", so much as they recognise $object and we are trained by our environment to be good at spotting whatever $object is our specialty. this doesn't take long, and i'd reckon that a problem like this would take minutes of training to get equal or better than a computer, and a bit longer to far surpass it.

brains are very plastic, changeable things.

Re:would be interesting to mine their data (1)

Trepidity (597) | more than 2 years ago | (#39334473)

I don't think that's universally true; for example, computers are much better at characterizing integer sequences than (almost?) any human is, because humans are just not that good at integer sequences, especially those with any sort of non-trivial mathematical relationship underpinning them. Humans are good at a fairly specific set of pattern-recognition tasks, like object recognition in images. Even there, they vary surprisingly strongly by the specific nature of the task; for example, humans are much better at recognizing faces than just about anything else--- and this ability is so specialized that humans generally can't process [wikipedia.org] upside-down faces nearly as well.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39332355)

i tried, this is not a game. its work!

Re:would be interesting to mine their data (1)

whydavid (2593831) | more than 2 years ago | (#39332839)

Agreed. I would be interested to see what the researchers learned from this exercise in terms of improving MSA algorithms. Perhaps the performance of the human players suggests that aligning a small subset of the problem with a high quality alignment algorithm before completing the problem with a run-of-the-mill algorithm is the way to go. The fact that puzzles completed repeatedly were where the phylo solutions performed best would indicate that running this first algorithm repeatedly with some element of random error might lead to a better solution. Whether or not this is computationally feasible is another question that begs to be answered. In the meantime, it would be interesting to see a "puzzle of the day" where researchers can upload a current MSA problem they need a good solution to in order to use phylo to help with current research questions.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39334143)

Or at least, that should be the baseline, imo: that humans can beat some hand-crafted algorithm is one thing, but can they beat machine-learned algorithms trained on those humans' own gameplay logs?

Yes, see any attempt of this that has been done to date (most of the top games today actually incorporate learning algorithms that do just this to keep them challenging - the issue stems from the lack of original creativity when competing against Humans [though they seem to be pretty good at non-strategic trivia games such as Jeopardy]).

Re:would be interesting to mine their data (1)

KDR_11k (778916) | more than 2 years ago | (#39340837)

Games only learn what they are programmed to learn. They aren't remotely as flexible as large animals because that kind of learning is so damn complex that it'd take way too much processing power. To learn from an action you first have to understand what the action was, games only store simple things like "player likes to use item X" but if a situation arises that the designer of the algorithm has not anticipated the game cannot adapt. Clicking on a button does not tell you why clicking the button was the right thing to do in the situation and that may be a vital difference. If you look for a pattern you have to define which factors to include in the pattern and the human may have used factors that you didn't include. Maybe humans like aesthetically pleasing color arrangements and try to reach them but how can a computer even see that that happened?

Simply looking at the entire system state would be problematic because generating a library of patterns out of that could easily take more effort than simply following your old computer algorithm.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39337853)

The human brain can apply multiple scientific methods at the same time, which is very hard to automate: it draws on its vast memory and experience, and its trained skills are shaped by an exceptionally complex environment. That is the exact opposite of something you would wish to automate.
Actually, this is a really good approach: the computer solves 50% of the problem the easy way, but to reach the other 50% you would have to use unbelievably complex algorithms, or just use human intuition.

Re:would be interesting to mine their data (0)

Anonymous Coward | more than 2 years ago | (#39344253)

As the old adage goes: this machine has no brain, use your own.

Brilliant! (0)

Anonymous Coward | more than 2 years ago | (#39332053)

Whenever I read about this stuff, it never ceases to amaze me how brilliant it is that they are harnessing the power of video games to solve problems like this. +1 for Human ingenuity.

Re:Brilliant! (2)

GrumblyStuff (870046) | more than 2 years ago | (#39332133)

Makes the original premise of The Matrix that much better than the "lol we're batteries!"

Re:Brilliant! (1)

Moheeheeko (1682914) | more than 2 years ago | (#39332195)

We processor now!

Diff? (1)

Relic of the Future (118669) | more than 2 years ago | (#39332211)

So can we extract any insights from this, and use them to improve diff?

Imagine... (2)

KillAllNazis (1904010) | more than 2 years ago | (#39332233)

... a beowulf cluster of these!

Re:Imagine... (2)

lister king of smeg (2481612) | more than 2 years ago | (#39333273)

call me when you can make a Beowulf cluster of human brains, i bet porting C will be a real bitch though

Re:Imagine... (1)

NemoinSpace (1118137) | more than 2 years ago | (#39335335)

We have that. We call them nations. They don't play well together for any length of time.

Hooray for 'Many eyes' (1)

Picass0 (147474) | more than 2 years ago | (#39332333)

A fantastic example of why the building blocks of human life should not be patentable and hidden away by pharmaceutical companies.

Re:Hooray for 'Many eyes' (0)

Anonymous Coward | more than 2 years ago | (#39332581)

You could say patents on life, not just human, are invalid. That would take care of the Monsanto problem as well.

Re:Hooray for 'Many eyes' (0)

Anonymous Coward | more than 2 years ago | (#39334363)

You don't need to be open to use "many eyes". It's well-known that BGI (the least "open" bioinformatics group in the world) has a lot of people to do precisely this.

The problem is that science demands that experiments be replicable. If you give the same data to BGI twice, you get two different answers.

Time limit (0)

Anonymous Coward | more than 2 years ago | (#39332603)

Why do they put a time limit on the game?
I don't see what good reason there is to force people to do it quickly rather than give them all the time they need to make the best sequence?

Re:Time limit (1)

tibit (1762298) | more than 2 years ago | (#39333021)

Not only that, they also seem to be illiterate. Having to watch a video tutorial, narrated by a girl who couldn't read a book for kids lest her live be saved, just to learn how the damn thing is scored? I thought they know how to write, being in the academia and all? Their results are obviously secret, because if you just happened to educate yourself on a puzzle that you can't solve, the par results are inaccessible. Big stinkin' sikret, I tell ya. But no, they must have included the stupid car game countdown, and time-wastin' transitions. And they think that just because it requires a mouse hover and has a fade-in and fade-out, it must be cool. I've been having a lot of genuine fun with the zooniverse project; phylo in comparison seems done by braindamaged web designers who never bothered using their own fine creation, and decided that once all the transitions, music and sfx are done, they'll proclaim it done. I have never been so genuinely disappointed by a project of that kind.

Re:Time limit (3, Interesting)

KhabaLox (1906148) | more than 2 years ago | (#39333639)

I haven't played the "game", but I suspect that there are a lot of things like time limits that can serve as a motivation factor that actually increase user output in the aggregate. Having a time limit can give you a sense of urgency that will force you to work faster. The error rate may increase, but overall productivity could still be higher given the higher number of "answers" given per unit time.

Imagine two Magic The Gathering players. One assembles decks painstakingly, spending hours crafting card ratios just right, and researching combos to get the perfect balance of # cards to power of combo. Then he play tests it, goes back and makes adjustments, etc. The other throws decks together quickly and play tests them very quickly. He adjusts the deck without as much deliberate thought, but rather more quickly (perhaps intuitively). He is able to iterate much faster, and it's easy to imagine that if each player were given 1 month to pursue these strategies, the latter could easily come out with more decks that met some minimum standard of success (that was suitably high).

(Obviously, it's easy to see how inane, useless rewards can spur gamers to expend more time and "contribute" more to the game... just look at badges, trophies, etc. But I think it's just as possible that "negative" reinforcement ideas, such as a time limit, can have the same effect.)

Re:Time limit (1)

vreinharz (1519541) | more than 2 years ago | (#39334645)

I can tell you that there is a time limit and they work really hard to add useless rewards that will hook up the users :).

WHAT?!?! AMAZING! (1)

Thelinuxpenguin (2586473) | more than 2 years ago | (#39332665)

I just started playing and I am having slight trouble with it. These people must be geniuses!

Achievement/Trophy Unlocked! (2)

Eponymous Hero (2090636) | more than 2 years ago | (#39332677)

Cured Lupus! 150G / Platinum Trophy

Read the fine print... (5, Informative)

whydavid (2593831) | more than 2 years ago | (#39332747)

This is an interesting finding, but let's not get too carried away. If you read the article, you'll see that: a) The phylo-based alignments are partial solutions. They are simplified for the human user by leaving many orthologous sequences out of the alignment. This means there is another algorithm that finishes these partial solutions before they can be compared to solutions produced solely by algorithms. b) Only 36% of the _best_ phylo-based solutions, once completed, were better than the algorithms' solutions. This is still an improvement, but it DOES NOT suggest that humans are better than computers at multiple sequence alignment. If you were to ever try to solve a real MSA problem by hand, you would quickly understand how completely hopeless it is. In fact, even aligning 2 sequences of any appreciable length by hand is a chore. The problem here is the misguided title: "Gamers outdo computers at matching up disease genes" which should read: "Gamers + computer outdo computers only at matching up very small fragments of disease genes, some of the time"

Re:Read the fine print... (1)

tibit (1762298) | more than 2 years ago | (#39333055)

I'm not surprised. Their UI is disgusting, their scoring rules hidden behind a most amateurishly done video (they must expect you to write down fucking notes), and the whole project just seems in-your-face obnoxious. What a let-down :(

Re:Read the fine print... (0)

Anonymous Coward | more than 2 years ago | (#39340219)

I tried following the video, but scores remained negative... There's obviously something they forgot to say about spacing/gaps or something.

Re:Read the fine print... (1)

Samantha Wright (1324923) | more than 2 years ago | (#39333433)

If you were to ever try to solve a real MSA problem by hand, you would quickly understand how completely hopeless it is.

Nope nope nope [u-strasbg.fr]. From scratch, perhaps it looks daunting. But the big parts are actually pretty easy. I should stress that BAliBASE is used as a benchmark for new alignment programs, including MultiZ (which, btw, is actually a little old now.)

Re:Read the fine print... (2)

whydavid (2593831) | more than 2 years ago | (#39336069)

BAliBASE is a great reference, but all of the sequence alignments in the database were refined from algorithmically-derived alignments (implemented on computers) in the first place. I think it furthers my assertion that computers + humans > either alone when it comes to MSA. Certainly, the sheer scale of the data would prevent any sort of economic use of manual global alignment, even if the local alignments were best carried out by biologists. Again, my issue here is that the article gives the impression that gamers have "outdone" computers at matching up disease genes, when in reality the gamers have been presented with a very small slice of the problem (as I'm sure you recognize better than I) and only outperformed the computer alone in certain scenarios, certainly not the blanket 70% quoted in the news piece.

Re:Read the fine print... (1)

Samantha Wright (1324923) | more than 2 years ago | (#39336197)

Wholly agreed—but it should be emphasized that the mere existence of BAliBASE asserts that the trickiest part still requires direct intervention. There are precious few things in the universe that a computer can do that a human can't do more slowly or in smaller chunks, after all—and most of those are comparatively silly things like set voltages. I could, for example, implement ClustalW by hand, no sweat—just give me your favourite BLOSUM table, a few other parameters for gap size, a stack of paper, and enough time: it's a pairwise alignment followed by a series of careful compromises to approximate something that looks fairly right. In general this approach achieves 96-97% accuracy [biomedcentral.com] on BAliBASE tests. Approaching the challenge blindly certainly looks daunting, but like using synthetic division [wikipedia.org] to solve higher-order polynomials, it's actually not that bad in terms of memory consumption or effort. Of course, there are some really heinously complex algorithms on that chart, like Mafft, which uses a Fourier transform to do something I still completely don't understand, but even the first FFT implementation required unit tests.
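The pairwise step mentioned above is small enough to write down. A minimal sketch of global pairwise alignment scoring in the Needleman-Wunsch style, assuming a flat match/mismatch score and a linear gap penalty in place of a BLOSUM table and the affine gap parameters a real tool would take. This is not ClustalW; it only illustrates the kind of table a patient person could fill in on paper.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences `a` and `b`.
    Assumptions: flat match/mismatch scores, linear gap penalty.
    Keeps only two rows of the DP table at a time."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: align "" against b[:j]
    for i in range(1, len(a) + 1):
        cur = [i * gap]                          # column 0: align a[:i] against ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap                   # a[i-1] aligned to a gap
            left = cur[j - 1] + gap              # b[j-1] aligned to a gap
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(nw_score("AC", "A"))
```

Each cell depends only on three neighbours, so the work per cell is trivial; what makes it a chore by hand is the number of cells, which is the product of the two sequence lengths.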

That giant tangent aside? The day Slashdot posts an accurate story about bioinformatics is the day I get tenure. :)

Working link (1)

dmt0 (1295725) | more than 2 years ago | (#39332837)

Link to the English version that actually works:

http://phylo.cs.mcgill.ca/eng/ [mcgill.ca]

In your face, Mom! (0)

Anonymous Coward | more than 2 years ago | (#39336243)

My Mother always told me those countless hours of Tetris were absolutely useless. Ha!

Failed. (1)

zandeez (1917156) | more than 2 years ago | (#39336897)

I couldn't even solve one puzzle, so gave up.

There is another really good article about this (0)

Anonymous Coward | more than 2 years ago | (#39337309)

right here
http://games.slashdot.org/story/11/12/07/0413238/video-gamers-advancing-genetic-research

Wait a minute...

Let's see... (1)

rocket rancher (447670) | more than 2 years ago | (#39347411)

Time on planet to optimize pattern matching algorithms --

Humans: Millions of years

Computers: Tens of years.

Not sure there is a story, here...
