Naming All Lifeforms On Earth With Hash Functions

timothy posted about 8 months ago | from the sort-of-a-nickname dept.

Earth 97

First time accepted submitter ssasa writes "A Virginia Tech researcher is proposing a new naming system for all life on earth [based on each organism's] genetic fingerprint — basically something like a hash function of an organism. Hash functions are in common use in software development. Hopefully it will pass some time before we see a hash collision between a cat and some dinosaur."

97 comments

The actual journal article (4, Informative)

Anonymous Coward | about 8 months ago | (#46313365)

For those that want to read the actual journal article
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0089142 [plosone.org]

The word hash is never mentioned either :)

Re:The actual journal article (2)

Fwipp (1473271) | about 8 months ago | (#46313405)

An important limitation of his approach is that it only works for "all organisms whose genomes can be aligned to each other." (With no mention of how "good" the alignment has to be, nor the fact that alignment is not objective.)

So, you'd have multiple schemas for each "group" of organisms. I think his idea is possibly applicable to, say, describing multiple samples within a species. It's clearly ill-suited for a universal naming strategy like the article proposes, though.

Re:The actual journal article (2, Insightful)

Anonymous Coward | about 8 months ago | (#46313471)

So are every two people who aren't twins going to have a completely different hash function?

Perhaps a better scheme would be to assign a function that describes the genetic similarity between two organisms. Well, we kinda already have that. We can use a percentage, and if all organisms are 90 percent similar and only vary by ten percent, for instance, we can narrow our function to those ten percent. Create a new scale from 1 to 100 where the genetically most similar organisms would be grouped next to each other (a 1 would be genetically very similar to a 2, varying by a percentage of a percentage or whatever it scales to) and the least similar organisms would be grouped far apart (a 100 would be genetically least similar to a 1, varying by ten percent). Wait ... we kinda already do that.

So what are the advantages of this guy's ideas?

Re:The actual journal article (1)

Fwipp (1473271) | about 8 months ago | (#46313923)

Your method is actually pretty close to what the paper describes. The idea is that '00060000000D' and '00070000000F' are closely related, but '38439FDCA' and '921938312C' are not.

Re:The actual journal article (2, Insightful)

Anonymous Coward | about 8 months ago | (#46313983)

and the idea is nothing new except he is adding more digits and making it more confusing for us by removing the intuitive base ten that has been the scientific standard since the metric system and replacing it with something worse (kinda like how the 'standard' system is worse than the metric system).

Re:The actual journal article (2, Informative)

Anonymous Coward | about 8 months ago | (#46314063)

(same author)

I just read the article (I know, I should have read it before posting anything earlier instead of relying on the often misleading Slashdot summary) and while I don't really understand what he's doing it does seem to make more sense than what the Slashdot summary shows.

and, no, my original idea does not seem to be what the article proposes (my original idea is an obvious improvement over what the misleading Slashdot summary proposes but next time I should read the article before posting).

I agree that the current way organisms are named and classified can be inconsistent and confusing. A lot of the time the usefulness is based on whether a particular bacterium produces a specific enzyme (enzymatic tests can be done to determine this, and then substances that inhibit the enzyme can be used to stifle the bacteria). But what I found interesting about the article is this:

"

With the naming scheme developed by Vinatzer, the name of every single anthrax strain would contain the information of how similar it is to other strains. Using Vinatzer's genome sequence, the Ames strain used in the bioterrorist attack would, for example, be known as lvlw0x and the ancestor of this strain stored at the U.S. Army Medical Research Institute for Infectious Diseases would be known as lvlwlx.

Vinatzer's naming convention would also give researchers the ability to name new pathogens in a matter of days—not months or years—based on their similarities to known pathogens.

The proposed naming process begins by sampling and sequencing an organism's DNA. The sequence is then used to generate a code unique to that individual organism based on its similarity to all previously sequenced organisms."

The article is kinda vague and I would like to see more detail on exactly how it works but it does seem like it has potential.

Re:The actual journal article (2)

GoodNewsJimDotCom (2244874) | about 8 months ago | (#46313507)

(With no mention of how "good" the alignment has to be...

Chaotic good suffices.

Re:The actual journal article (1)

VortexCortex (1117377) | about 8 months ago | (#46314565)

To the truly chaotic a good alignment is impossible.

Re:The actual journal article (0)

Anonymous Coward | about 8 months ago | (#46314619)

Well, duh. That's why it's called "Chaotic Good" instead of "Chaotic Neutral".

Re:The actual journal article (0)

Anonymous Coward | about 8 months ago | (#46315025)

Is that Cleric/Fighter/Mage leveling quickly enough for you? Up for another couple of hours of grinding?

Re:The actual journal article (1)

ultranova (717540) | about 8 months ago | (#46315825)

To the truly chaotic a good alignment is impossible.

Depends. The standard alignment grid uses cartesian coordinates. Its "natural" shape is a square where the law-chaos and good-evil axes are separate, and ordinal alignments have a greater magnitude of difference from neutrality. On the other hand, the Great Wheel uses a polar alignment system where the type and magnitude of alignment form the axes, thus chaotic good is both less good than neutral good and less chaotic than chaotic neutral of the same magnitude.

Ugh, fourth edition sucks. (1)

Arancaytar (966377) | about 8 months ago | (#46316537)

(...)

Re:The actual journal article (3, Informative)

Frosty Piss (770223) | about 8 months ago | (#46313513)

An important limitation of his approach is that it only works for...

Those who pay the licence (since it's being patented)?

Re:The actual journal article (1)

Anonymous Coward | about 8 months ago | (#46313523)

Yep! Sorry, I haven't read the article ... but I do know that there's really no such thing as a strain, meaning, a collection of genetically identical organisms. (Note: yeah, I know a virus may not fit strict definitions of "organism;" I'm using the term loosely.)

Viruses mutate continually -- which is why a "strain" is actually a collection of "closely related" organisms that vary within some limit (permitted by the classifier) of point mutations.

Re:The actual journal article (1)

pepty (1976012) | about 8 months ago | (#46314375)

But a strain isn't a collection of genetically identical organisms; the closest you would get to that would be clones. A strain refers to a group with a similar genotype or phenotype.

Re:The actual journal article (2)

davester666 (731373) | about 8 months ago | (#46313447)

I'd totally hash that!

Re:The actual journal article (1)

JoeMerchant (803320) | about 8 months ago | (#46315793)

Hash would be a horrible way to do it (and I doubt it's what's proposed).

A single bit error in reading the DNA would lead to a completely different, unrelated hash value.

Even if we could sample DNA and get a 100% accurate reading of its content, which we can't, DNA sampled from your left hand would come up with a different value from DNA sampled from your right hand - just due to insignificant genetic drift, tiny transcription errors, etc.
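To put a number on the single-error point, here is a minimal sketch. It assumes SHA-256 as the hash (nobody in the thread or the article names one) and uses two made-up reads that differ in one base; it counts how many digest bits change.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Two made-up sequence reads that differ only in the final base (illustrative data).
read_a = b"ATGGCGTACGTTAGCCTAGGATCCGATTACA"
read_b = b"ATGGCGTACGTTAGCCTAGGATCCGATTACG"

digest_a = hashlib.sha256(read_a).digest()
digest_b = hashlib.sha256(read_b).digest()

differing_bits = bit_difference(digest_a, digest_b)
print(f"{differing_bits} of {len(digest_a) * 8} digest bits differ")
```

With a cryptographic hash, roughly half of the output bits should flip, so the two digests say nothing about how close the inputs were.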

Names are for communication (4, Funny)

Chemisor (97276) | about 8 months ago | (#46313381)

I think I'll go hunt some af7caaf1e73a2d24924371a370b4ef9b so I can feed my 362842c5bb3847ec3fbdecb7a84a8692 and have a nice quiet evening with my 34b46c8cf192431e84ea81109660367b, chatting about the difficulty of talking about a474fb23f886eeaa16223eba872e53b1 that some socially inept scientist decided to name with a hash function.

Re:Names are for communication (0)

Anonymous Coward | about 8 months ago | (#46313465)

Wouldn't it be faster to buy some ebff344a30f680b4d1357c87428852a1 flavoured cans instead of hunting for af7caaf1e73a2d24924371a370b4ef9b's?

Re:Names are for communication (1)

Chemisor (97276) | about 8 months ago | (#46313497)

Nah. Only 2befc6455fdef3fdc8fe4d9770e45d1b like ebff344a30f680b4d1357c87428852a1 flavor.

Re:Names are for communication (3, Funny)

Darinbob (1142669) | about 8 months ago | (#46314275)

It all tastes like 518bf09f107329cef14fd9c9dbddab3c anyway.

Re:Names are for communication (2)

physicsphairy (720718) | about 8 months ago | (#46313623)

Names are indeed for communication, but 'name' here is mostly bad terminology, or at least The Fine Article leads me to believe these are meant more as serial numbers to supplement the existing system of nomenclature than anything else.

Which is actually somewhat useful. Any research project starts by looking at what other research has already been done. It's no good if your search terms don't bring up the relevant papers. I suppose this might be somewhat like the nomenclature system for chemistry, in which the IUPAC standard for naming molecules has replaced common names. Frequently used chemicals still are referred to by common names, but mostly even if a molecule you encounter has a common name you're not likely to know it off the top of your head. It's pretty handy to be able to figure out the standard name by its structure, so you can then search for it or look up its properties in the CRC.

Re:Names are for communication (2)

Hognoxious (631665) | about 8 months ago | (#46315483)

I don't think they're like serial numbers. A serial number should be just that - assigned serially to whatever is produced/discovered next. These will have meaning embedded in them.

I generally don't like identifiers with meaning. Yeah, let's give the females odd employee numbers and the males even ones. And while we're at it, make the 3rd digit indicate their grade and the 5th their education level...

Re:Names are for communication (0)

Anonymous Coward | about 8 months ago | (#46313737)

Depending on what it is you're hunting, you could end up with either 0xdeadbeef or 0xcafebabe (either 55378008** or 0xb16b00b5), or 0xcafed00d if you're into that (no judgement). Just watch out for 0xbaadf00d while 0xfacefeed.

** - We'll see who remembers this one from their childhood. Captcha: memory

Re:Names are for communication (1)

interkin3tic (1469267) | about 8 months ago | (#46314035)

Different forms of communication. You wouldn't say "I'm going to feed my Felis catus some Gallus gallus for dinner." The Linnaean system is already too specialized to fill in those mad libs blanks; you just use the common names for organisms you deal with regularly. This is for when the Linnaean system isn't specialized enough. Say you discover through DNA sequencing that your sample of mud from the bottom of the ocean has ten new species of microbes in it. You're probably not going to come up with ten Latin names that are particularly memorable. Putting some information in from their genome makes sense.

Re:Names are for communication (2)

subreality (157447) | about 8 months ago | (#46314477)

["dog\n", "deer\n", "wife\n", "animals\n"] ... People would find these names easier to understand if you used "echo -n".
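For anyone who missed the `echo -n` point, a small sketch. It assumes the 32-hex-digit names above are MD5 digests (a guess based on their length) and uses "dog" purely as an example word; plain `echo` appends a newline, and that one extra byte gives an unrelated digest.

```python
import hashlib

# Example word only; the point is the trailing newline that plain `echo` adds.
word = "dog"

# Comparable to `echo dog | md5sum`
with_newline = hashlib.md5((word + "\n").encode()).hexdigest()
# Comparable to `echo -n dog | md5sum`
without_newline = hashlib.md5(word.encode()).hexdigest()

print("with newline:   ", with_newline)
print("without newline:", without_newline)
```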

No success (4, Informative)

Sven-Erik (177541) | about 8 months ago | (#46313401)

Not so sure this will take off since they have applied for a patent and want users to pay a license fee to use it.

Re:No success (1)

Frosty Piss (770223) | about 8 months ago | (#46313473)

Not so sure this will take off since they have applied for a patent and want users to pay a license fee to use it.

Much like Dewey Decimal [wikipedia.org].

Re:No success (2)

Frosty Piss (770223) | about 8 months ago | (#46313529)

For those that did not RTFA:

Virginia Tech is submitting a patent describing the naming scheme. Vinatzer and his collaborator Lenwood Heath, a professor in the Department of Computer Science in the College of Engineering founded This Genomic Life Inc., which will license the invention to develop it further.

Re:No success (1)

pepty (1976012) | about 8 months ago | (#46314393)

If the licensing terms are expensive or onerous they won't get many users ... so the name will probably also contain a paid advertisement ;). Actually; I'd guess the license would ask users to submit a copy of the name and other useful info to This Genomic Life's database and then they will charge institutions a fee to search it.

Re:No success (0)

Anonymous Coward | about 8 months ago | (#46314309)

Which is odd, because there's _already_ a patent for using hashes to name things... yeah.

Because noooooo, it's TOTALLY INOBVIOUS THAT YOU CAN USE THEM TO NAME ANYTHING. When someone comes up with using a hash to name computers/cats/printers/buildings/whatever it will be totally new to the USPTO. Because the law says that lawyers are *supposed* to be dumb about that.

The only advantage (0)

Anonymous Coward | about 8 months ago | (#46313413)

Will they use it to find identical lifeforms by comparing their hashes? Can't they just look at them and tell?

Re:The only advantage (0)

Anonymous Coward | about 8 months ago | (#46314631)

Genes are not the ultimate determinant of how an organism looks once it's grown up. Also, two species can have the same genes when it comes to their appearance, but have some significant differences in their insides.

What's the hash of hash (1)

Mister Liberty (769145) | about 8 months ago | (#46313433)

A higher form of life.

Biology and Computer Science Two Way Street (5, Insightful)

utkonos (2104836) | about 8 months ago | (#46313451)

Last month, at ShmooCon [shmoocon.org] a talk [archive.org] was given about spatial analysis [shmoocon.org] of malware samples. The technique is borrowed directly from bioinformatics. This is a great example of techniques from Biology being used effectively in the IT security realm.

I hope that the researcher involved in naming organisms based on hash algorithms chooses context triggered piecewise hashes (CTPH) AKA fuzzy hashing [dfrws.org] or a similarity hash algorithm [princeton.edu] rather than an algorithm like SHA512. Google's simhash [wwwconference.org] or at least the ideas of this type of algorithm would lend itself much better to the naming of organisms.

FYI: a FOSS implementation of fuzzy hashing is called ssdeep. The project site is here [sourceforge.net]. This is an implementation that is widely used in open source malware analysis tools like Cuckoo Sandbox [cuckoosandbox.org].
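To make the similarity-hash idea concrete, here is a rough simhash-style sketch. It is not ssdeep's CTPH algorithm and not anything taken from the paper: the shingle length, the 64-bit width, the use of BLAKE2b and the random toy sequences are all assumptions chosen for illustration.

```python
import hashlib
import random

def simhash(sequence: str, k: int = 8, bits: int = 64) -> int:
    """SimHash over the overlapping k-character shingles of a sequence."""
    totals = [0] * bits
    for i in range(len(sequence) - k + 1):
        shingle = sequence[i:i + k].encode()
        digest = hashlib.blake2b(shingle, digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(bits):
            totals[bit] += 1 if (h >> bit) & 1 else -1
    # Each output bit is the majority vote of the shingle hashes at that position.
    return sum(1 << bit for bit in range(bits) if totals[bit] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

rng = random.Random(0)  # fixed seed so the toy data is reproducible
original = "".join(rng.choices("ACGT", k=400))
# Point mutation: swap one base in the middle for a different one.
mutant = original[:200] + ("A" if original[200] != "A" else "C") + original[201:]
unrelated = "".join(rng.choices("ACGT", k=400))

near = hamming(simhash(original), simhash(mutant))
far = hamming(simhash(original), simhash(unrelated))
print(f"one-base mutant: {near} bits differ; unrelated sequence: {far} bits differ")
```

The mutant's fingerprint should differ from the original in only a few bits, while the unrelated sequence should differ in around half of them, which is the opposite of how SHA512 behaves.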

Re:Biology and Computer Science Two Way Street (2)

dirt (1129) | about 8 months ago | (#46314453)

Thanks for those links. Comments like yours are why I continue to read /.

Re: Biology and Computer Science Two Way Street (1)

K. S. Kyosuke (729550) | about 8 months ago | (#46315111)

Last month, at ShmooCon a talk was given about spatial analysis of malware samples. The technique is borrowed directly from bioinformatics

Haven't read it yet but from the abstract, it sounds very much like the approximate word matching/text indexing method I prototyped almost a decade ago, matching pieces of text by mapping them onto sets or multisets of their constituent trigrams. (And I'm pretty sure I wasn't the first person ever to have done that.)
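For reference, a minimal sketch of that kind of trigram matching. The function names and the choice of a multiset Jaccard score are mine for illustration, not a reconstruction of the original prototype.

```python
from collections import Counter

def trigrams(text: str) -> Counter:
    """Multiset of the overlapping three-character substrings of text."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def multiset_jaccard(a: str, b: str) -> float:
    """Shared trigram count divided by combined trigram count."""
    ta, tb = trigrams(a), trigrams(b)
    union = sum((ta | tb).values())
    return sum((ta & tb).values()) / union if union else 1.0

close = multiset_jaccard("hash function", "hash functions")
distant = multiset_jaccard("hash function", "genome alignment")
print(close, distant)
```

The first pair shares 11 of 12 trigrams; the second pair shares none.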

Re: Biology and Computer Science Two Way Street (1)

utkonos (2104836) | about 8 months ago | (#46315127)

It's definitely not a new technique. Its purpose in bioinformatics is sequence comparisons. It calculates the mean and standard deviation of a fixed size piece of the sequence. It lends itself very well toward malware analysis.

Re: Biology and Computer Science Two Way Street (1)

utkonos (2104836) | about 8 months ago | (#46315131)

Sorry, it's late: I meant to say fixed sized pieces. It doesn't just look at one stretch of sequence.

Re: Biology and Computer Science Two Way Street (1)

K. S. Kyosuke (729550) | about 8 months ago | (#46315387)

I got it, if N-grams are what you meant.

Beta still sucks (-1)

Anonymous Coward | about 8 months ago | (#46313455)

/. beta still sucks, switched back to classic, far more readable. Stop trying to fix what isn't broken you drongos.

Whoops, too late (1)

Krishnoid (984597) | about 8 months ago | (#46313505)

I think we're a little late to the game [deviantart.com] on this one.

Hash tags (0)

Anonymous Coward | about 8 months ago | (#46313517)

Maybe we should enter the Twitter age and name them all with hash tags instead! Just my idea as a member of #homosapiens

SMS-Scams have done that before (1)

ffkom (3519199) | about 8 months ago | (#46313531)

A few years back there was an eruption of expensive "premium short message services" that offered to provide gullible people with their "Manga character name", their "hero name" and other ridiculous stuff like that - based on a hash function that picked a name from a name library according to the hash sum over some arbitrary data (like real name, phone number or whatever) the gullible customers provided in their short message.

Now this "pick name based on hash over genes"-proposal does not sound that different - and it is similarly useless. Why would one pick some completely different random name just because of a single insignificant minor mutation?

Re:SMS-Scams have done that before (1)

PPH (736903) | about 8 months ago | (#46313611)

I hope this doesn't invalidate my family coat of arms which I paid good money for some years back.

Excellent Idea! (1)

sk999 (846068) | about 8 months ago | (#46313533)

From now on, please refer to me as "ed35073e47a38fbbcc66c1c69058b9c3"

This system has more uses beyond categorizing life on earth. My favorite movie line: "77b0ba27c2fcfa0e02793671c27afb38"

Re:Excellent Idea! (1)

TrollstonButterbeans (2914995) | about 8 months ago | (#46313563)

Sorry dude, under the new naming convention you are: human.nerd.slashdot.sk999

Re:Excellent Idea! (1)

sk999 (846068) | about 8 months ago | (#46313629)

"Sorry dude, under the new naming convention you are: human.nerd.slashdot.sk999"

I think I just said that!

Re:Excellent Idea! (1)

TrollstonButterbeans (2914995) | about 8 months ago | (#46314001)

Exactly!

Re:Excellent Idea! (1)

Tablizer (95088) | about 8 months ago | (#46314303)

From now on, please refer to me as "ed35073e47a38fbbcc66c1c69058b9c3"

Well, my wife sometimes refers to me as #&%@$! after I mess up

Individuals of the same species have (3, Insightful)

Anonymous Coward | about 8 months ago | (#46313591)

differing genetic code.

Re:Individuals of the same species have (0)

Anonymous Coward | about 8 months ago | (#46322793)

Clones are individuals too, you big jerk.

Good! (0)

Anonymous Coward | about 8 months ago | (#46313607)

Good! Finally people might understand that the Platypus (Platypus australis) is actually a species of beetle and not the mammalian species Ornithorhynchus (Ornithorhynchus anatinus).

The most obvious problem with this approach (4, Informative)

Anonymous Coward | about 8 months ago | (#46313617)

This kind of thinking has a tremendous problem with it. Presently, an organism takes the name of a previously described species if and only if it is a member of the same species as the particular type specimen from which the species was described. This holotype serves as the reference specimen for each species. This system has worked extraordinarily well for more than 200 years and has promoted nomenclatural stability.

The biggest problem with attempting to identify species on the basis of their genetic "fingerprint" or bar code is that unless you have some other means to establish that the specimen from which the genetic material was taken is in fact from the same species as the holotype, the genetic fingerprint will simply misidentify the specimen. This is a major problem for much of the genetic data in GenBank, for which, more often than not, there is no longer a means of associating the source of the genetic material with a specimen whose identity can be established independently, because the original specimens are seldom vouchered or saved. Consequently, the actual identity of the species that has been sequenced remains uncertain, even if alignments of the code are "perfect". As for the patent, the rules of Zoological Nomenclature forbid the commercialization of names used in science. These guys can make up their own naming scheme, but scientists, who must rely on having their work at least in principle repeatable and refutable, will be unable to use it for the purposes of science.

Re:The most obvious problem with this approach (2)

utkonos (2104836) | about 8 months ago | (#46313663)

I think you're mostly correct, except for the case of organisms with horizontal gene transfer such as bacteria and archaea. The current naming convention breaks down when it is applied to this type of organism.

Re:The most obvious problem with this approach (1)

turkeyfish (950384) | about 8 months ago | (#46314423)

Yes, this approach makes far more sense for bacteria, whose phenotype is not that far removed from its genotype.

Re:The most obvious problem with this approach (1)

turkeyfish (950384) | about 8 months ago | (#46314435)

I should have also added that if you have evidence that horizontal transfer of genetic information has occurred then you probably already have information about the distinctiveness of two putative species. Otherwise, how would you tell?

Re:The most obvious problem with this approach (1)

utkonos (2104836) | about 8 months ago | (#46314467)

There may be cases where a single "species" of bacteria has a varying rate of horizontal transfer based on its host species. It may have more exposure to a different species of bacteria that it is able to trade genes with because that other species is exclusive to one of the two hosts rather than both. In cases like these, you could name each by its code. I think the ultimate goal is to make clear naming distinctions that reflect actual differences in populations of organisms.

Re:The most obvious problem with this approach (1)

as.kdjrfh sxcjvs (2872465) | about 8 months ago | (#46319761)

There are problems with the current system, though. The one I'm most familiar with runs like this:

1. Species was named a while ago, with the type specimen kept. Name, say, Foozy yanner. Specimens are collected from several places over time.
2. Then we realize that those specimens represent more than one species (very possible just with old-fashioned naturalist observation, and happens also with genetic analysis).
3. So some of the specimens now are officially Foozy tanner -- but we aren't allowed to rename Foozy yanner itself (rules of system).
4. When you run across a reference to Foozy yanner it's _very_ difficult to know whether it's referring to the old unknowingly inclusive set of species, or the single (?) species that `Foozy yanner' now indicates. We'd be more precise if, e.g., we called the single species Foozy yanner_1, which might later get split to Foozy yanner_2 and Foozy tethera, with Foozy yanner_2 the name that includes the original Foozy yanner type specimen.

And this can happen with vertebrates! Archaea, hellifino.

I always wanted that some scientists recognize ... (1)

angel'o'sphere (80593) | about 8 months ago | (#46313653)

... my genius!!
How the fuck am I now supposed to have an "angel'o'sphere" in the middle of the name of a beetle?

They just gave a beetle the name of Darwin and a stupid musician ... and I'm lost.

(* cry *)

ssasa must not know what hashing functions really (1)

Nutria (679911) | about 8 months ago | (#46313671)

are, if he (women can't be this stupid) thinks that a hierarchical naming scheme is anything like a hash function.

Re:ssasa must not know what hashing functions real (1)

Intrepid imaginaut (1970940) | about 8 months ago | (#46313697)

Why can't women be this stupid?

Re:ssasa must not know what hashing functions real (0)

Nutria (679911) | about 8 months ago | (#46313745)

If you think that it's even possible that a womon (it's actually a word...) be stupid, then you must be a racist, sexist homophobic religionist who voted (TWICE!!) for The Stupidest Man In The Word, Evar! [wikipedia.org]

Women are the saviors of the World. Long live Gaia, long live the Matriarchy!!

Re:ssasa must not know what hashing functions real (0)

Anonymous Coward | about 8 months ago | (#46315245)

then you must be a racist, sexist homophobic religionist who voted (TWICE!!) for The Stupidest Man In The Word, Evar!

In the rest of the world, we call such a person an "American".

Why bother (1)

Eric Damron (553630) | about 8 months ago | (#46313683)

At the rate we are driving species extinct a much simpler system of naming the few surviving will be sufficient.

Re:Why bother (1)

fuzzyfuzzyfungus (1223518) | about 8 months ago | (#46314961)

If anything, killing the big, fuzzy, charismatic ones and leaving nothing but horrid insects and burbling bioslurry probably makes the naming challenge substantially more difficult. Back in the good old days, when 'biology' consisted of going out, shooting things, having the servants stuff them, and then deciding which ones looked most like the other ones, you could get away with all kinds of sloppy naming because there just weren't that many species on the table.

Now that we have these kids with their fancy 'genomes' and whatnot, you can probably identify more species within a single block of the megacity 12 slums than we did in the entire pre-genetic history of biology.

Hippo's tasty, but I couldn't eat a whole one (2)

Hognoxious (631665) | about 8 months ago | (#46315367)

shooting things, having the servants stuff them [...] there just weren't that many species on the table.

Ah. When you said "stuff them" I thought you were referring to taxidermy, and not the herbs and breadcrumbs kind.

Re:Hippo's tasty, but I couldn't eat a whole one (0)

Anonymous Coward | about 8 months ago | (#46315587)

> Ah. When you said "stuff them" I thought you were referring to taxidermy, and not the herbs and breadcrumbs kind.

Nope... Chuck Testa!

Re:Hippo's tasty, but I couldn't eat a whole one (1)

fuzzyfuzzyfungus (1223518) | about 8 months ago | (#46320803)

More than a few noted biologists also came to this conclusion. A major theme in Charles Darwin's "The Voyage of the Beagle" is his constant quest to document, and devour, every goddamn novel species he could get his hands on. And he got his hands on quite a few.

Re:Hippo's tasty, but I couldn't eat a whole one (1)

coolsnowmen (695297) | about 8 months ago | (#46334089)

In reply to your subject: did you know about the hippo farming plan? Apparently it was a big deal like a hundred years ago when we had a "meat problem" in the United States.

Random link: http://www.wired.com/wiredscie... [wired.com]

Doesn't want a hash (1)

GrahamCox (741991) | about 8 months ago | (#46313721)

You don't want a hash function for this, where the hash is effectively random. You need a function that derives a unique value for each input, but retains the relative distance of the original value. i.e. two values that are very similar yield an index that is similarly close. That way the 'hash' can be used to determine how closely related two species are. Randomising in the way a true hash does is of no real value.

Re:Doesn't want a hash (1)

fozzy1015 (264592) | about 8 months ago | (#46313797)

Randomising in the way a true hash does is of no real value.

There's nothing random about a hash function. It has to be deterministic; the same input will always result in the same output.

Re:Doesn't want a hash (1)

GrahamCox (741991) | about 8 months ago | (#46314013)

I know that; you know that; everyone on Slashdot presumably knows that. I didn't mean 'random' as in truly random, I meant that the spread of values produced by the hash must be effectively random. In the context of the article, you don't want anything like random, you want both deterministic and a function that maintains distance. A hash is completely wrong.

Re: Doesn't want a hash (1)

K. S. Kyosuke (729550) | about 8 months ago | (#46315125)

I meant that the spread of values produced by the hash must be effectively random. In the context of the article, you don't want anything like random, you want both deterministic and a function that maintains distance. A hash is completely wrong.

Of course a hash is wrong. That's why you use a locality-sensitive hash [wikipedia.org] for that.
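A minimal MinHash sketch, as one example of the locality-sensitive family. The k-mer length, the 64 salted BLAKE2b hash functions and the random toy sequences are assumptions picked for illustration, not anything proposed in the article.

```python
import hashlib
import random

def kmers(seq: str, k: int = 8) -> set:
    """Set of the overlapping k-character substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash_signature(items: set, num_hashes: int = 64) -> list:
    """For each salted hash function, keep the smallest digest seen over the set."""
    signature = []
    for salt in range(num_hashes):
        key = salt.to_bytes(2, "big")  # the salt selects one hash function
        signature.append(min(
            hashlib.blake2b(item.encode(), digest_size=8, key=key).digest()
            for item in items
        ))
    return signature

def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """The fraction of matching slots approximates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

rng = random.Random(1)  # fixed seed so the toy data is reproducible
base = "".join(rng.choices("ACGT", k=300))
variant = base[:150] + ("G" if base[150] != "G" else "T") + base[151:]
other = "".join(rng.choices("ACGT", k=300))

sig_base = minhash_signature(kmers(base))
similar = estimated_similarity(sig_base, minhash_signature(kmers(variant)))
unrelated = estimated_similarity(sig_base, minhash_signature(kmers(other)))
print(f"variant: {similar:.2f}, unrelated: {unrelated:.2f}")
```

Signatures of near-identical sequences agree in most slots, so comparing two short signatures stands in for comparing the full sequences.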

Re:Doesn't want a hash (1)

Nutria (679911) | about 8 months ago | (#46314015)

It has to be deterministic; the same input will always result in the same output.

Correct. OP is conflating "random" with "one-way".

Re:Doesn't want a hash (1)

Hognoxious (631665) | about 8 months ago | (#46315403)

I think he's conflating random with meaningless. A well constructed hash shouldn't give clues about the contents, i.e. if you change a single e to an f in "War & Peace" the results should be totally different.

Re:Doesn't want a hash (1)

common-lisp (2771805) | about 8 months ago | (#46314271)

Not only that, but DNA varies from ape to ape just as much as from ape to humans. So unless they use a hash function like you specify, it would be impossible to tell if you're looking at the hashed-DNA of two members of the same species or two members of different species.

IMO if they're trying to use hashes to provide a unique identifier for species, they probably don't care about the ability to measure the similarity between species. It would probably be difficult to write a hash function that would have few collisions and still guarantee that similar numbers result in similar hashes. Someone correct me if I'm wrong.

Re:Doesn't want a hash (2)

VortexCortex (1117377) | about 8 months ago | (#46314625)

You don't want a hash function for this, where the hash is effectively random. You need a function that derives a unique value for each input, but retains the relative distance of the original value. i.e. two values that are very similar yield an index that is similarly close.

Certain hash tables for search functions are built around hashes exhibiting the very type of behavior you describe -- Not to mention current 'reverse' image search technologies. A "hash" function is not required to have a seemingly random output. Cryptographic hashes try to produce high entropy deterministic output, but other types of hashing can and do have different goals, namely with far less entropic outputs.

I would ask you to turn in your geek card, but the standard for issuance is far lower nowadays...

#newclassification (0)

akage.chan (3545191) | about 8 months ago | (#46313747)

Just be glad that we aren't going to name organisms with hashtags! #homosapiens!

Not sure how similar this is to hashing (3, Informative)

fozzy1015 (264592) | about 8 months ago | (#46313775)

I first thought the genetic sequence of an organism would be the input to a hash function, but reading further that doesn't seem to be the case.

"Using Vinatzer's genome sequence, the Ames strain used in the bioterrorist attack would, for example, be known as lvlw0x and the ancestor of this strain stored at the U.S. Army Medical Research Institute for Infectious Diseases would be known as lvlwlx."

The output name would still show ancestry using identical values, when one of the key properties of a hash function is that small changes in the input result in a completely changed output.
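A quick Python sketch of the contrast, using SHA-256 as a stand-in for a hash with the small-change, big-difference property. The two names are the ones quoted above; the helper functions are my own and only for illustration.

```python
import hashlib

def shared_prefix_length(a, b):
    """Number of leading characters two names have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def bit_difference(a, b):
    """Count the bits that differ between the SHA-256 digests of two strings."""
    da = int.from_bytes(hashlib.sha256(a.encode()).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b.encode()).digest(), "big")
    return bin(da ^ db).count("1")

ames, ancestor = "lvlw0x", "lvlwlx"

# The proposed names keep a visible common prefix.
print(shared_prefix_length(ames, ancestor))

# The digests of the same two strings differ in roughly half of their 256 bits.
print(bit_difference(ames, ancestor))
```

The names differ in one character and still share a four-character prefix, whereas the two digests look unrelated.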

Re:Not sure how similar this is to hashing (0)

Anonymous Coward | about 8 months ago | (#46314137)

"Using Vinatzer's genome sequence, the Ames strain used in the bioterrorist attack would, for example, be known as lvlw0x and the ancestor of this strain stored at the U.S. Army Medical Research Institute for Infectious Diseases would be known as lvlwlx."

The output name would still show ancestry using identical values, when one of the key properties of a hash function is that small changes in the input result in a completely changed output.

I question if that naming scheme is useful at all. There are two problems with this:
1: lvlw0x isn't a name people will really remember, which means it will have to be looked up in some database anyway.
2: if you misclassify one and place it at the wrong location, you will have to rename all genomes having that one as ancestor. This means the names can't be static.

The purpose appears to be to build a genome family tree. However, wouldn't you need the tree to assign the names in the first place?

The point of the periodic table is to group similar elements, and it has been used in the past to predict the properties of yet-to-be-discovered elements as well as newly discovered, poorly researched ones. I wonder if this naming strategy could be used for something like that. However, I suspect not, as the knowledge you want to gain needs to be discovered the hard way in order to figure out which name to assign. Besides, a database with relations would still be able to figure that out regardless of names.

Somehow I feel like this naming scheme is like the way kings used to define a yard. It was from the king's nose to his fingertip (often, not always), meaning it was redefined with each new king. After all, he should leave a mark around him. Coming up with a renaming scheme for all genomes could be a way of leaving the researcher's mark on history without adding benefits.

Re:Not sure how similar this is to hashing (1)

laughingskeptic (1004414) | about 8 months ago | (#46317097)

You are equating "hash function" and "cryptographic hash function" with your assertion "one of the key properties of a hash function is that small changes in the input result in a completely changed output". Not all hash functions are cryptographic hash functions. Inside operating systems you may see a hash function that is no more than a simple masking of bits because that is all that is required.
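As a rough illustration of the bit-masking kind of hash, here is a tiny Python sketch. The function name and the power-of-two table size are assumptions for the example, not taken from any particular operating system.

```python
def mask_hash(key, bits=8):
    """Hash an integer key by keeping only its lowest `bits` bits.

    Assumes a table with 2**bits buckets (hypothetical size for this sketch).
    """
    return key & ((1 << bits) - 1)

# Neighbouring keys land in neighbouring buckets: no avalanche effect at all.
print(mask_hash(0x1234), mask_hash(0x1235))

# Keys that differ only in the masked-off high bits collide.
print(mask_hash(0x1234) == mask_hash(0x1334))
```

It is still a perfectly usable hash function for a table lookup, even though it scrambles nothing.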

Re:Not sure how similar this is to hashing (2)

martin-boundary (547041) | about 8 months ago | (#46321787)

No, cryptographic hash functions have certain strong guarantees, but all(*) hash functions are supposed to mimic independent, uniformly random, behaviour of inputs. Since in the physical world, inputs often come from processes, and processes tend to evolve continuously, the inputs to be hashed by a computer system often have some amount of similarity if they occur close together in time. Thus to transform consecutive inputs into a pair of independent uniformly random hashes, it is desirable that small changes in the input result in completely changed output.

(*) There are exceptions, such as when devising algorithms for locality sensitive hashing [wikipedia.org] , but they are few.

God no. (0)

Anonymous Coward | about 8 months ago | (#46313955)

It'd be easier to just name them using a simple system based on structured syllable words, similar to the system in use for naming unnamed super heavy elements. [wikipedia.org]

But instead of numerals, you could do it based on regions of discovery, general attributes of the creature, what it is, the family, blah blah etc, you get the idea.
Make it as complex, but as defined as possible. What use does a hash have for anything?
A structured systemic naming system for every possible creature would be considerably better, and is absolutely 100% unique, and better yet, is free of bias, ego or anything else in deciding the name since it would be a simple "follow the definition table".
That stuff can come after when the normal people care enough to want to name a creature that has 50 syllables to its scientific name.

Not particularly useful (1)

drinkypoo (153816) | about 8 months ago | (#46314179)

Might as well just use shorthands for now. If the nomenclature may have to be changed someday because of collisions, you might as well use something more friendly today. Until we understand DNA well enough to reject nonviables, the potential for namespace collision is too high to expect to be able to use today's scheme forever.

Mathematically impossible (1)

Kim0 (106623) | about 8 months ago | (#46314771)

It is impossible to make a hash function that gives one hash to a set of similar genes, because there will always be too many near ambiguities.
There are however other ways of doing things that behave like hashes. I made the best method for this for fingerprints, so I should be contacted for stuff like this.

Re:Mathematically impossible (0)

Anonymous Coward | about 8 months ago | (#46315383)

I thought so also but according to the news they've somehow managed to create a solution. RTFM.

On another subject (0)

Anonymous Coward | about 8 months ago | (#46315329)

I'm waiting for captcha to be implemented as a mandatory check before sex.... That will surely limit STDs!

OID would be better (1)

tomhath (637240) | about 8 months ago | (#46315391)

A hash doesn't provide any taxonomy. It would be better to use something like an OID [wikipedia.org] so you see how the organism relates to other organisms.
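A small Python sketch of what that could look like. The dotted identifiers below are invented for illustration and are not real registered OIDs or an actual taxonomy.

```python
def parse_oid(oid):
    """Split a dotted identifier such as '1.3.2.7.1' into a tuple of ints."""
    return tuple(int(part) for part in oid.split("."))

def common_ancestor(a, b):
    """Return the longest shared dotted prefix of two identifiers."""
    shared = []
    for x, y in zip(parse_oid(a), parse_oid(b)):
        if x != y:
            break
        shared.append(str(x))
    return ".".join(shared)

# Hypothetical identifiers: each extra component is one level deeper in the tree.
house_cat = "1.3.2.7.1"
lion = "1.3.2.7.4"
oak = "1.5.9.2"

print(common_ancestor(house_cat, lion))  # deep shared prefix, close relatives
print(common_ancestor(house_cat, oak))   # only the root is shared
```

The length of the shared prefix tells you directly how closely two entries are related, which an opaque hash cannot do.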

In a way isn't a genetic fingerprint a hash by its (1)

Sait-kun (922599) | about 8 months ago | (#46315427)

In a way, isn't a genetic fingerprint a hash by itself?

A hash is a way of ensuring the (genetic) code is exactly the same.
You could basically see every living being as a walking collection of genetic hashes. Some of these hashes we share; others are unique to a species or sub-species, or unique to a single person.

The only difference is that we do not understand the genetic code well enough to use them in the same way as a hash code.

Would destroy comics page completely (1)

140Mandak262Jamuna (970587) | about 8 months ago | (#46315433)

The comic page cartoonists are happy people, finding joy in simple things and daily events happening around them. They are amused by such little things they lead a happy contented life. The only reason they share their joy, in whatever little morsels, with lesser mortals like us is, someday they will have a louse or a worm or bacteria named after them. Without an incentive like this they will throw away their crayons and pencils and walk away.

I will support naming organisms by hash functions when hash functions produce funny output on the far side of sanity-insanity dividing line.

Re:Would destroy comics page completely (0)

Anonymous Coward | about 8 months ago | (#46315711)

Mods, take a look at Gary Larson, the author of the Far Side cartoons, and the louse [wikipedia.org] named after him.

Will end in catastrophe, because of sex (0)

Anonymous Coward | about 8 months ago | (#46315525)

This idea was proposed by someone unused to sex, a scientist studying haplotype lineages. The world is however full of recombinants and don't forget that humans are a bunch of haplotype lineages too.

hash (1)

mnajem (642318) | about 8 months ago | (#46315553)

initially thought it was the process of hashtagging everything on earth with #twitter #hashtag

do they have the genetic code for all these lifefo (0)

Anonymous Coward | about 8 months ago | (#46315995)

So how many genetic factors are to be the input variables for this hash?
How many have they collected and verified (across the whole species variation range)?
And they mention dinosaurs -- kind of a dearth of genetic material to classify there, don't you think? (Ditto for much more recently extinct species.)

So this is just a 'system' someone cobbled together?

Has it passed any tests indicating it will actually work 99.99% of the time (before any other effort is put into collecting ALL the data)?

So ... both organisms ... (1)

Dabido (802599) | about 8 months ago | (#46331071)

At the rate we're killing organisms off, we'll only need two of these soon ... one for humans, and one for soylent green ... whatever that stuff is made from!!!!