Automated Language Deciphering By Computer AI

samzenpus posted more than 4 years ago | from the what-about-dwarvish? dept.

Education 109

eldavojohn writes "Ugaritic has been deciphered by an unaided computer program that relied only on four basic assumptions present in many languages. The paper (PDF) may aid researchers in deciphering eight undecipherable languages (Ugaritic has already been deciphered and proved their system worked) as well as increase the number of languages automated translation sites offer. The researchers claim 'orders of magnitude' speedups in deciphering languages with their new system."

THIS IS NOT A PROBLEM !! (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#32753028)

This is a good thing for all concerned !!

Sweet (1)

The MAZZTer (911996) | more than 4 years ago | (#32753044)

Universal translator, here we come!

Re:Sweet (3, Funny)

Fluffeh (1273756) | more than 4 years ago | (#32753054)

But will it go into your ear, or will it be injected via a syringe and live in your gut is the question?

Re:Sweet (5, Funny)

Anonymous Coward | more than 4 years ago | (#32753088)

Good news, it's a suppository.

Re:Sweet (1)

sortius_nod (1080919) | more than 4 years ago | (#32753226)

You forgot the third option... but we need a TARDIS for that.

Re:Sweet (-1, Troll)

Anonymous Coward | more than 4 years ago | (#32753266)

You forgot the third option... but we need a TARDIS for that.

all of this and niggers still speak a form of pidgin instead of english. you would think they would have overcome this by now. that is why they are jigaboos i guess.

Re:Sweet (0, Offtopic)

Cryacin (657549) | more than 4 years ago | (#32753340)

and niggers still speak a form of pidgin instead of english

I'll probably get burned in hell for feeding a troll, but I can't resist.

Your only language is obviously English, right?

Well let me point out that it's a "pidgin" mix between German, which was a "pidgin" version of Norse, as well as French, which is basically a "pidgin" version of Latin.

Therefore, by speaking English you speak a very similar amount of "pidgin" as the delightful people you are referring to.

I hope that your sister can comfort you as you cry yourself to sleep tonight. After all, that's what wives are for!

Re:Sweet (2, Interesting)

jd (1658) | more than 4 years ago | (#32753748)

Well, Old Norse is technically based on Old Germanic rather than the other way round, and Old English not only had Old Germanic input but Old Norse input as well. Along with an uncertain amount of Anglic (amazingly little is known about the Angles), possibly some Jute. English uses Norman French, plus modern French (which itself is derived from Norman French). Norman French survives in the modern world in Guernsey, Jersey and maybe some other Channel Islands but became extinct on Alderney.

To bring this Back On Topic, if English were lost, it would be almost impossible to use this program to recover it. English has input from too many sources, resulting in way too many loan-words of incompatible structure and too much incompatible grammar. However, one very interesting test of the program would be to map each of the derived phonemes in Proto-Indo-European to a character, then compare this derived PIE script with each Indo-European language in turn. If the derivation is correct, the number of correct guesses for translations of PIE words into each known IE language ought to be above what would be expected by chance alone AND the translations should remain compatible with the derivations the PIE engineers used in the first place. By comparing across the translations for all languages, the program may discover other word-parts that had not been noticed before.

It may be possible to determine if a language is truly isolate or not, by analyzing against a language multiple times using slightly different data sets and seeing if the results remain about the same. If this test works, then languages of uncertain/unknown ancestry (such as Basque and Etruscan*) can be tested against all 7,200 known languages to see if any of them produce a moderately stable match. No match means no connection with any other existent linguistic family tree.

*Etruscan is a bugbear. There is one book that is completely intact and undamaged. It's made of gold leaf. The academic who currently owns it has not published so much as a single line of the text, merely two of the illustrations. All other Etruscan texts are fragmentary (so you've very little context to work with and not many words that are definitely complete) or too short to be useful. We don't know what Etruscan is related to, but if the above hypothesis is correct, we could find out and then translate the book. But the damaged texts, such as a linen book used to wrap a mummy, are way too fragmentary. You'd never be sure if such a translation was correct. A complete book, on the other hand, would offer no possibility for mistake. It would work or it wouldn't.
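The stability test proposed above (run the analysis several times on slightly different data and see whether the answer holds) could be sketched roughly as follows. Everything here is an assumption for illustration: `rank_mapping` is a crude stand-in decipherer that pairs symbols by frequency rank, not the researchers' model, and the word lists are invented toy data.

```python
import random
from collections import Counter

def rank_mapping(unknown_words, known_words):
    """Stand-in decipherer (hypothetical): pair symbols by frequency rank."""
    def ranked(words):
        counts = Counter("".join(words))
        # Most frequent first; ties broken alphabetically so results are repeatable.
        return [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return dict(zip(ranked(unknown_words), ranked(known_words)))

def stability(unknown_words, known_words, runs=20, fraction=0.8, seed=0):
    """Mean share of symbols whose mapping on a random subsample
    agrees with the mapping obtained from the full data set."""
    rng = random.Random(seed)
    full = rank_mapping(unknown_words, known_words)
    k = max(1, int(len(unknown_words) * fraction))
    scores = []
    for _ in range(runs):
        sample = rng.sample(unknown_words, k)
        sub = rank_mapping(sample, known_words)
        agree = sum(1 for sym, target in full.items() if sub.get(sym) == target)
        scores.append(agree / len(full))
    return sum(scores) / runs

# Invented toy data: the "unknown" words are the known words under a secret substitution.
known_words = ["abba", "cab", "abc", "aa", "b"]
secret = {"a": "X", "b": "Y", "c": "Z"}
unknown_words = ["".join(secret[ch] for ch in w) for w in known_words]

score = stability(unknown_words, known_words)
print(f"stability: {score:.2f}")  # near 1.0 suggests a stable match, lower suggests noise
```

A score that stays high across subsamples is only weak evidence of a real relationship; a low score is what the "no stable match" case in the comment would look like.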

Re:Sweet (1)

Luyseyal (3154) | more than 4 years ago | (#32758624)

Do you have a citation for that book? It's not mentioned in Wikipedia.

Thanks,
-l

Re:Sweet (1)

jd (1658) | more than 4 years ago | (#32760000)

Links follow:

Finding useful information on this book is... hard. You're right, Wikipedia doesn't even mention it. Anywhere.

Re:Sweet (1)

jordan_robot (1830144) | more than 4 years ago | (#32753060)

Universal translator, here we come!

pbbbbbbttt! I prefer to stick to translator microbes, thank you very much!

Re:Sweet (3, Informative)

doishmere (1587181) | more than 4 years ago | (#32753074)

Their method relies heavily on the unknown language being related to a known language by some degree. At the heart of their technique is Bayesian statistics applied to lexical and frequency analysis; for this approach to work, there must be some basis for comparison.
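To make the frequency-analysis part concrete, here is a minimal sketch that pairs symbols of an unknown script with letters of a known language purely by frequency rank. It is not the Bayesian model from the paper, only the simplest version of the "correlated symbols occur with similar frequencies" idea; the function names and toy texts are made up for illustration.

```python
from collections import Counter

def rank_by_frequency(text):
    """Distinct non-space symbols of `text`, most frequent first (ties alphabetical)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [sym for sym, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def guess_mapping(unknown_text, known_text):
    """Pair symbols of equal frequency rank: a crude first guess only."""
    return dict(zip(rank_by_frequency(unknown_text), rank_by_frequency(known_text)))

# Invented toy data: the "unknown" text is the known text under a secret substitution.
known = "abba cab abc aa b"
secret = {"a": "X", "b": "Y", "c": "Z"}
unknown = "".join(secret.get(ch, ch) for ch in known)

mapping = guess_mapping(unknown, known)
print(mapping)
```

On real texts the two frequency profiles never line up this neatly, which is presumably why a probabilistic model over many candidate mappings is needed instead of a single rank match.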

Re:Sweet (1, Insightful)

Anonymous Coward | more than 4 years ago | (#32753452)

I think this is more a tool for human decipherers than a magic tool for deciphering languages. It is a great tool when you have already worked out the key points of the language: it lets you skip the most tedious part, going word by word through the language, and so reduces the time needed to decipher it. It is also possible with this tool to decipher the language and get the language wrong, but that does not mean everything deciphered is wrong; with ideographic writing, the most likely case is that you have correctly deciphered the writing and its meaning but not the reading.

Re:Sweet (4, Funny)

grcumb (781340) | more than 4 years ago | (#32753200)

Universal translator, here we come!

Cool! Can I bring it into my next marketing meeting?

Re:Sweet (4, Funny)

Walt Dismal (534799) | more than 4 years ago | (#32753272)

Only if the gross gains in closing juncture exceed the long-term sustainability goals of the viability imperative for all mass interoperability. We at Mega Industries believe this will move us forward to our cloud-based monetization of the human-media dynamic which is strategically important in an ever-evolving mobile continuum. We have directed our customer experience champions to ensure consumers realize this when they call in with emphatic expressions of dissatisfaction.

Re:Sweet (0)

Anonymous Coward | more than 4 years ago | (#32753378)

Rough translation for the Marketing impaired:

"Sure, but show us the money, first!"

Re:Sweet (1)

L4t3r4lu5 (1216702) | more than 4 years ago | (#32754274)

Only if the gross gains in closing juncture exceed the long-term sustainability goals of the viability imperative for all mass interoperability.

Only if we can update the UI for version 2 and sell it a second time to the same saps.

We at Mega Industries believe this will move us forward to our cloud-based monetization of the human-media dynamic which is strategically important in an ever-evolving mobile continuum.

When everyone has it, we can turn it into a subscription-based cash cow.

We have directed our customer experience champions to ensure consumers realize this when they call in with emphatic expressions of dissatisfaction.

Tell the whining losers that premium support is only available with the Platinum Care package, and transfer them to "Gord-on" in the Mumbai sales office.

Re:Sweet (1)

roman_mir (125474) | more than 4 years ago | (#32754740)

Pffft, please, your plan is to have emphatically expressed dissatisfied consumers realize that your gross gains within the closing juncture exceed your long-term sustainability goals for all viability imperatives, which will allow the move to cloud-based monetization of the human-media dynamic? It is but a futile attempt, you may as well give up right now, no matter how much time your customer experience champions waste on a single call.

Here, at GOD Industries, we know better than to rely on such clearly misguided attempts of human-human interactions.

We simply induce meditative sublimation of continuous exasperation through excitation of vernacular instinctual continuum within the subject's predisposition to acceptance of the delirium through faith chakra. There is no possible manner in which the subject can abjugate oneself from the excited forces of undifferentiated love, and that is what we, at GOD Industries are specializing in: Love.

If you think your business plan can compete with ours, it is only because we haven't descended our super-existential love upon your person just yet.

Re:Sweet (0)

Anonymous Coward | more than 4 years ago | (#32755302)

Synergy?

Re:Sweet (1)

Dr. Eggman (932300) | more than 4 years ago | (#32755718)

I tried running your statement through the deciphering-AI, but the process killed itself before completion. I checked the debug logs, but they weren't very helpful. Just a bunch of 'e's, 'y's, 'a's, and some 'r's and 'g's strung together.

I...I think it was screaming...

Re:Sweet (1)

drachenstern (160456) | more than 4 years ago | (#32757934)

This.

Seriously.

Re:Sweet (0)

Anonymous Coward | more than 4 years ago | (#32753588)

Universal translator, here we come!

Cool! Can I bring it into my next marketing meeting?

Now if only it could decipher law texts.

Re:Sweet (1)

oiron (697563) | more than 4 years ago | (#32753888)

Read it again: It depends on similarity to a known language...

Re:Sweet (1)

Posting=!Working (197779) | more than 4 years ago | (#32756010)

He said universal translator, as in it only works on languages of this universe. Marketing speak is from the anti-matter dominant universe, as evidenced by the fact that the more it is spoken, the less is actually communicated.

Re:Sweet (1)

MoriT (1747802) | more than 4 years ago | (#32758276)

There is a perfect translation system that extracts all content from marketing speech: earplugs.

Answers to all TFA questions (5, Informative)

cappp (1822388) | more than 4 years ago | (#32753068)

Just so we can keep the “didn’t read TFA” comments to a minimum: The four assumptions as laid out in the article are:

- The language being deciphered is closely related to some other language: In the case of Ugaritic, the researchers chose Hebrew.

- There’s a systematic way to map the alphabet of one language on to the alphabet of the other, and that correlated symbols will occur with similar frequencies in the two languages. The system makes a similar assumption at the level of the word: The languages should have at least some cognates, or words with shared roots, like main and mano in French and Spanish, or homme and hombre.

- The system assumes a similar mapping for parts of words. A word like “overloading,” for instance, has both a prefix — “over” — and a suffix — “ing.” The system would anticipate that other words in the language will feature the prefix “over” or the suffix “ing” or both, and that a cognate of “overloading” in another language — say, “surchargeant” in French — would have a similar three-part structure.

The article also notes the success rates, where it states that

Ugaritic has already been deciphered: Otherwise, the researchers would have had no way to gauge their system’s performance. The Ugaritic alphabet has 30 letters, and the system correctly mapped 29 of them to their Hebrew counterparts. Roughly one-third of the words in Ugaritic have Hebrew cognates, and of those, the system correctly identified 60 percent. “Of those that are incorrect, often they’re incorrect only by a single letter, so they’re often very good guesses,” Snyder says.

Critics noted that

The researchers’ approach, he says, presupposes that the language to be deciphered has an alphabet that can be mapped onto the alphabet of a known language — “which is almost certainly not the case with any of the important remaining undeciphered scripts.” It also assumes, he argues, that it’s clear where one character or word ends and another begins, which is not the case with many deciphered and undeciphered scripts. The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.
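The cognate assumption, and the remark that wrong guesses are often off by a single letter, can be illustrated with plain edit distance. This is a sketch under assumptions, not the researchers' method: the identity `mapping`, the tiny `lexicon`, and the helper names are invented, and the French/Spanish pairs are the ones quoted from the article.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def closest_cognate(word, mapping, lexicon):
    """Transliterate `word` with `mapping`, then return the nearest
    lexicon entry together with its edit distance."""
    translit = "".join(mapping.get(ch, "?") for ch in word)
    best = min(lexicon, key=lambda w: (edit_distance(translit, w), w))
    return best, edit_distance(translit, best)

# French and Spanish share an alphabet, so the letter mapping here is the identity.
mapping = {ch: ch for ch in "abcdefghijklmnopqrstuvwxyz"}
lexicon = ["mano", "hombre", "mesa"]  # tiny invented Spanish word list

print(closest_cognate("main", mapping, lexicon))
print(closest_cognate("homme", mapping, lexicon))
```

Taking the article's figures at face value, about one-third of Ugaritic words have Hebrew cognates and 60 percent of those were identified, so the system correctly matched roughly 1/3 × 0.6 ≈ 20 percent of all Ugaritic words.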

Re:Answers to all TFA questions (4, Insightful)

MichaelSmith (789609) | more than 4 years ago | (#32753092)

The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.

Pfft, why? (5, Funny)

mdenham (747985) | more than 4 years ago | (#32753108)

Label at least one computer "ham sandwich" to confuse future language researchers.

Alternatively, label each computer with a character's name from (insert show of your choice here).

Re:Pfft, why? (1)

steelfood (895457) | more than 4 years ago | (#32753606)

In all of the computer labs I've been to, the name of the computer is visibly displayed in front somewhere. The names of all the computers in the lab usually revolve around a common theme, e.g. periodic table of elements, Simpsons characters, HHGTTG characters, etc.

You better hope English never becomes extinct, because an important period in human history would be forever lost.

Re:Pfft, why? (2, Insightful)

L4t3r4lu5 (1216702) | more than 4 years ago | (#32754284)

How idiotic. Name servers that way if you must, but workstations should be named by geographic location, building, room, station number. Nicknames don't count, but for sanity's sake name your equipment logically.

Re:Pfft, why? (1)

ultranova (717540) | more than 4 years ago | (#32755054)

The word you are looking for is "systematically", not "logically". And unless you're talking about a whole building's worth of computers, it's simply not worth it to indicate a location in the name; "Huey" is a lot easier to remember than "B2R22S15".

Re:Pfft, why? (1)

L4t3r4lu5 (1216702) | more than 4 years ago | (#32755208)

This guy was talking about a computer lab. I get the impression that Huey, Duey, Louie, Barney, Smarmey, Charley, Blarney, Indigo Montarney etc will get particularly bothersome to keep tabs on as a convention. Why not B2R22Cad4? CAD machine 4 in lab 22, building 2. Not easy to remember, but memory isn't required. You have all of the information you need without having to learn anything but a naming convention.
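As a rough sketch of what "learning only the naming convention" buys you, a name like B2R22Cad4 can be taken apart mechanically. The exact pattern below (B, building number, R, room number, a role made of letters, then a station number) is my reading of the example, and the `Host` type and `parse_hostname` helper are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Host(NamedTuple):
    building: int
    room: int
    role: str
    station: int

# Assumed convention: B<building>R<room><role letters><station number>
_PATTERN = re.compile(r"B(\d+)R(\d+)([A-Za-z]+)(\d+)")

def parse_hostname(name: str) -> Optional[Host]:
    """Return the parts encoded in `name`, or None if it does not follow the convention."""
    m = _PATTERN.fullmatch(name)
    if m is None:
        return None
    building, room, role, station = m.groups()
    return Host(int(building), int(room), role, int(station))

print(parse_hostname("B2R22Cad4"))  # CAD machine 4 in lab 22, building 2
print(parse_hostname("Huey"))       # themed names carry no location, so this is None
```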

Re:Pfft, why? (1)

ArsenneLupin (766289) | more than 4 years ago | (#32758670)

This guy was talking about a computer lab. I get the impression that Huey, Duey, Louie, Barney, Smarmey, Charley, Blarney, Indigo Montarney etc will get particularly bothersome to keep tabs on as a convention. Why not B2R22Cad4

You know, if you are the kind of person that prefers numbers over pronounceable names, computers also have IP addresses. Just use those, and leave the hostnames for the rest of us...

Re:Pfft, why? (1)

RadioElectric (1060098) | more than 4 years ago | (#32755146)

And then when it's moved to another room?

Re:Pfft, why? (1)

L4t3r4lu5 (1216702) | more than 4 years ago | (#32755290)

Good point. It's terrible that hostnames are hard-coded into the operating system on installation, and that sticky labels are permanent fixtures once applied.

Boy, I could sure save some money replacing equipment which needed to be moved by changing the hostname and printing a new sticker.

Re:Pfft, why? (1)

Hognoxious (631665) | more than 4 years ago | (#32755702)

It's terrible that hostnames are hard-coded into the operating system on installation, and that sticky labels are permanent fixtures once applied.

Funny you should mention that - I worked in a place once where that appeared to be true.

By trial and error I discovered what the IDs of some of the printers were and relabeled them. Next day, somebody had stuck new labels over the top with the old, wrong, IDs on them.

Re:Pfft, why? (1)

L4t3r4lu5 (1216702) | more than 4 years ago | (#32756154)

That doesn't even make sense! What the hell?! I could understand if they had a problem remembering the exact printer model when selecting which area to print to and gave it a name like "Barney", but amending an identifier specifically to make supporting it easier? It beggars belief...

Re:Pfft, why? (1)

Hognoxious (631665) | more than 4 years ago | (#32758100)

Encroaching on territory, I guess.

It was full of people who wouldn't jump in a stream if their feet were on fire unless someone specifically told them to. And no, it wasn't military/defense related at all.

Re:Pfft, why? (1)

Quirkz (1206400) | more than 4 years ago | (#32758004)

Boy, I could sure save some money replacing equipment which needed to be moved by changing the hostname and printing a new sticker.

I think you're missing the point there. Yes, it's pretty easy to change a computer name, but then you also have to update all people and/or software that connect to the server as well, and that's far from trivial. Sending out a mass email to the entire company saying "we've moved the following five servers from room A to room B, so please remember to change the corresponding digits in their names whenever you use them" doesn't go over very well.

I've worked at a place where the server names were completely meaningless (Star Wars character names), and at a place where the computer names were very regimented (location-purpose-numeral), and the truth is both systems have plenty of problems. In the first case, most of the trouble had to do with people who couldn't remember how many O's or U's were in "dooku" -- plus they didn't have any reason to associate the name with the product. Even with systemized names, there's plenty of trouble tracking down what you want. Was that application you're looking for on citrix server 3 or 4? Are the files on the application server in site A or site B? The differences between dcbappsrv02 and dbcappsrv03 and dccappsrv02 can be difficult to recall at times. Also, while it's easy to say "go to dooku" it's a gigantic pain in the ass to try to tell someone over the phone the server they want is "B2R22Cad4" -- people are pretty good with names that are simple names; names that are built up from a formula that jams together apparently random letters and numbers do NOT work nearly as well.

All the same, I'd lean toward picking something fairly formulaic over something just random; but anyone who insists the formula method doesn't have its own problems is in denial.

Re:Pfft, why? (1)

RadioElectric (1060098) | more than 4 years ago | (#32758342)

I guess there are some people who are better at remembering room numbers and some people who are better at remembering "names". I don't know the numbers of any of the rooms on my corridor (including mine... and no I won't turn around to read it off the door) - but I do know which one is Mondrian, Monet, Magritte etc.

Re:Pfft, why? (0)

Anonymous Coward | more than 4 years ago | (#32754040)

Yeah, but that wouldn't confuse anybody. I mean, people will still watch Firefly 4000 years from now, won't they?

Re:Pfft, why? (1)

martin255 (930726) | more than 4 years ago | (#32757754)

Label at least one computer "ham sandwich" to confuse future language researchers.

You might be interested in this approximation of what it would cause: http://www.mcsweeneys.net/2010/6/10packman.html [mcsweeneys.net]

Re:Pfft, why? (1)

ImprovOmega (744717) | more than 4 years ago | (#32757778)

You laugh, but names from a TV show used to be the server naming convention at one place I worked. Makes for interesting conversations:

"Uhura is down again and Kirk is acting up, Spock is still blocking incoming attacks."

"Alright, I'll bring up Scotty and RedShirt to take some of the overload. Promote Sulu to be in charge until we figure out what's wrong with Kirk."

Re:Answers to all TFA questions (0)

Anonymous Coward | more than 4 years ago | (#32753118)

And also throw away any cans of Axe Body Spray you may have lying around. That could make things really confusing, plus that stuff is just naaasty.

Re:Answers to all TFA questions (1)

scaryjohn (120394) | more than 4 years ago | (#32753192)

Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.

Just don't, like many would do, put your label on the monitor.

Re:Answers to all TFA questions (1)

Quirkz (1206400) | more than 4 years ago | (#32758072)

Occasionally I'll find a pile of server faceplates (or bezels or whatever you want to call them) on the floor, in front of a stack of label-free servers that have had their faceplates removed. Talk about a waste of time trying to sort that out. Yes, I fix that situation with a labelmaker as soon as possible, but it really astounds me that anyone would think labeling a removable part (and nothing else) is the way to go.

Re:Answers to all TFA questions (3, Interesting)

vlueboy (1799360) | more than 4 years ago | (#32753760)

The decipherment of Ugaritic took years and relied on some happy coincidences — such as the discovery of an axe that had the word “axe” written on it in Ugaritic.

Maybe I should go around and write "computer" in English on all my computers, as a service to future language researchers.

Extinct-language researchers examining English would fail at this same task 3000 years from now. English has no nouns -- it has brand names: today's "computers" have big "Dell" logos but not "Computer."

Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)

Re:Answers to all TFA questions (1)

MichaelSmith (789609) | more than 4 years ago | (#32753804)

Going further OT: In Harry Harrison's Stainless Steel Rat books people from the distant future wondered why their ancestors had named their planet "dirt".

Re:Answers to all TFA questions (4, Funny)

mrsurb (1484303) | more than 4 years ago | (#32753998)

Also, how would researchers realize that [Apple Mac Glyph] isn't an integral part of our "ancient moon runes" if seen from their era? :)

They'd probably see it as having some sort of religious significance. And they'd be correct.

Re:Answers to all TFA questions (2, Interesting)

DurendalMac (736637) | more than 4 years ago | (#32753104)

Darn. So the Voynich Manuscript is probably not a prime candidate.

Re:Answers to all TFA questions (1)

oljanx (1318801) | more than 4 years ago | (#32753386)

I wouldn't worry too much about that. The Voynich Manuscript is likely the work of a madman, who used a very inconsistent cipher to encode plain text from a language he was not fluent in. Then he added several hundred little tiny pictures of naked women, and a bunch of plants he saw on some sort of "vision quest".

Re:Answers to all TFA questions (2, Funny)

L4t3r4lu5 (1216702) | more than 4 years ago | (#32754302)

So you're telling me he was at Woodstock '69?

For those who don't know what it was like, clicky [youtube.com]

Re:Answers to all TFA questions (2, Insightful)

jd (1658) | more than 4 years ago | (#32753624)

Neither is my great great grandmother's cookbook. Which really is a shame, as I strongly suspect the recipes make something more edible than what's served at the local coffee shop.

Re:Answers to all TFA questions (1)

jhoegl (638955) | more than 4 years ago | (#32753194)

If I didnt RTFA, what makes you think Ill read your translation?

Re:Answers to all TFA questions (1)

ultranova (717540) | more than 4 years ago | (#32755164)

If I didnt RTFA, what makes you think Ill read your translation?

Can you elaborate on that?

Re:Answers to all TFA questions (1)

OnePumpChump (1560417) | more than 4 years ago | (#32753416)

So this probably isn't going to help with Rongorongo, then.

Re:Answers to all TFA questions (1)

grouchomarxist (127479) | more than 4 years ago | (#32753570)

In the case of Rongorongo, if it is a written language, then it is probably a written form of Rapa Nui, the language of Easter Island. In any case, since Rapa Nui is a Polynesian language, we'd be able to compare it to other Polynesian languages. However, this has already been done with no success.

Part of the problem with Rongorongo and with other undeciphered scripts is that we don't know what counts as a distinct character, the character vs. glyph problem. It is not clear from the article if this system helps with that problem. The article doesn't have enough detail, but it seems that their system makes a lot of assumptions that you can't make when trying to work with an undeciphered script.

Re:Answers to all TFA questions (1)

mattj452 (838570) | more than 4 years ago | (#32754102)

The process they describe assumes the characters are known, alphabetic and words are somehow separated. So I guess until the characters have been separated, it won't do much good on Rongorongo.

Re:Answers to all TFA questions (1)

pookemon (909195) | more than 4 years ago | (#32753474)

The decipherment of Ugaritic took years and relied on some happy coincidences -- such as the discovery of an axe that had the word "axe" written on it in Ugaritic

Sorry but I had to lol at this. What was actually written on the axe was "Bill" - because it was his axe. And now the deciphered writings are all containing phrases like "That duck has an axe!", "The members voted and passed the new Axe" and "Monday - Remember to pay the Axes".

But I guess it does make sense to write "Axe" on an "Axe", just to be sure. Oh bugger - where'd I put my Axe, all I can find is my Bill.

Re:Answers to all TFA questions (0)

MichaelSmith (789609) | more than 4 years ago | (#32753494)

Yeah if you work on a building site you engrave your name on your tools. But fire axes in my building are labelled, as are toilets and emergency exits, even though the labels are pretty obvious.

The Yarra River in Melbourne has that name because the local Aboriginal people pointed to the river and said that word, but it turned out later they were commenting on the rate of flow.

Re:Answers to all TFA questions (1)

jd (1658) | more than 4 years ago | (#32753680)

It's also why an inordinate number of mountains are called "your finger, you fool" and "who is this fool who doesn't know what a mountain is?"

Linear A Implications (5, Interesting)

DowdyGoat (1830958) | more than 4 years ago | (#32753124)

This is very cool for us undeciphered language fans.

In the article, the language author Andrew Robinson correctly points out that this computer program won't work for languages that don't have a known language that is close to them, say like for Linear A found on Crete, which is definitely not Greek like Linear B turned out to be. There is a lot of speculation that Linear A is a native Minoan (Cretan) script, largely unrelated to any other known script.

However, parallel with Linear A on Crete was a Cretan pictographic script, which may, or may not be related to Egyptian hieroglyphics. The Minoans had known trading ties to Egypt, which had written language long before them. If a relationship could be found (via this computer program) between the Minoan pictographic script and Egyptian hieroglyphs, then that might give insights into how the Linear A script was set up (which is a syllabary script).

The only difficulty is that there may not be enough of the pictographic script to work--I'd imagine you'd need a fair number of examples to really allow the computer to compare and contrast.

Re:Linear A Implications (0)

Anonymous Coward | more than 4 years ago | (#32753578)

Presumably the script homology analysis depends on similarity of strokes and letter structure. That doesn't work with pictographics, which are at best different representations of the same, and at worst plays on homophones. In the case of Chinese languages, you can even have different pronunciations for the same ideogram!

Re:Linear A Implications (3, Informative)

KritonK (949258) | more than 4 years ago | (#32753686)

Actually, the program might be able to help: From what I understand, the Linear A alphabet is related to the Linear B alphabet, which has been deciphered, even though the languages may be different. We know a bit about context (what we have are mostly inventories), and we even know the meaning of one word: the one next to the total of the amounts in the inventory probably means "total". Furthermore, that word, ku-ro, is similar to a form of a Greek word for "total" ("houlon"), so it is very likely that the language is at least Indo-European in origin. One could try using various Indo-European languages as candidates for the related language, until the program comes up with something meaningful.

Now, if only we had a larger sample of the language of the disk of Phaestos...

Re:Linear A Implications (1)

Guido von Guido (548827) | more than 4 years ago | (#32758258)

Nah, it's not gonna be much help with Linear A. Although without a solid decipherment it's hard to be sure, a majority of the characters in Linear B also appear in Linear A. There are also names that appear in both scripts. This is of course no guarantee that all the symbols had the same values in both scripts, but it's a reasonable starting point.

Furthermore, Linear A is a syllabary, not an alphabet, and they used logograms extensively. Ugaritic, being an alphabet, is much simpler. They haven't demonstrated the program against a non-alphabetic script. Identifying logograms is a big jump.

Finally, the language used in Linear A is unknown. Their program used knowledge of a well-known and very similar language (Hebrew) to decipher Ugaritic. If it turns out that Linear A is related to a known language, the program could presumably help. If it's related to an unknown language (like Etruscan), well, it's not going to be of much use.

Re:Linear A Implications (1)

jd (1658) | more than 4 years ago | (#32753696)

Well, a more obvious implication is that if you fed in some percentage of Linear A texts and Cretan pictographic texts, you'd get virtually the same results as feeding in a different set of texts (ie: symbols should always equate to the same opposite number) if they are truly related.

This would at least let you identify if the texts are indeed of the same language, even if you can't read it, which is further along than we are now.

Re:Linear A Implications (1)

Hognoxious (631665) | more than 4 years ago | (#32755800)

What's the difference between linear A and perl?
One day we might be able to read linear A ... drrrtish!

Re:Linear A Implications (1)

HungryHobo (1314109) | more than 4 years ago | (#32757134)

On the other hand it could be useful since a program could do such a test against every known language quickly as long as you rented enough CPU time.
I imagine such a task would take a long time to do by hand.
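
A minimal sketch of what such a bulk test could look like, assuming you have a sample text for each candidate language. It only compares symbol-frequency profiles as a cheap pre-filter; it is not the method from the paper, and the names `frequency_profile`, `profile_distance` and `rank_candidates` are made up for illustration.

```python
from collections import Counter
from itertools import zip_longest

def frequency_profile(text):
    """Relative symbol frequencies, sorted from most to least common."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if not total:
        return []
    return sorted((n / total for n in counts.values()), reverse=True)

def profile_distance(text_a, text_b):
    """Sum of absolute differences between two sorted frequency profiles."""
    pa, pb = frequency_profile(text_a), frequency_profile(text_b)
    return sum(abs(x - y) for x, y in zip_longest(pa, pb, fillvalue=0.0))

def rank_candidates(unknown_text, samples):
    """Order candidate language names from closest to farthest profile."""
    return sorted(samples, key=lambda name: profile_distance(unknown_text, samples[name]))
```

Each candidate is scored independently, so the loop would be trivial to spread over as many rented CPUs as you like.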

Next step: (2, Insightful)

BoppreH (1520463) | more than 4 years ago | (#32753140)

Voynich manuscript! [wikipedia.org]

If only we could find a language that is similar enough...

Re:Next step: (1)

MichaelSmith (789609) | more than 4 years ago | (#32753216)

That's amazing. I will have to set aside some time to go through it. My guess is that the document is an attempt to create a written script for an Asian language which is only spoken. Cantonese comes to mind because speakers of that language currently borrow Mandarin and Chinese writing when they want to write stuff down.

Re:Next step: (1)

iserlohn (49556) | more than 4 years ago | (#32754394)

Cantonese is a dialect of Chinese (as is Mandarin). In fact it is more akin to Middle Chinese than modern Mandarin. It is commonly accepted that Tang dynasty poetry sounds better in Cantonese due to the more similar tonal structure. Basically, it is believed that Cantonese has gone through fewer changes over the 1500 years since Middle Chinese than Mandarin has.

It is similar to how it is now believed that Elizabethan English sounds more like American English than British English / Received Pronunciation. When colonists leave a mother country to settle a new area, which is sufficiently cut off from the rest of its culture, it tends to preserve more features of the original spoken language.

For Cantonese, the areas of Guangdong Province (i.e. Canton Province) were settled around the time of the Han and Tang Dynasties, displacing the native (most likely) Polynesian tribes that lived there before. You can write Cantonese in Chinese, but some characters that are used are specific to Cantonese to denote Cantonese words.

Another thing to note is that Chinese was not invented to write Mandarin, but in fact was the script used to write Classical Chinese (a standard form of written Chinese grammar and lexicon from well over 2000 years ago). All Chinese dialects adapted the script to write vernacular words subsequently.

Re:Next step: (0)

Anonymous Coward | more than 4 years ago | (#32755236)

It is similar to how it is now believed that Elizabethan English sounds more like American English than British English / Received Pronunciation.

Believed by some Americans, you mean. Not by historical linguists.

Shakespeare would find the average American's pronunciation every bit as alien as RP -- just in different ways.

Re:Next step: (1)

lakeland (218447) | more than 4 years ago | (#32753444)

That's interesting, I have not come across this before.

I last worked in computational linguistics over five years ago, but when I left there was a good supply of techniques for automatically extracting meaning from an unknown text.

My own research was able to build up both a dendrogram and word vectors from any sufficiently large corpus, and a quick google search turned up http://www.springerlink.com/content/fp17278783422256/ [springerlink.com] which shows that the field is continuing to develop. I would expect that by now it would be pretty easy to feed a text like this in and get word associations out. From your word associations, building up a basic dictionary will still need you to bootstrap associated concepts but at least the task is much smaller and there's a lot of support for checking.
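
To make the word-vector idea concrete, here is a rough sketch assuming the text has already been split into tokens. It builds plain co-occurrence count vectors and compares them with cosine similarity; it is not the commenter's actual research code, and `cooccurrence_vectors`, `cosine` and the `window` parameter are hypothetical names.

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(tokens, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Tokens that turn up in the same contexts score close to 1, and those pairwise similarities are the kind of input a hierarchical clustering step would need to draw a dendrogram.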

I don't recall much successful research into automatic parsing of unknown languages, but since I left the field it could've progressed. Shallow parsing would be a good place to start. Since the language's stemming is unknown you're going to be hard-pressed to parse it anyway but POS tagging should be doable.

I have not done any work with cyphered texts, so I'm assuming that approaches to natural languages will apply. No doubt there is research in this area, I'm just not familiar with it.

Re:Next step: (1)

Trepidity (597) | more than 4 years ago | (#32753566)

The problem is that one of their four assumptions is that the script for the undeciphered language maps characters 1-to-1 onto an existing language's script in a way such that letter frequencies are similar, which is something people have already looked for and which appears not to be the case with the Voynich manuscript.
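
As a toy illustration of that assumption (and only that; this is a naive baseline, not the paper's model), one can pair symbols of the two scripts by frequency rank. The function names `frequency_rank_mapping` and `decode` are invented for this sketch.

```python
from collections import Counter

def ranked_symbols(text):
    """Symbols ordered by descending count; ties broken by symbol for determinism."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [sym for sym, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def frequency_rank_mapping(unknown_text, known_text):
    """Pair the n-th most common unknown symbol with the n-th most common known one."""
    return dict(zip(ranked_symbols(unknown_text), ranked_symbols(known_text)))

def decode(text, mapping):
    """Apply the mapping, leaving unmapped symbols (such as spaces) untouched."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

If the two frequency tables don't line up at all, as seems to be the case for Voynich, a mapping like this just produces noise.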

Interesting (1)

jfoobaz (1844794) | more than 4 years ago | (#32753174)

Could be handy. It is a bit limiting that it requires you pair the target language with one where you're relatively sure that the morphology and word roots are fairly similar, and that the writing systems are similar (structurally and statistically).
I guess there might be some way to handle some possible differences in script type (comparing a language written with alphabetic system to one written using a syllabary or abjad) by producing a fake alternate writing system for the known language that would be plausibly similar to the target. You're probably screwed if you're going from a phonetic-type writing system to a (possibly partially) logographic one, though.
For cases where there were a variety of competing theories about the nature of the script and language it represents, it might speed up the process of checking alternatives. Maybe it's better to think of this as a tool for testing proposed solutions, rather than automatically discovering them.
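
A crude sketch of the "fake alternate writing system" idea, assuming a Latin-script transliteration of the known language and a simplistic vowel list: drop the vowels so the text looks more like something written in an abjad. `VOWELS` and `to_pseudo_abjad` are illustrative names, and a real conversion would need actual phonological knowledge.

```python
# Assumption: a five-vowel inventory is good enough for a Latin-script toy example.
VOWELS = set("aeiou")

def to_pseudo_abjad(text):
    """Strip vowels from every word, keeping word boundaries."""
    words = text.split()
    return " ".join(
        "".join(ch for ch in word if ch.lower() not in VOWELS) for word in words
    )
```

For example, `to_pseudo_abjad("malak katab")` gives `"mlk ktb"`, which could then be fed in as the known-language side of the comparison.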

Sigh. (1)

slasho81 (455509) | more than 4 years ago | (#32753240)

Unaided computer program != computer AI. Not even if you use Bayesian statistics. Leave the hyperbolic headlines to the common newspapers. After all, This Is Slashdot.

Re:Sigh. (0)

Anonymous Coward | more than 4 years ago | (#32754104)

Then what is AI? Is a chess AI, AI? After all, it only tries every possible combination (in the endgame) or does a tree search with some heuristics (when it can't check all of them). That's not very "intelligent" -- when you know the trick.

£337 $p33|{ |)00d$ (1)

ZirconCode (1477363) | more than 4 years ago | (#32753262)

1 b37 17 \/\/1££ |-|4\/3 pr0b£3|\/|$ \/\/17|-| |\/|¥ b3r |-|4xx £337 $p33|{

Re:£337 $p33|{ |)00d$ (1)

nextekcarl (1402899) | more than 4 years ago | (#32753442)

It probably would have problems with your leet hacker speak, but it isn't that hard to decipher. Then again, since some of the output I've had from OCR resembles your text, maybe not...

So, Linear A is next? (0)

Anonymous Coward | more than 4 years ago | (#32753326)

or Egyptian?

Voynich ? (1)

mbone (558574) | more than 4 years ago | (#32753574)

So, when are they going to apply this to the Voynich manuscript [wikipedia.org] ?

"Axe" on axe ?!? (1)

LongearedBat (1665481) | more than 4 years ago | (#32753650)

the discovery of an axe that had the word “axe” written on it in Ugaritic

A conversation in Semitic times:
"What's that?"
"Dunno..." examines the object "...it says on here that it's an axe."

Re:"Axe" on axe ?!? (0)

Anonymous Coward | more than 4 years ago | (#32754054)

More likely it went something like this:
"I hear this Axe fellow makes pretty good woodcutting implements."
"Yea, he even has to engrave them - prevents counterfeiting, you know?"

Re:"Axe" on axe ?!? (1)

Nyder (754090) | more than 4 years ago | (#32755234)

the discovery of an axe that had the word “axe” written on it in Ugaritic

A conversation in Semitic times:

"What's that?"

"Dunno..." examines the object "...it says on here that it's an axe."

Honestly, I would think that it was the name of the person who owned it myself.

Google is missing out (1)

WindBourne (631190) | more than 4 years ago | (#32753684)

They should put on-line a DB of documents that have been translated and then allow others to build a translator. In fact, if smart, they would do this as a competition in which the winner could create a new company based on it, with a large investment by Google.

Re:Google is missing out (1)

the_other_chewey (1119125) | more than 4 years ago | (#32754848)

Google has dominated the NIST machine translation competition for years before
they stopped participating. I don't think that they need too much external help.

Re:Google is missing out (1)

WindBourne (631190) | more than 4 years ago | (#32757128)

Yes, they have dominated. HOWEVER, the approach that I just suggested would allow them to help move things up a notch, and give them a chance to buy a potential competitor down the road and turn it into a subsidiary.

Screw the article.... (2, Informative)

djupedal (584558) | more than 4 years ago | (#32753698)

IBM, as one example, has been on this hard since 2002 ( http://news.cnet.com/2100-1008-998264.html [cnet.com] ) when the prize was first announced....stop going all lady gaga over stuff that is so old it can't even be recycled properly.

You want to impress me... (3, Funny)

ngc5194 (847747) | more than 4 years ago | (#32753806)

... see if it can decipher some of the Perl code I've had to take over.

Iberian from Basque Language? (1, Interesting)

Anonymous Coward | more than 4 years ago | (#32753952)

The Iberian language was spoken in Spain before the Roman Empire. It has some similarities with the Basque language. The texts in Iberian are few; still, I wonder if this language could be decoded using this tool.

Yes but... (1)

fr4nk (1077037) | more than 4 years ago | (#32754266)

I need something that understands the binary language of moisture vaporators.

undecipherable languages? (1)

ArcadeNut (85398) | more than 4 years ago | (#32754656)

If they are undecipherable languages, how do they verify the results are accurate?

They don't know (1)

brokeninside (34168) | more than 4 years ago | (#32755546)

Which is why they tested it on a deciphered language. They are making the assumption that if it is relatively accurate in one case which meets all four of their preconditions that it will be relatively accurate in more cases which meet the same preconditions. That seems to me to be a reasonable assumption.

But also note that, at present, this tool best serves as an aid to those trying to decipher languages. The article states that the output has limitations that make it rather inutile for the general public. As such, the worst that can happen is that it might send researchers down the wrong road for a bit. But if it can provide one or more keys that will help researchers crack undeciphered languages, then it will be a massive help.

And, as the article points out, this approach may also lead to new progress in machine translation of known languages.

So it's not really a revolution in the field. But it is a new technique that looks promising with regards to helping areas that are currently sitting on plateaus.

Re:undecipherable languages? (1)

Guido von Guido (548827) | more than 4 years ago | (#32758452)

If they are undecipherable languages, how do they verify the results are accurate?

In general, there are two ways to test a decipherment. The first is to compare it to a bilingual text (e.g., the Rosetta Stone). Ancient Sumerian is apparently unrelated to everything else, but there were a lot of bilinguals so the decipherment is pretty firm.

The second method is to use the decipherment to decipher a new text. For instance, the first big test for Michael Ventris's decipherment of Linear B was using it on some newly discovered tablets. Obviously there's more uncertainty with this method, since it's still possible you're completely wrong. The more texts you can successfully decipher, the better the odds are that your decipherment is good. The Mayan glyphs and Linear B fall into this category.

In general, it's easiest to decipher if you've got a lot of texts, particularly bilinguals, and if the language is related to a well-known language. Ancient Egyptian was closely related to Coptic, and in the same family as Arabic and Hebrew, as were Ugaritic and ancient Akkadian (AKA Babylonian and Assyrian). The Mayan glyphs were written in a language in the Mayan family of languages, which are still spoken today. The language in the Linear B tablets turned out to be archaic Greek.

TFA is unintentionally funny (1)

the_other_chewey (1119125) | more than 4 years ago | (#32754864)

From TFA:

An incidental challenge in developing a computer system that could decipher Ugaritic (inscribed on tablet) was developing a way to digitally render Ugaritic symbols.

Riiiiight. What did they feed their software? Photographs of stone tablets?

But it can't translate... (0)

Anonymous Coward | more than 4 years ago | (#32755306)

While it might be able to translate an ancient, dead language, I doubt it will be able to read every Perl program ever written.

Shaka! (1)

Wagoo (260866) | more than 4 years ago | (#32755390)

SHAKA! When the walls fell. :(

Now try it on something else. (1)

Arancaytar (966377) | more than 4 years ago | (#32756726)

Like this [wikipedia.org] .

ObVoynich (1)

DdJ (10790) | more than 4 years ago | (#32759658)

(Insert obligatory wishful thinking about the Voynich Manuscript here.)
