Coming Soon, The Google Translator

CmdrTaco posted more than 9 years ago | from the trumping-the-fish dept.

Google 418

compuglot writes "Google gave journalists a glimpse of its next generation machine translation system at a May 19th Google Factory Tour. "Google Blogoscoped" offers an excellent overview of the presentation. The system has been trained using the United Nations Documents as a corpus. This corpus is some 20 billion words worth of content. It uses existing source and target language translations (done by human translators at the U.N.) to find patterns it then uses to build rules for translating between those languages. Apparently it was successful where the current version had failed in translating certain phrases. If anyone were capable of making a serious go of MT, that would have to be Google."

418 comments

fascinating (5, Informative)

professorhojo (686761) | more than 9 years ago | (#12683644)

since the RTFAs lacked any kind of crunchiness, i sourced some great stuff here [jhu.edu] that does a wonderful job explaining how this system works, and gives the advantages the statistical translation method has over the rules-based approach. as well as the disadvantages.

fascinating stuff:

"Currently, most machine translation technology, including consumer-oriented programs such as Systran's Babel Fish, have been "taught" the rules of language, such as verb tenses and when to use parts of speech. Programmers painstakingly hand-build systems based on such rules. "The computer is told, if you see this thing in Russian, replace it with this thing in English," explains Yarowsky.

"While somewhat effective, such systems are time-consuming to build (consider how long it takes most humans to learn a language and all its rules), and resulting translations are still marred by grammatical and other errors. Those that do work fairly well usually tackle popular Western languages, such as French, German, and Spanish; there are few translation programs developed for other important tongues, such as Chinese, Turkish, or Arabic, let alone for more obscure languages like Tajik.

"To tackle a broader range of the world's languages, and to improve on the quality of machine translation, Yarowsky and his Hopkins colleagues are developing computer programs that can be trained to figure out any language using statistical analysis, i.e., looking at the probabilities of language patterns. In what's known as automatic knowledge acquisition, the computer could "learn" Serbian well enough to translate future documents or conversation, or at the least pick out pertinent words like "bomb."

"As Yarowsky explains: "Say you want to teach a computer how to translate Chinese: You give the computer 100,000 sentences in English and the same 100,000 sentences in Chinese and run a program that can figure out which words go to which words. If in 2,000 sentences you have the word Washington, and in about the same number of sentences you have the word Huashengdun, and they occur in the same place in the sentence, these words are likely translations.

"It's all just observation," Yarowsky adds. "Children do the same thing, but they also do it through visual stimulation and feedback. They see a book and hear the word 'book,' and eventually they learn that it's a book. They see a bird with its wings flapping around and learn that is called a bird. It's the same with machines, only they have much better memories. Computers could remember exactly when and where they saw the words bird and book."

"So, instead of telling a computer how to do something -- conjugate the verb 'to be' in Spanish, for example (I am = soy) -- researchers give it tens of thousands of examples and program the computer to find repeated patterns that the computer can use to conjugate new verbs. Trained this way, the program could potentially "learn" phrase structure and the rules of translation.

"As Yarowsky notes in his 100,000-sentence example, one way to accomplish automatic knowledge acquisition is to use bilingual or parallel text. The program "reads" a document in English and then a version in a second language. Such texts used by Hopkins researchers include the Bible, which is available on the Web in more than 60 languages, the Book of Mormon (over 60 languages), and the United Nations Declaration of Human Rights (240 languages).

"Aiding the computer is the fact that the English version of such texts can be annotated by hand or using another computer program -- essentially marked up to show, for example, that Jesus is a noun and pray is a verb. The translation program-in-training needs such information because it cannot translate future text just by substituting individual words in each language; it must also be able to analyze how sentences work. To do so, the computer program uses pattern recognition templates and other tools to understand sentences on a syntactic level. Simply put, the program is essentially given clues to know what to look for, notes Yarowsky: "It should figure out the subject, figure out the object, and other elements of sentence structure."

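To make Yarowsky's 100,000-sentence example concrete, here is a minimal Python sketch of the co-occurrence idea: count how often a source word and a target word show up in the same sentence pair, then score the pairing. This is not the Hopkins or Google system. The function names, the use of a Dice score, and the four-sentence toy corpus (pinyin standing in for Chinese text) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

def cooccurrence_table(pairs):
    """Count, per sentence pair, which source words appear alongside which target words."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return src_count, tgt_count, joint

def best_match(word, pairs):
    """Return the target word with the highest Dice score for `word`, or None if unseen."""
    src_count, tgt_count, joint = cooccurrence_table(pairs)
    word = word.lower()
    scores = {
        t: 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])
        for (s, t) in joint
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

# Toy parallel corpus, invented for this sketch.
corpus = [
    ("Washington is cold", "Huashengdun hen leng"),
    ("I like Washington", "wo xihuan Huashengdun"),
    ("Beijing is cold", "Beijing hen leng"),
    ("I like Beijing", "wo xihuan Beijing"),
]

print(best_match("Washington", corpus))  # huashengdun
```

With only four sentences, words such as "is" and "cold" tie between several candidates, which is one reason the quoted article talks about needing very large parallel corpora.
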
Re:fascinating (2, Funny)

Anonymous Coward | more than 9 years ago | (#12683709)

This is great and all, but I won't be impressed until it translates the gibberish that comes from the Iranian gas station attendant every time I stop for gas.

For now, I just nod my head in ignorance, and count my change.

Re:fascinating (1)

I_Heat_Sexylaid (675028) | more than 9 years ago | (#12683944)

60% Of U.S. Believe Porting Open Source to Minor Hiper Type-R Modular Independent Cartoonists [who] Band Together for Star Trek XI Intel Preps [for] Debian Sarge ['s] First look at ["] Coming Soon, The Google Video ["] [considered harmful].

Re:fascinating (5, Funny)

Anonymous Coward | more than 9 years ago | (#12684037)

You go all the way to Iran to get gasoline? Who are you, George W Bush?

Re:fascinating (0)

tomhudson (43916) | more than 9 years ago | (#12684067)

This is great and all, but I won't be impressed until it translates the gibberish that comes from the Iranian gas station attendant every time I stop for gas
Never work. Too much spit gums up the microphone.

Re:fascinating (0)

Anonymous Coward | more than 9 years ago | (#12683768)

One use that the article doesn't mention is that this could be used to train people to read other languages; try to read the article in the foreign language, then compare your reading against the auto-generated reliable translation. This approach is problematic with the sometimes bizarre and unreadable translations generated by current translation software...

Re:fascinating (5, Interesting)

NoMoreNicksLeft (516230) | more than 9 years ago | (#12683867)

Some questions:

Why can't a dictionary be made of nouns, of verbs? Why can't we have it statistically analyze the grammar for ambiguous words?

Does it only recognize exact matches? Especially with verb conjugation, I'd think any words 80% similar or so should be considered matches. Not all languages are as conjugation-happy as Latin or Spanish or even English, and you often lose some nuanced conjugations when translating from one to the other.

What will be done about idioms? Translating these word for word often makes no sense at all, and for me at least (no idea what the official stance is), I'd rather they substitute in idioms with the same general meaning, but for the culture being translated to.

Does it work on alternate character systems? Is it word-boundary dependent?

Does it understand punctuation rules? Will this post, translated to Spanish, have the upside-down question marks where they're supposed to be?

How many of the world's existing languages have enough text for this to even be feasible?
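
The "80% similar" idea from the exact-match question above can be tried directly with Python's standard `difflib`. This is only a sketch of character-level similarity, not how any of the systems discussed here handle conjugation; the 0.8 threshold comes from the question itself and the helper names are made up for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_fuzzy_match(a, b, threshold=0.8):
    """Treat two word forms as the same word if they are at least `threshold` similar."""
    return similarity(a, b) >= threshold

for pair in [("conjugate", "conjugated"),
             ("go", "went"),
             ("conversation", "conservation")]:
    print(pair, round(similarity(*pair), 3), is_fuzzy_match(*pair))
```

The three pairs show both sides of the trade-off: a regular inflection clears the threshold, an irregular one ("go"/"went") shares no characters at all, and two unrelated words ("conversation"/"conservation") score about 0.83 and would be wrongly merged.
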

Re:fascinating (0)

Anonymous Coward | more than 9 years ago | (#12683894)

for example, that Jesus is a noun and pray is a verb

"Jesus is a verb, not a noun". Ricardo Arjona

Re:fascinating (4, Interesting)

MoonBuggy (611105) | more than 9 years ago | (#12683903)

Sounds like a very good approach, but am I the only one to see an issue in the texts they're using that are already available in multiple languages?

The examples given (two religious texts and a legal one) don't really sound like the best things for teaching a "blank slate" program a new language. I understand that it's looking for structure and rules rather than word-for-word links, but the Bible uses many outdated or non-standard phrases and sentence structures, as does most legal text I've ever seen. I'm not a linguist or a statistician, but from my uneducated viewpoint it sounds like problems might arise in the texts that are available for training the system. Anyone know how they're planning to overcome this?

Re:fascinating (5, Insightful)

elrous0 (869638) | more than 9 years ago | (#12683935)

or at the least pick out pertinent words like "bomb."

Why do I have a funny feeling that this research isn't being funded by philanthropic foundations?

-Eric

Re:fascinating (2)

browngb (823753) | more than 9 years ago | (#12683941)

Oh God, it's going to learn languages from examples? I hope they don't try this over the net, otherwise we'll have computers writing LOL, IC, and other nonsense.

Re:fascinating (4, Funny)

fizban (58094) | more than 9 years ago | (#12683987)

"Open the pod bay doors, HAL."

"STFU, Dave. LOL!"

Re:fascinating (1)

DrEldarion (114072) | more than 9 years ago | (#12683997)

"It's all just observation," Yarowsky adds. "Children do the same thing, but they also do it through visual stimulation and feedback. They see a book and hear the word 'book,' and eventually they learn that it's a book."

What would be really incredible is if they could combine this with Google Image Search and get the computers to be able to recognize pictures as words.

Re:fascinating (1)

Carnil (876285) | more than 9 years ago | (#12684026)

Could this approach be used in natural language recognition software?
The system could be made to learn the translation from common language structures into well-formed, parseable sentences that it could then process.
Just a thought, maybe I'm being a bit too optimistic here.

Needs a *bit* more work... (4, Interesting)

TripMaster Monkey (862126) | more than 9 years ago | (#12683650)


Just to illustrate, here's the summary of this story, translated to German and back to English using Google's current version [google.com] :

Google gave a Glimpse of its machine Uebersetzungsystems the following production at the factory route of the A May 19 to journalists. Google. "Google Blogoscoped" offers an excellent overview of the representation. The system was trained with the nation documents as korpus. This korpus is something 20 billion word value of contents. It uses the existing target language translations (takes place via human translators at the U.N.) Samples find, which use it then to establish guidelines for translating between those languages. Apparent it was successful, where the present version had failed, if it translated certain cliches. If everyone of forming a serious were capable, of the M.Ue., those would go to have having to Google.

Re:Needs a *bit* more work... (1)

mattmentecky (799199) | more than 9 years ago | (#12683665)

Isn't that what TFA is about? That the new version needs a "bit" more work so they developed a new system?

Re:Needs a *bit* more work... (1)

TripMaster Monkey (862126) | more than 9 years ago | (#12683704)


That's why I said 'just to illustrate'.

Bork bork bork! (4, Funny)

AtariAmarok (451306) | more than 9 years ago | (#12683673)

Here is the result as interpreted by the Swedish Chef:

"Guugle-a gefe-a a Gleempse-a ooff its mecheene-a Uebersetzoongsystems zee fullooeeng prudoocshun et zee fectury ruoote-a ooff zee A Mey 19 tu juoorneleests. Guugle-a. "Guugle-a Bluguscuped" ooffffers un ixcellent ooferfeeoo ooff zee representeshun. Zee system ves treeened veet zee neshun ducooments es kurpoos. Thees kurpoos is sumetheeng 20 beelliun vurd felooe-a ooff cuntents. It uses zee ixeesting terget lungooege-a trunsleshuns (tekes plece-a feea hoomun trunsleturs et zee U.N.) Semples feend, vheech use-a it zeen tu istebleesh gooeedelines fur trunsleteeng betveee thuse-a lungooeges. Epperent it ves sooccessffool, vhere-a zee present ferseeun hed feeeled, iff it trunsleted certeeen cleeches. Iff iferyune-a ooff furmeeng a sereeuoos vere-a cepeble-a, ooff zee M.Ue-a., thuse-a vuoold gu tu hefe-a hefeeng tu Guugle-a."

Looking forward to a www.borkle.com which returns all its results in such a format.

better than most Engrish. (1)

enigmals1 (667526) | more than 9 years ago | (#12683722)

Hey, it's a lot better than most of the Engrish you find on /. and IM. ;)

Re:Needs a *bit* more work... (1, Informative)

Anonymous Coward | more than 9 years ago | (#12683766)

The current version of "Google translates" is based on Babelfish (a rule-based machine translation system); it isn't based on Google's research into SMT (statistical machine translation).

Re:Needs a *bit* more work... (1)

JWeinraub (773433) | more than 9 years ago | (#12683788)

For the time being, machine translation isn't supposed to do your German homework well enough that the teacher actually believes you did it. I think it's useful for getting a rough idea. Once the text is in the target language, since we are all fluent in the language it translates into, we can figure out what it's supposed to say in proper grammar. And when weird words don't get translated, I am sure we can just Google them and find out what they mean. This still has a long way to go, but it does a decent job. However, with the Google Browser, I am sure it will be neat seeing blog comments in several different languages. As far as the reader of the site is concerned, it's a blog in their native tongue... So reading the site untranslated might show comments in several different languages, which can be neat all on its own.

Re:Needs a *bit* more work... (0, Offtopic)

grasshoppa (657393) | more than 9 years ago | (#12683847)

You gotta be kidding me, this was modded as flame bait?

MODS: Wake the fuck up. THIS post can be considered flame bait ( watch, I'll be modded insightful or interesting ).

Re:Needs a *bit* more work... (0)

Anonymous Coward | more than 9 years ago | (#12683946)

I modded this as redundant, but since it's now modded 5, Interesting, I'm going to post to get my mod points back.

It shouldn't be news to anyone that the current google/altavista-translator is crap. It would have been much more interesting to see how the new translator would handle the news blurb.

Re:Needs a *bit* more work... (1)

I_Heat_Sexylaid (675028) | more than 9 years ago | (#12683967)

["] Life Exists On Other Planets [,] Platforms is Harmful [a] Blue Line 580W PSU Review [." a] Success In Two To Three Years. [for the] Mac Mini Look-Alike [is] Coming Soon [in] new Battlestar Galactica Episodes [with a] Translator for Skype Users [considered harmful].

Google's translator (2, Interesting)

bcmm (768152) | more than 9 years ago | (#12683653)

So what powers Google's current translator? I have seen it give word-for-word the same as Babel on some occasions (but with better handling of non-ASCII characters).

Re:Google's translator (5, Informative)

iantri (687643) | more than 9 years ago | (#12683736)

SystranSoft's Systran [systransoft.com] is behind almost all of the machine translation services on the Internet, including Google's.

Re:Google's translator (1)

metlin (258108) | more than 9 years ago | (#12683793)

Wow, that's just fantastic!

Thanks, I was looking for some of the less common languages, and it turned out that Systran has those.

Owe you one, mate.

Re:Google's translator (1)

Nytewynd (829901) | more than 9 years ago | (#12683775)

From the sounds of things, Google learns with a neural network. It has the ability to learn new mappings based on pattern matching. Babelfish sounds like a distinct mapping of phrases that have been hand coded.

Theoretically, Google can get better at translating over time, as its neural network learns better connections. It might even get better than a human translator if it goes long enough. There will always be small discrepancies, but if the bulk of the text is correctly translated, that would be good enough.

Integrate with GMAIL! (5, Interesting)

RubberDogBone (851604) | more than 9 years ago | (#12683688)

Make this work with Gmail and I'd even pay money for it!

Tired of getting email from Amazon.DE on my Gmail account and having to copy and paste it over to Babelfish.

That would be very useful for me.

Coming soon...... (0, Flamebait)

Cmdr Whackjob (883018) | more than 9 years ago | (#12683690)

www.googledot.org and www.appledot.org

Anyone care to make a bet? (4, Funny)

Weaselmancer (533834) | more than 9 years ago | (#12683695)

That Microsoft will announce a new revolutionary language translation service sometime in the next two weeks or so?

Re:Anyone care to make a bet? (2, Informative)

Anonymous Coward | more than 9 years ago | (#12683798)

Well, it's not like they don't have the technology...

http://research.microsoft.com/nlp/Projects/MTproj.aspx [microsoft.com]

Unsupported assertions (2, Insightful)

gowen (141411) | more than 9 years ago | (#12683710)

If anyone were capable of making a serious go of MT, that would have to be Google.
Erm... why is that? Is it because machine translation is in some sense search technology? Because they've hired renowned experts in natural language processing? Because they've got a lot of money sloshing around and employ a lot of generally smart people?

Oh, no. It's because geeks like Google. Therefore, Google are capable of superhuman feats that mere scientists -- those with years of experience in relevant fields -- are incapable of doing.

Re:Unsupported assertions (-1, Offtopic)

gowen (141411) | more than 9 years ago | (#12683771)

Why does expressing preference for critical thinking over mindless cheerleading always get moderated as "Flamebait"?

I didn't even say bad things about Google, only that the submitter was making unsupported (which is not the same as untrue) assertions.

Re:Unsupported assertions (3, Funny)

tobybuk (633332) | more than 9 years ago | (#12683840)

Look pal, you said something about Google that could be taken as a negative. Here on Slashdot that is only slightly better than saying something good about Windows. But thank your lucky fucking stars you didn't decide to disparage the immortal being that is Linux. That's worse than flushing the original Koran down the pan.

Re:Unsupported assertions (4, Insightful)

stevejsmith (614145) | more than 9 years ago | (#12683778)

No, it's because Google has tons of talent, money, already-archived text to work with, computers, respect in the industry, and consumer base. I can't think of a company that possesses these characteristics more so than Google.

Re:Unsupported assertions (1, Insightful)

gowen (141411) | more than 9 years ago | (#12683821)

Well, (oh dear, here comes the Flamebait mod again), I'd argue that Microsoft has more of all of those, with the possible exception of "respect in the industry." As does IBM, Dell, Cisco ... and any number of other well established, Blue Chip IT companies.

Furthermore, Google's ideas are not new. People have been doing things like this for years. But here on Slashdot, a Google press release about their latest software, which doesn't even exist yet, gets treated like the announcement of an earth-shattering invention.

Re:Unsupported assertions (1)

benjcurry (754899) | more than 9 years ago | (#12683942)

Yes...Google also has a history of fulfilling on its hype, in stark contrast to MS.

Re:Unsupported assertions (1, Insightful)

gowen (141411) | more than 9 years ago | (#12683973)

Really? Google search is great, and Gmail's an adequate front end attached to a webmail system whose sole selling point is the massive amount of storage space.

But have you seen the monstrosity when that front end got bolted onto the deja Usenet archive? Google Maps is usable, but it's hardly ground breaking.

And other than those things, exactly what hype have google delivered on?

Re:Unsupported assertions (1)

benjcurry (754899) | more than 9 years ago | (#12684038)

Well, I think I mentioned what they had delivered on. Gmail and Google Maps are groundbreaking in the sense of being some of the richest client-side applications the web has seen as of yet. Gmail is a joy to use, well organized and hassle-free (IMO). I haven't seen the Usenet/front end thingamabob you mention, though. Google Maps offers many advanced features. My favorite is "bicycle shops near 121 Main street, Podunkville, VA". Brings up all the bike shops in close proximity to the address, with their phone #'s, etc. Like the yellow pages on steroids.

Re:Unsupported assertions (3, Interesting)

stevejsmith (614145) | more than 9 years ago | (#12684018)

Dell and Cisco are not in this business. IBM is not awash in cash the way Google is. Microsoft is not in the business of providing free Internet accessories. In any case, Google has a track record of innovative ideas ("innovative ideas" meaning that not only did they come up with it and implement it partially, but they invested full-on into it, bet money on it, and made it better than the competition) and is the company most likely to actually follow through on an announcement like this. If some little start-up announced this (as I'm sure a few have), people would take it with a grain of salt. But since Google is announcing it, I'm sure most people fully believe that Google will deliver on its promise.

And you're right, people have thought of this exact idea (I'm sure every other computer major and linguist has, in fact, since the birth of ENIAC--I know the idea's crossed my mind tons of times, not that I'd have the slightest clue how to do it). But actually attempting to do it with a reasonable chance of success? I'm going to say Google is the first.

Plus, I got the impression from the article that the service is operational, just not available to the public. If you read the article, you'll find that the translator properly translated a fairly complicated phrase from Arabic to English. I'd guess that this service is, from a technical standpoint, at least 95% done -- it's just the packaging and touching-up that needs to be done.

Re:Unsupported assertions (5, Interesting)

KagatoLNX (141673) | more than 9 years ago | (#12683834)

Ummm, geeks like Google because Google employs scientists. Which mere scientists were you talking about?

Were you talking about the PhDs at universities busy teaching classes, churning out research papers to avoid being fired (an ugly numbers game some departments play), or perhaps burning time generating volumes of grant paperwork?

Oh, maybe you were talking about the scientists employed by the private sector. I'm sure the management teams wherever they work are willing to take the time and care that Google won't.

You do know how many PhDs Google employs, right? Not to mention that they won't be fighting for resources there either. No backstabbing, liquidating MBAs trashing their corporate budget. No football-crazed alumni assassinating their funding proposals either.

Also, I would remind you that "mere scientists" often come up with the needed research (there are volumes in MT alone), but rarely can afford to put in the years that it takes into a good implementation.

Geeks love Google because it is, in many respects, where the best of business meets the best of academia.

Re:Unsupported assertions (0, Flamebait)

gowen (141411) | more than 9 years ago | (#12683909)

churning out research papers to avoid being fired
I love how you believe "churning out research papers" is somehow orthogonal to doing research.

Re:Unsupported assertions (2, Insightful)

benjcurry (754899) | more than 9 years ago | (#12683915)

Oh, come on! It's because in the past, most of what Google has undertaken has been enormously successful and useful. Yeah, they hire a lot of smart people and have lots of money. Gmail (IMO) is the gold standard of free webmail. Google Maps (IMO) is the best map system out there. They also are responsible for Adsense, Adwords and I think they even have a search engine that gets a good amount of hits per diem. Maybe there is a reason to think this translation thingamabob will be good!

Re:Unsupported assertions (2, Insightful)

imroy (755) | more than 9 years ago | (#12684069)

Erm... why is that?

Because Google has shown that it knows how to handle large amounts of human-created content and create useful information from it. The search engine was just the start. Just look at the spell checker they added. It doesn't use a dictionary, just the mass of web pages they spider monthly. It's not always perfect, but it allows it to be more adaptive than other methods. This translator looks like something similar along those lines.
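
For readers wondering what "no dictionary, just the mass of web pages" could look like in practice, here is a toy Python sketch of frequency-based spelling correction: candidate corrections are words seen in a corpus that are one edit away, ranked by how often they were seen. It is an illustration only and makes no claim about how Google's spell checker is actually built; the corpus string and function names are invented for the example.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def word_counts(text):
    """Build word frequencies from raw text (standing in for crawled pages)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word, counts):
    """Return the most frequent known word within one edit, else the word unchanged."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.get) if candidates else word

counts = word_counts(
    "the translation system learns from translated documents and "
    "the translation improves as the system sees more documents"
)
print(correct("translaton", counts))  # translation
print(correct("sytsem", counts))      # system
```

Because the "dictionary" is whatever text was observed, new names and slang become correctable as soon as they are common enough, and words never seen in the corpus are simply left alone.
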

so name.. (1)

Turn-X Alphonse (789240) | more than 9 years ago | (#12683714)

Googlefish or babelgoogle? Maybe we should just change "internet" to google and every site must have google involved.

Googlesoft.com
Googlenix.com
Opengoogle.org
googlejournal.com

Re:so name.. (0)

Anonymous Coward | more than 9 years ago | (#12683750)

maybe it's going along with the current trend and will be: translate.google.com

Piffle (4, Funny)

ear1grey (697747) | more than 9 years ago | (#12683720)

If anyone were capable of making a serious go of MT, that would have to be Google.
An interesting story, but please, for the love of all that's balanced and objective; tell me again how that smudge on your nose really is chocolate.

Re:Piffle (1)

Heisenbug (122836) | more than 9 years ago | (#12683880)

Piffle yourself. They have 100,000 servers to throw at statistical analysis, they have enough cash floating around to offer sign-on bonuses that even Microsoft can't beat, they have a history of applying PhDs to practical problems, and they have obvious business interests in making machine translation more useful. Google-worship aside, they're certainly a top contender in my book.

Of course, I don't know anything about this specific field, and that article sure was pretty fluffy. I'd be interested in more informed analysiseses ...

Re:Piffle (1)

aicrules (819392) | more than 9 years ago | (#12684014)

While somewhat opinionated, I would say that's more of an observation on Google's recent ( 3 years) slew of application feats. As a whole, they have released and announced more major application efforts than any other company. I believe the observation is based on this, rather than it being just an editorial comment.

Altavista Babelfish (4, Funny)

yotto (590067) | more than 9 years ago | (#12683721)

When questioned on the matter, Altavista's Babelfish translator gave this quote:

Google does not have anything on my amazing abilities of the translation!

if anyone... (5, Interesting)

rdc_uk (792215) | more than 9 years ago | (#12683724)

Actually, my bet for most likely to make a real go of machine translation would be...

IBM

Look how far they ran with chess programs, because they felt like it...

If they decided to go the same distance with translation...

Re:if anyone... (1)

nfk (570056) | more than 9 years ago | (#12683905)

They could beat Kasparov and get his expletive reaction on the spot.

Re:if anyone... (2, Funny)

LiquidCoooled (634315) | more than 9 years ago | (#12683916)

They won't have any money left to fritter on useless projects after SCO beats them ;)

Re:if anyone... (2, Funny)

rbarreira (836272) | more than 9 years ago | (#12684075)

I believe your thoughts are upside down...

Re:if anyone... (3, Informative)

digidave (259925) | more than 9 years ago | (#12684039)

Yeah right. Not while they're trying to convince customers to buy their current generation of crap translators. I got sucked into an IBM conference two years ago where they tried to convince me that their Websphere translator was "near perfect" and that it was ready to be deployed on web sites wanting to offer content in multiple languages. They even went so far as to bring in supposedly unbiased happy customers who testified that the Websphere translator was as good as human translators.

The conference was mostly IBM platinum partners (development firms who specialize in IBM "solutions" and make IBM enough money to be called platinum partners) and they seemed to buy into it. Of course, platinum partners tend to believe everything IBM tells them.

pl0s 5, Troll) (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#12683726)

One or 7he other

Bubla *Cick BAle (1)

kristopher (723047) | more than 9 years ago | (#12683732)

Bubla *Cick BAle Walkie *Hotka BaCa Sopika *luek Gack *Zoek Pael Quazic Translate that google!

Re:Bubla *Cick BAle (1)

wootest (694923) | more than 9 years ago | (#12683889)

Hey! My mother was a saint!

Re:Bubla *Cick BAle (1)

CyberKnet (184349) | more than 9 years ago | (#12683926)

That's the idea. Given enough examples of this gibberish and its counterpart in English, eventually the system could start to 'translate' it. Personally, I'd like to see this used for the reverse. Feed in enough random input and English texts as their 'counterparts' and use the service to create a new language.

Re:Bubla *Cick BAle (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#12684050)

"It's a trap!"

Feh. That was easy...

Only works for translating speeches (4, Insightful)

Shotgun (30919) | more than 9 years ago | (#12683753)

If your blog sounds like a politician giving a speech at the UN, this service will do a wonderful job. Doubtful that it will do any better than Babelfish otherwise.

The biggest problem in artificial intelligence is that the system learns the material that it is trained to, and only that material. Computers don't generalize or extrapolate the known into the unknown worth a damn.

Re:Only works for translating speeches (0)

hayh (706697) | more than 9 years ago | (#12683811)

It would never work for a lot of /. posts, because it would assume that the original text's speeling and grammer are correct ;)

Re:Only works for translating speeches (2, Funny)

Dystopian Rebel (714995) | more than 9 years ago | (#12683955)

And if the peeps chin-wagging at Kofi Annan's gig don't interpret 733T 5P3AK, you're in the saddle!*

*Up the river without a paddle.

Re:Only works for translating speeches (1)

atomm1024 (570507) | more than 9 years ago | (#12684031)

"733T 5P3AK"
"TEET SPEAK" ?

Teat speak?
Talking like a boob? :)

Wait, why? (4, Interesting)

Ieshan (409693) | more than 9 years ago | (#12684071)

"Computers don't generalize or extrapolate the known into the unknown worth a damn."

Fortunately, that's not all that google has to go on. Google has 8 billion webpages, in many different languages, most of which are written by non-speechwriters. Not only can they analyze words based on translated context, but they can analyze words based on intra-language context, to form associations between words and meanings.

The real trick is getting down two important linguistic concepts: "Sandhi Rules" (for instance, the use of "an" before a vowel and "a" before a consonant, which are totally regular but more complicated than a word-to-word matchup), and the "degree" or "quality" of words, which indicate the type of adjective most appropriate in any given context.

For instance, "erudite", "learned", "educated", "knowledgeable", "skilled", and "cunning" could all be related words, but many of them have positive or negative associations which may only really be conveyed by understanding the meaning, irony, or sarcasm of a particular phrase.

For instance, "John has been skilled in writing beautiful code for most of his adult life" is quite different from "John has been educated in writing beautiful code for most of his adult life", or "John has been erudite...". The first one is probably right if John has had a natural inclination to doing it properly, the second if he has undergone some training (though we don't know the actual state of his ability), the third (though the word doesn't even really make sense here) if he has been arrogant about his ability, shouting RTFM! every time someone asked him a question.
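The "an"/"a" rule mentioned above can be sketched in a few lines. This is only a rough illustration that looks at the first letter of the following word; the real rule follows pronunciation ("an hour", "a university"), which this does not handle. The function names are made up for the example.

```python
VOWEL_LETTERS = set("aeiou")


def indefinite_article(next_word: str) -> str:
    """Pick 'a' or 'an' from the first letter of the following word.

    Assumption: spelling stands in for sound, so words like 'hour'
    or 'university' will get the wrong article.
    """
    first_letter = next_word.strip().lower()[:1]
    return "an" if first_letter in VOWEL_LETTERS else "a"


def attach_articles(nouns):
    """Prefix each noun with the article chosen by the rule above."""
    return [f"{indefinite_article(noun)} {noun}" for noun in nouns]


if __name__ == "__main__":
    print(attach_articles(["apple", "banana", "orange"]))
```

The point is that the choice depends on the neighbouring word, not on the word being translated, which is why a plain word-to-word lookup table cannot express it.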

Good online translators for other languages (2, Insightful)

metlin (258108) | more than 9 years ago | (#12683757)

While Google's existing translator and Altavista's Babelfish are good, they do not help in the translation of several other languages.

That would be a really good benefit - for instance, I wanted something translated to and from Svenska (Swedish), but I really couldn't find any translation service that did it.

Good translation of the more common languages would be nice, but even simple translations of a wider variety of languages would be really useful.

Re:Good online translators for other languages (1)

TyrelHaveman (159881) | more than 9 years ago | (#12684005)

Jag talar inte svenskt, men I grundar detta för att översätta för mig.
http://www.systranet.com/ [systranet.com]
You have to sign up after like 5 translations, but it IS free to do so. It'll do to/from/between Swedish, Arabic, French, Greek, Spanish, Portuguese, Italian, German, Dutch, Russian, Korean, Japanese, and Chinese (Simplified and Traditional).

Yeah for foreign spam! (2, Funny)

Anonymous Coward | more than 9 years ago | (#12683758)

At last I can translate all those non-English spam emails I get! There'll be no more missed opportunities to buy Chinese viagra, woohoo.

Re:Yeah for foreign spam! (1)

fuzzybunny (112938) | more than 9 years ago | (#12683921)

This is the best one I have ever received. For you German speakers out there. And note the footer and b1ffsteriffi/
Date: Mon, 30 May 2005 06:44:20 -0700 (PDT)
From: harris peters
To: sassisch@yahoo.com
Subject: Grüße

HALLO LIEB, WEISS ich, DASS DIESER BUCHSTABE ZU IHNEN, DA eine ÜBERRASCHUNG,
aber, sich nicht SORGEN, alle KOMMEN MAG IST GUT. Ich BIN Herr HARRIS
PETERS, GESCHÄFTSSTELLENLEITER FINANZIELLEN VERTRAUENSCBankPlc, der IM
MAURITIUS GELEGEN Ist. VOR EINIGEN JAHREN, KAM Ein MANN, der Herrn SHAW
SMITH GENANNT wurde, den, Who AUS IHREM LAND, GENAU VON IHREM TEIL IST, ZU
MEINEM LAND (MAURITIUS) IM GUMMI SECTOR.UNFORTUNATELY ZU INVESTIEREN, ER
STARB IN EINEM SELBSTCAbbruch. Herr SHAW SMITH GESTORBEN, DIE SUMME DER
DOLLAR 15MILLION US IN MEINER BANK LASSEND. Ich ERBITTE HIERMIT IHRE
UNTERSTÜTZUNG ZU HELFEN, das GELD ZU BEHAUPTEN. Ich WERDE SIE BENÖTIGEN,
ALS Der VETTER SPÄTEN SHAW SMITH ZU DIENEN, WEIL IM AUGENBLICK, ER KEIN
FOLGENDES Der STÄMME HAT, DAMIT Das GELD AUF GEBRACHT WERDEN Kann. WENN SIE
RECIEVE DAS GELD, SIE 40% NEHMEN, DAS ÜBER DOLLAR 6MILLION WIE IHR ANTEIL
IST UND SIE GEBEN MIR DAS ANDERE 60%. Die REGIERUNG PLANT, Das GELD ZU
ÜBERNEHMEN, WENN KEINS OBEN DARSTELLT, DA SEIN FOLGENDES VON KIN.I
ÜBERPRÜFT, Daß ALLES UNTER STEUERUNG IST, DA Ich Die NIEDERLASSUNG
MANAGER.SO BIN, das SIE NICHTS HABEN, Sich ABOUT.ALL ZU SORGEN, SIE TUN
MÜSSEN SOLLEN MIR ANTWORTEN, WENN SIE INTERESSIERT SIND, ALSO WIR Die
NESSECARY-, DOKUMENTE FÜR Die ÜBERTRAGUNG ZU VERARBEITEN BEGINNEN KÖNNEN.
GESCHÄFTSSTELLENLEITER DES DANKES HARRIS PETERS F.T.B
harris_peters@yahoo.com
__________________ ________
Cashette stops spam. 100% effective and free! Go to http://www.cashette-inc.com/ [cashette-inc.com]

Pre-emptive strike (3, Funny)

eno2001 (527078) | more than 9 years ago | (#12683764)

Since it's become "hip" to bash Google these days and support either MSN's search technology or Yahoo, I'm making a pre-emptive strike for the IT fashionistas:

"Duh!!! The best machine translator in the world already exists and there can be no improving upon it! Babblefish (thank you Altavista) has been doing this for well nigh a decade. All you Johnny-come-latelys are probably going to rave on with fanboy adoration of Google (the company that can do no wrong)!!! To top it all off, you lot apparently know nothing about Microsoft's language translation project which is slated to be deployed as part of Longhorny in 2010. Online language translation from Google will fail because Microsoft will have it built into the OS itself. Why send your document online for translation when the OS itself will not only translate it, but it will correct the grammar, punctuation and generate a WMA file in one of ten thousand gorgeously rendered synthetic voices. Google has lost. Google has been trolled. Google will have a nice day".

We now return you to your regularly scheduled pos[tt]en.

Re:Pre-emptive strike (1)

ConceptJunkie (24823) | more than 9 years ago | (#12683895)

and generate a WMA file in one of ten thousand gorgeously rendered synthetic voices

They might now have 10000 synthetic voices, but I bet they still all sound like GORF.

Old news... (4, Funny)

jasonmicron (807603) | more than 9 years ago | (#12683774)

There is already a tranzilator [gizoogle.com]

T.Q. (4, Insightful)

moviepig.com (745183) | more than 9 years ago | (#12683807)

The system has been trained using the United Nations Documents as a corpus.

Seems one could devise a TQ (translation quotient) measuring the effectiveness of machine (or human) translators. Take any standard reading-comprehension test, send its text material through the translator and back ...and then compare the scores of subjects taking the resulting test vs. those taking the original.

(Before such translators make their way into, say, diplomatic circles, I'd sure hope there's some objective demonstration of near-infallibility...)
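A minimal sketch of how such a TQ could be computed, assuming it is defined as the ratio of the mean score on the round-tripped test to the mean score on the original test. That definition, the function name, and the sample numbers are assumptions for illustration only; the comment above does not fix a formula.

```python
from statistics import mean


def translation_quotient(original_scores, round_trip_scores):
    """Ratio of mean comprehension scores: round-tripped text vs. original.

    A value near 1.0 would suggest little meaning was lost in the round
    trip; lower values suggest more was lost. This ratio is an assumed
    definition, not an established metric.
    """
    if not original_scores or not round_trip_scores:
        raise ValueError("both groups need at least one score")
    baseline = mean(original_scores)
    if baseline == 0:
        raise ValueError("original scores average to zero; ratio undefined")
    return mean(round_trip_scores) / baseline


if __name__ == "__main__":
    # Made-up scores for two groups of test takers.
    print(translation_quotient([80, 90], [68, 85]))
```

A real study would also need comparable groups of subjects and some check that the difference is not just noise.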

oh no! (5, Interesting)

danharan (714822) | more than 9 years ago | (#12683814)

I don't ever expect such translation to work perfectly, but taking existing phrases should lead to useful first drafts.

This will mean one less possible career for me, and fewer babelfish induced laughter moments.

As a fluently bilingual person, I often recognize expressions that were translated in Canadian government documents. "Anglicisme" is the word the French have for it.

There's subtlety to languages we may forever lose. Take for example:

"Je donne ma langue au chat" - "I give up" (answering a riddle) instead of the more picturesque "I give my language to the cat". Well, that should be tongue, but hey, it's just babelfish!

"Bullshit" won't produce "merde de taureau". That is a strange expression you anglos have, don't you realize?

"Il pleut comme vache qui pisse" will give us "it's pouring cats and dogs" rather than "it's pouring like cows' a'pissin". The French also have never heard of cats and dogs falling from the sky.

While an improved Babelfish may improve our mutual comprehension, please pause for a moment to consider all the linguistic hilarity we'll forever lose.

Re:oh no! (4, Funny)

fuzzybunny (112938) | more than 9 years ago | (#12683897)

While an improved Babelfish may improve our mutual comprehension, please pause for a moment to consider all the linguistic hilarity we'll forever lose.

Yeah, like me going to work for Bull [bull.com] in 1997, and searching for "comment dit-on, le, fuck, le chose sur lequel on tappe, thingy qui connecte a l'ordinateur, ah yeah, le clavier". French Bull dude: "ah, le keyboard."

Hilarity indeed.

Re:oh no! (1)

bhima (46039) | more than 9 years ago | (#12683937)

Most of the work I do is in both German & English and you're right "the linguistic hilarity" is delicious! Particularly when you include regional dialects rather than just "proper grammar".

Re:oh no! (1)

benjcurry (754899) | more than 9 years ago | (#12683991)

I'm bilingual as well (English/Spanish), and I certainly enjoy being able to speak both. However, bemoaning the potential consolidation of languages is a bit of a useless battle, as the internet has already dug the grave for a wide variety of colorful sayings and phrases in languages all over the world. This is the way the evolution of language has always happened, it's just happening more quickly in the information age. As one brand of "linguistic hilarity" dies, the nature of human beings will only birth another to take its place.

What next? (1)

chrisnewbie (708349) | more than 9 years ago | (#12683816)

I predict we'll see Google developing the Universal translator pin,
then the warp drive, then the teleporter, and why not everlasting youth?

Oh yeah!

3 cents... (1)

BipinG (860191) | more than 9 years ago | (#12683830)

Most of the time you don't know what language the text is written in. When you get alien-looking content, most of the time you don't know the best way to make sense of the shit! They should have something that detects (pattern matching etc.) the language in which the content is written!
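One simple form of the pattern matching suggested here is counting common function words per language. The sketch below assumes three tiny hand-picked word lists and a made-up function name; a real detector would use far larger lists or character n-gram statistics.

```python
import re

# Tiny, hand-picked samples of very common words (illustrative only).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "french": {"le", "la", "et", "est", "de", "les"},
    "german": {"der", "die", "und", "ist", "das", "nicht"},
}


def guess_language(text):
    """Return the language whose common words appear most often, or None.

    Ties go to whichever language is listed first in STOPWORDS.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        language: sum(word in common for word in words)
        for language, common in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_language("Le chat est sur la table"))
```

Short or mixed-language texts will fool this easily, so treat the result as a guess rather than a verdict.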

20 Billion? (1)

Bananatree3 (872975) | more than 9 years ago | (#12683839)

That should be 200 billion words according to the article [outer-court.com]

All your base (1)

1967mustangman (883255) | more than 9 years ago | (#12683843)

So how do you think it will handle all your base are belong to us? Seriously though, it will be interesting to see how well they can make it work. My experience so far with translators has been dreadful.

How about Google Calendar? (1)

blankoboy (719577) | more than 9 years ago | (#12683858)

When are we going to see calendaring functionality with Gmail? You know it's in the works in Google labs...come on Google! ;)

Time to move the AI bar (3, Interesting)

TopSpin (753) | more than 9 years ago | (#12683878)

First, this is outstanding; Google, unsatisfied with traditional machine translation techniques, pioneers their own design. I'm certain their advertisers will be pleased to have their ads auto-translated to whatever language is necessary.

Second, I think we'll witness a case of having the AI ante upped once again when another traditional AI challenge is met. Wikipedia puts this best; When viewed with a moderate dose of cynicism, AI can be viewed as 'the set of computer science problems without good solutions at this point.' Once a sub-discipline results in useful work, it is carved out of artificial intelligence and given its own name.

Other uses... (1)

HaydnH (877214) | more than 9 years ago | (#12683891)

This sounds very interesting... imagine the possibilities for localization of applications - I'm sure a simple script could be created to extract strings from source, parse them through the translator and substitute them in your chosen language, this could save a LOT of time!!!

I can't wait for a Welsh version of firefox =P
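A minimal sketch of the "simple script" idea, assuming user-visible strings in the source are wrapped gettext-style as _("..."). The translate argument is a placeholder for whatever translation service would be called; no real translator API is used or implied here, and the demo just upper-cases the text.

```python
import re

# Matches _("...") and captures the text between the quotes,
# allowing backslash-escaped characters inside the string.
MARKED_STRING = re.compile(r'_\("((?:[^"\\]|\\.)*)"\)')


def extract_strings(source):
    """List every string wrapped in _("...") in the given source text."""
    return MARKED_STRING.findall(source)


def localize(source, translate):
    """Return the source with each marked string passed through translate().

    translate is a hypothetical callable (str -> str) standing in for
    a machine translation call.
    """
    def replace(match):
        return '_("' + translate(match.group(1)) + '")'

    return MARKED_STRING.sub(replace, source)


if __name__ == "__main__":
    sample = 'print(_("Open file"))\nprint(_("Quit"))'
    print(extract_strings(sample))
    print(localize(sample, str.upper))  # str.upper is a stand-in translator
```

In practice short UI strings carry little context, so a human pass over the output would probably still be needed.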

IMHO it's too early for that (1)

trandism (835011) | more than 9 years ago | (#12683893)

Good luck to them, but I doubt that they are gonna make it.

OK, maybe in 10 years or so.

I'm into natural language processing myself and it seems to me that it's very difficult to build a system that works globally on all kinds of input.

They'll have to LISP it to death!

Anyway my $0.02

Machine Translation may never get there.. (1)

acomj (20611) | more than 9 years ago | (#12683911)

A relative worked in an "internationalization" department, creating software/manuals in many languages.

In order for machine translation to be as good as human translation, you first need to determine what the sentence "means". Oftentimes you need to track previous sentences to determine the meaning of things like the word "it". Human language is not very detailed and relies on common knowledge and experience to infer meaning.

It's very hard. Some languages are easier than others for this stuff. German/French/Spanish all change the gender of the word "the" based on the noun, which gives clues about how it's used in a sentence. This can help a little.

For many web pages this approach may give an understandable translation, but for literary references and books (manuals etc.) machine-assisted translation is now the norm.

Even using AI, determining meaning is very difficult. Google "semantic processing" to find companies that are trying. One is Cyc, a Stanford spin-off.

http://www.cyc.com/ [cyc.com]

Lovely translation source... (5, Funny)

isa-kuruption (317695) | more than 9 years ago | (#12683913)

So when you go to translate.google.com and translate something, the result will be legalese in the resulting languages.

Spanish: "Que pasa?"
English translation: "With regards to the current situation, how is the day progressing?"

how do they know? (1)

blue_adept (40915) | more than 9 years ago | (#12683930)

FTA:
researchers working on this enabled the system to translate from Chinese to English without any researcher being able to speak Chinese

Hmmm.. and they know that it works because...?? ;)

DVD's subtitle tracks (3, Funny)

Jotham (89116) | more than 9 years ago | (#12683932)

DVD subtitle tracks would be another good addition to help pick up slang too (most have an English track along with a couple of others depending on the region)... all time-synced and easy to match up...

(I'm guessing that it'd fall under fair use and Google wouldn't have to struggle to get the movie studios' approval, (even though such tech would benefit the studios too))
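The "time-synced and easy to match up" part can be sketched as pairing cues from two subtitle tracks by how much their display times overlap. The cue format (start seconds, end seconds, text), the half-second threshold, and the sample lines are assumptions for illustration; real subtitle tracks often split or merge lines differently between languages.

```python
def overlap_seconds(cue_a, cue_b):
    """Length of time during which both cues are on screen."""
    return max(0.0, min(cue_a[1], cue_b[1]) - max(cue_a[0], cue_b[0]))


def align_cues(track_a, track_b, min_overlap=0.5):
    """Pair each cue in track_a with the best-overlapping cue in track_b.

    Cues are (start, end, text) tuples. Pairs overlapping for less than
    min_overlap seconds are dropped. Returns a list of (text_a, text_b).
    """
    pairs = []
    for cue_a in track_a:
        best = max(
            track_b,
            key=lambda cue_b: overlap_seconds(cue_a, cue_b),
            default=None,
        )
        if best is not None and overlap_seconds(cue_a, best) >= min_overlap:
            pairs.append((cue_a[2], best[2]))
    return pairs


if __name__ == "__main__":
    # Made-up cues, not taken from any real DVD.
    english = [(1.0, 3.0, "Hello."), (4.0, 6.0, "Where are you going?")]
    french = [(1.1, 2.9, "Bonjour."), (4.2, 6.1, "Où vas-tu ?")]
    print(align_cues(english, french))
```

The resulting sentence pairs would be the kind of parallel text a statistical system learns from, slang included.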

Language choices (0)

Anonymous Coward | more than 9 years ago | (#12683953)

But can it translate Pig Latin, Bork Bork Bork!, and Klingon?

Starting Wars ! (4, Funny)

justanyone (308934) | more than 9 years ago | (#12683959)


In 'Hitchhiker's Guide to the Galaxy' (the 'trilogy' of books, not the recent movie), it's mentioned that the babelfish has effectively started many, many wars. The reasons seem to be that any being can be rude to any other being without a serious set of translations that explain exactly what the rude terms mean and how they should be regarded.

I'm highly concerned for this warmongering that Google has undertaken.

Reference Here: http://www.bbc.co.uk/cult/hitchhikers/guide/belgium.shtml [bbc.co.uk]

Picture this: I write a blog entry with either bad punctuation or erroneous content. Under the old system (pre-Google translation), I would receive several flames about my idiocy. With Google translations:

* People around the world will be confused and angered about my punctuation;
* Vastly larger numbers of people will complain about my erroneous content;
* Other people will step up to my defense and a massive flame war will ensue;
* Idiots everywhere (who speak other languages) will echo my idiocy by believing the erroneous content I posted;
* The signal to noise ratio of the net will rise markedly;
* I will still be unsure of whether to count on my fingers starting with my thumb or forefinger depending on which European country I'm in.

I believe this pro-war, anti-peace, conflict-ridden idea of making everyone THINK they understand each other is ripe for criticism. God made everyone else speak funny, I think it should stay that way! Only right thinking people speak my language anyway, and everyone else should just shut up and sit down!

(WARNING: above post contains carcinogenic levels of sarcasm, facetiousness, satire, irony, and adjectives. Please unplug brainstem and wipe with a clean, damp cloth before continuing.)

But it's evolutionary! (0)

Anonymous Coward | more than 9 years ago | (#12684052)

People using a translator who don't take the time to familiarize themselves with the grammatical-lexical quirks of mechanical translation and 'take offense' should be rounded up along with all those people who are so fond of taking offense on behalf of others who might be offended. Grind up for shrimp feed.

Imagine a world in which everyone stopped to consider environment, context, and cultural POV when engaged in conversation with others.

But, evolution won't let this happen. It favors numbers, rapid breeding, and in the case of humans, the hive-nest-swarm-colony-'what have you' of group focus on simplistic solutions serving fulfilment of immediate desire.

Translate that GOOGLE!

to translate: (1)

dep01 (730107) | more than 9 years ago | (#12683986)

That happens being, the Google has an updated technology and it goes, it will make a method it is a first in them,! It congratulates in them. To them being company percentage chance to this!!

Two thoughts (0)

duffbeer703 (177751) | more than 9 years ago | (#12683993)

- If they use UN documents as a guide, the Google MT engine will be excellent at translating bureaucratese between languages. I'm not sure if that's a good thing!

- It's obvious that the US Gov't is dumping money into Google -- I often wonder if Google is a front for some US gov't agency.

hype (1)

Lazy Jones (8403) | more than 9 years ago | (#12684030)

If anyone were capable of making a serious go of MT, that would have to be Google.

Oh, come on. I (still) like Google, but that's a bit silly, no?

Middle East Media (0)

rlp (11898) | more than 9 years ago | (#12684054)

MEMRI (memri.org) does a nice job of translating articles, essays, and even video from various media in the Middle East.

yeah, but can it translate this? (4, Funny)

nullset (39850) | more than 9 years ago | (#12684056)

Wenn ist das Nunstruck git und Slotermeyer? Ja!... Beiherhund das Oder die Flipperwaldt gersput. be careful! If you translate this you may end up dead.....

opera (0)

Anonymous Coward | more than 9 years ago | (#12684066)

Yes, but can it translate German or Italian opera to English and still have it rhyme? :-)