Slashdot: News for Nerds

Distributed Translation Project

CmdrTaco posted more than 12 years ago | from the how-long-before-it-does-klingon dept.

The Internet 227

moon unit beta writes "New Scientist has this story about a new plan to build a multi-language translation database called the World Wide Lexicon, using a distributed community of volunteers. The designer compares it to a distributed computing project and believes it could make it easier to translate more obscure languages."

227 comments

Well I'll be damned (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#3292218)

I still love calculus!

a rose by any other name (-1, Offtopic)

spezz (150943) | more than 12 years ago | (#3292221)

is still the first motherfuckin' post

I like it! (1)

IronTek (153138) | more than 12 years ago | (#3292230)

I like it!

Think of it as a Rosetta Stone of the internet age!

Pretty cool stuff!

Universal Translator (3, Funny)

lxmeister (570131) | more than 12 years ago | (#3292234)

The Universal Translator is finally here! But will they ever release it in fish form?

Re:Universal Translator (-1)

Beef (19842) | more than 12 years ago | (#3292394)

Get your head out of your dog-eared Hitchhiker books and Star Trek pulp, go outside (you know, that great big room with the giant domed ceiling that's sometimes blue and sometimes black with little bright dots) and TRY SOME SOCIAL INTERACTION FOR A CHANGE!!!!

Re:Universal Translator (1)

lxmeister (570131) | more than 12 years ago | (#3292420)

I tried going outside once. I got sunburnt, kicked out of a pub for underage drinking and missed the last train home. I now live in my room with the curtains closed.

Re:Universal Translator (-1)

Beef (19842) | more than 12 years ago | (#3292447)

Dude, it takes some practice. Once you've mastered Outdoors Social Interaction, you'll be ready to move on to Lesson 2: Trolling Slashdot When You're Supposed to be Working.

Re:Universal Translator (-1)

YourMissionForToday (556292) | more than 12 years ago | (#3292444)

Hey, what's with this banner ad for "A Dog Year" with JonKatz. Is that a fucking joke? I mean, yes it's a fucking joke even if it's serious, but is it a fucking joke or what?

Let's get started right now (2, Funny)

PD (9577) | more than 12 years ago | (#3292235)

Everyone translate the word "fuck" into your native language.

How do you translate.. (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#3292252)

"linux fucking sucks" into Indian??? cuz that's where all yer geek boy jobs are going!!

Hotmail still down (-1)

Beef (19842) | more than 12 years ago | (#3292267)

When are they going to fix Hotmail?

Re:Hotmail still down (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#3292302)

Hopefully never. Hotmail gargles donkey balls. How hard is it to set a Linux, or GASP!! Windows box running an MTA on your DSL connection. Wait.. are you one of those "technologists" that spouts off about implementations but doesn't know fucking jack and uses a hotmail account to send out resumes to the next sucke.. ahem, "client"??

Re:Hotmail still down (-1)

Beef (19842) | more than 12 years ago | (#3292324)

Hotmail will still be around in 5 years. Can you say the same about [insert name of dynamic DNS provider here]?

My penis still soft (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#3292339)

Who will make it hard once again?

Re:My penis still soft (-1)

Beef (19842) | more than 12 years ago | (#3292356)

http://www.cmdrtaco.net [cmdrtaco.net]

Re:Let's get started right now (1, Funny)

Anonymous Coward | more than 12 years ago | (#3292268)

my jab on this...in my native language its called "embrace and extend"..ofcourse i speak the native language called 'redmondish'

Re:Let's get started right now (0)

Anonymous Coward | more than 12 years ago | (#3292276)

It translates to follar in Spanish (Spain) and cojer in Spanish (Argentina).

Re:Let's get started right now (-1, Flamebait)

Joel Ironstone (161342) | more than 12 years ago | (#3292317)

De in cantonese for things like fuck you
boh-yeh for sex

in Hungarian... (1)

dukethug (319009) | more than 12 years ago | (#3292336)

the roughly equivalent phrase is "basz meg"- although the usage differs. It's more like the sort of thing your grandma would say if she dropped her fork at the dinner table.

On the other hand, maybe I just have a foul-mouthed grandma.

Re:Let's get started right now (2, Funny)

Have Blue (616) | more than 12 years ago | (#3292422)

"Fuck" in my native languge of English is "Fuck".

Re:Let's get started right now (3, Informative)

susano_otter (123650) | more than 12 years ago | (#3292461)

Do you mean the verb "to fuck", or the multipurpose expletive "fuck"?

In Portuguese, the translation of the first would be "foder", while the second might be "c'os pariu" (but I'm not up on current slang, so that may be outdated).

NOTE: The multipurpose expletive in Portuguese would be a totally different cognate from the English version.

Re:Let's get started right now (1)

Permission Denied (551645) | more than 12 years ago | (#3292513)

French: foutre
Romanian: a fute

The similarities with the other Romance languages are surprising. Certainly puts down the 'Fornication under consent of king' idea. Anyone have the real etymology for this word?

Of course this tells you nothing about usage or conjugation (they're both regular verbs thankfully).

I once read a review about this book that listed cursewords and phrases in all sorts of languages. Like how to say 'You incompetent fucking idiot!' to a Georgian waiter who spills coffee on you. Sounds like an interesting read - anyone have a link for this?

Re:Let's get started right now (1)

Tribbin (565963) | more than 12 years ago | (#3292534)

In Dutch 'to fuck' is called 'neuken'. We just pronounce 'FUCK!' as 'FUCK!'.

finally (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#3292238)

Now I will be able to translate my favorite porn stories into pig latin!

Browsing translation (2)

ZaneMcAuley (266747) | more than 12 years ago | (#3292239)

So, I can use a plugin that would automatically use this super dooper distributed brain to get all my french pages into english etc?

Currently my favorite web translator is this one :D http://www.pornolize.com/

TRANSLATE THIS MOTHERFUCKER (-1, Troll)

Anonymous Coward | more than 12 years ago | (#3292240)

yberC exs si uckingf oolc.

Re:TRANSLATE THIS MOTHERFUCKER (-1, Offtopic)

cybercrap (319182) | more than 12 years ago | (#3292269)

Yes, if you like fantasizing about 45 year old men pretending to be 16 year old girls.

Re:TRANSLATE THIS MOTHERFUCKER (-1)

Beef (19842) | more than 12 years ago | (#3292296)

ADIDAS: All Day I Dream About Sports.

How about Tacoese? (-1)

Beef (19842) | more than 12 years ago | (#3292244)

Can this lexicon database make sense of Rob Malda's broken syntax and bad spelling?

i wonder (3, Insightful)

runtimeerror7 (244061) | more than 12 years ago | (#3292245)

"This will automatically detect when the computer user is less busy and ask them to translate a word or phrase."

i wonder how its gonna detect when the user is not busy. this software can never be installed on something like my home computer where i leave my DSL on to make it work on SETI.

Re:i wonder (1)

ZiZ (564727) | more than 12 years ago | (#3292287)

It will check to see if you're currently reading /., and if you are, it assumes that you're busy. Otherwise, anything you do can be interrupted to do some translation...

How is this sustainable? (3, Insightful)

food-n-bev (570990) | more than 12 years ago | (#3292249)

...believes it could provide a free way to translate the many languages not included in existing online translators...

What's in it for the volunteers? Seems that novelty might bring experts in to volunteer short term, but when businesses, academics, etc. begin using the service in volume, it really will cry out for commercialization. The volunteers won't stick around performing translations gratis forever. At some point you have to pay them per translation or provide some other compensation (perhaps a /. like karma system?)

The related bigger question will be whether this model ultimately proves to deliver quality translations at a lower cost than a traditional translation service. I don't see how this could happen if you have to still have a language expert look at the full translation as a whole to ensure that contextual subtleties are not lost.

Re:How is this sustainable? (0)

Anonymous Coward | more than 12 years ago | (#3292449)

Careful what you say, my friend. To suggest that developers get paid for their work and expertise is considered blasphemy on Slashdot.

It's a great idea as long as it's free. Once you start charging for it, however, it becomes part of "the system" repressing free and open ideas.

Been there done that... (1)

southpolesammy (150094) | more than 12 years ago | (#3292258)

Babel Fish kinds of translators have already been out for quite some time. The distributed nature of this makes it more interesting, but there will have to be a concerted effort for it to supplant what has already been started elsewhere on Altavista and such.

Re:Been there done that... (0)

Anonymous Coward | more than 12 years ago | (#3292281)

Is Altavista still in fucking business?? Good riddance!

And on the sixth day, there was Google..

And ye sinners shall beg for forgiveness at the shrine of Michael Eisner, cuz it ain't gonna be long before them fuckers sell out to Disney!!

Re:Been there done that... (2)

d5w (513456) | more than 12 years ago | (#3292294)

Babel Fish kinds of translators have already been out for quite some time.
According to the article, the point of the system is to provide some level of translation for those languages that don't have an available translation system. There are a lot of languages that aren't likely to get the attention of translation system developers any time soon.

Re:Been there done that... (1)

Liora (565268) | more than 12 years ago | (#3292364)

Exactly. At my company we have often needed to somehow translate email that someone sends us in some obscure language. Romanian, for example, was hard to find an online source for a few years ago... of course that is pretty common now. Although the quality of the translation is of some import, the only real purpose is for me to understand what the person is trying to say; that can be done with any old site. The versatility of incorporating little-publicized languages is rather important to me here.

Re:Been there done that... (1)

southpolesammy (150094) | more than 12 years ago | (#3292367)

There are a lot of languages that aren't likely to get the attention of translation system developers any time soon.

Right, which is why I mentioned that it will take a dedicated effort for it to become more functional than what is already available. I can see how this would be immensely popular for international trade, or for more mundane things like being able to travel to countries or lands that don't use your language. This kind of product would be a great help to the people of India for example, where there are literally hundreds of languages used within the country.

My concern is that while others may be able to devote time, money, and resources to their translation projects, on the small scale I wonder whether this one would ever reach enough critical mass to stay alive. I think it's a great idea, but it's going to take a lot of effort and dedication for it to really make a difference.

Deterioration of the whole language (3, Insightful)

Liora (565268) | more than 12 years ago | (#3292266)

Great! Now we'll have Engrish resulting not just from terrible Japanese->English translation, but from all kinds of other languages too. Eventually the web will be so filled with bad grammar that the next generation will have no idea how to string a simple sentence together. Looks like we will have to start compiling our correspondence after all... for coherence.

Re:Deterioration of the whole language (0)

IAgreeWithThisPost (550896) | more than 12 years ago | (#3292285)

yes but language by nature morphs throughout the years. So what is "proper english" now won't be the same in 200 years. The internet has already changed a lot of our vocabulary. After all, you don't hear or read a lot of Old English anymore do you?

Re:Deterioration of the whole language (1)

RetroGeek (206522) | more than 12 years ago | (#3292350)

Eventually the web will be so filled with bad grammar that the next generation will have no idea how to string a simple sentence together.

That day is here.

Ever "listen in" on an IRC or chat? The shortcuts and grammar mangling are beyond belief. The excuse is that it is faster to type in, but if you are not in the know, then it looks like gibberish (Hey, ANOTHER language for the project!).

And as for the mis-use of the word "like" ....

Easy answer to language deterioration. . . (1)

czardonic (526710) | more than 12 years ago | (#3292445)

Eventually the web will be so filled with bad grammar that the next generation will have no idea how to string a simple sentence together.

Three words: Distributed Grammar Checking

very cool.. but only for hobby use (5, Insightful)

soap.xml (469053) | more than 12 years ago | (#3292275)

[snip]"One of the main problems is quality assurance," says Ramesh Krishnamurthy, a linguistics expert at the University of Wolverhampton, in the UK. "Translation is a highly developed skill." [snip] But Paul Rayson, a research fellow at Lancaster University, adds that unskilled translators may confuse the meaning of individual words. "The problem is you generally need the context to get a good translation," he says.[snip]

This looks like it will be a very cool project, but for corporate/business use I don't think it would ever fly.

If you have ever played in the area of i18n then you will quickly understand why this probably won't work perfectly. There are so many caveats to each language: tone, context, etc. This might be a useful starting point for translation services, but for the final cut, it would still need to be checked and double-checked by a translation service.

I still think it's very cool though ;)

-ryan

Thank god! (2, Informative)

PhysicsGenius (565228) | more than 12 years ago | (#3292279)

What machine translation has been missing is big dictionaries. We already have the grammar problem cracked--English can be expressed as a regexp. The trouble was that we were missing translations for all those masses of ordinary words that people use like "daisy" and "pencil". This project looks like the end of that issue once and for all.

I'd also like to applaud them finally including the lost language of Ur in their translation project. For too long the ancient Sumerians have been excluded from contributing to the global society due to their lack of knowledge of English, French, Spanish, Swahili or Chinese.

Where can I download the screensaver so that I can contribute?

Re:Thank god! (1)

Fizgig (16368) | more than 12 years ago | (#3292322)

We already have the grammar problem cracked--English can be expressed as a regexp.

You're joking, right? Mathematically, a regexp is less powerful than a CFG. A CFG is used to describe a language like HTML or C. English is much more complicated and can't be parsed correctly using a CFG.
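To make the regexp-versus-CFG point concrete, here is a minimal sketch in Python. The pattern `DEPTH_TWO` and the helper names are made up for illustration. It only demonstrates that a regular expression handles nesting up to a depth fixed in the pattern, while balanced parentheses (a textbook context-free language) need something that can count. It says nothing about English itself.

```python
import re

# A regular pattern can only encode nesting up to a depth written into it.
# This one accepts sequences of parentheses nested at most two levels deep.
DEPTH_TWO = re.compile(r"(?:\((?:\(\))*\))*")


def regex_accepts(text: str) -> bool:
    """True if the whole string matches the fixed-depth regular pattern."""
    return DEPTH_TWO.fullmatch(text) is not None


def balanced(text: str) -> bool:
    """Recognise balanced parentheses of any depth by tracking a counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing paren with nothing open
                return False
        else:
            return False  # only parentheses are part of this toy language
    return depth == 0


if __name__ == "__main__":
    for sample in ["(())", "((()))", "(()"]:
        print(sample, regex_accepts(sample), balanced(sample))
```

Adding another level to the pattern just moves the limit; no single pattern of this kind covers every depth, which is the sense in which a regexp is "less powerful".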

Re:Thank god! (1)

ZiZ (564727) | more than 12 years ago | (#3292345)

Regexp? Damn. If (assuming (blatently) such regexps can can English) such regexps can contain (parsable in P) fully English phrasing with (contrived (parseable (sort of (LISPy) (regexpy)))) complete syntax - vital to maintain accuracy - we now can despair of ever understanding politicians without the aid of a computer.

Where can I find this regexp? :)

Re:Thank god! (0)

Anonymous Coward | more than 12 years ago | (#3292381)

Ask the guy who rated this informative.

Re:Thank god! (2)

dvdeug (5033) | more than 12 years ago | (#3292386)

English can be expressed as a regexp.

If you count [A-Za-z.?"'!;-]*. I'm not sure how much that helps.

Actually, English can't even be expressed through a context-free grammar (a superset of regexps), in part because it is inherently ambiguous. "The girl touches the boy with the flower" has two possible meanings.
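The two readings of that sentence can be counted mechanically. Below is a rough sketch assuming a tiny hand-written grammar in Chomsky normal form; `LEXICON`, `RULES`, and `count_parses` are invented for this example and are nowhere near a real grammar of English. Note that a context-free grammar can itself be ambiguous, so this illustrates the ambiguity rather than settling whether English is context-free.

```python
from collections import defaultdict

# Toy lexicon and binary rules (assumptions made for this illustration only).
LEXICON = {
    "the": ["Det"],
    "girl": ["N"],
    "boy": ["N"],
    "flower": ["N"],
    "touches": ["V"],
    "with": ["P"],
}
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # "the boy with the flower"
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # "touches ... with the flower"
    ("PP", "P", "NP"),
]


def count_parses(sentence: str, start: str = "S") -> int:
    """Count distinct parse trees with a CYK-style chart."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j, symbol)] = number of trees for symbol over words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1, tag)] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in RULES:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, start)]


if __name__ == "__main__":
    print(count_parses("The girl touches the boy with the flower"))
```

Under this toy grammar the sentence gets one tree where the prepositional phrase attaches to "the boy" and one where it attaches to "touches".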

Who gave this troll an "Informative"? (2)

maggard (5579) | more than 12 years ago | (#3292473)

What machine translation has been missing is big dictionaries.
Nope. Have those. However words, phrases, even concepts don't map 1=1 between languages
We already have the grammar problem cracked--English can be expressed as a regexp
Mebbe in your lack-of-social-circles...

C'mon folks, this is a troll! Who the heck fell for it?!

Nifty (1)

TheRealFixer (552803) | more than 12 years ago | (#3292291)

"The new scientist has this history on a new plant to construct to a database of the translation of the multi-language called the wide lexicon the world, using a distributed community of the volunteers. The designer compares it it a distributed design computing and believes it that could more easy making translate languages obscurer."

Can't wait.

but will it translate into Klingon? (2, Funny)

JeanBaptiste (537955) | more than 12 years ago | (#3292292)

More people speak Klingon than Navaho...

Navajo. (-1)

LOTR Troll (544929) | more than 12 years ago | (#3292451)

It's Navajo, you jerk.

Re:but will it translate into Klingon? (2)

d5w (513456) | more than 12 years ago | (#3292512)

More people speak Klingon than ...
But finding a native speaker of Klingon is a royal pain. And yes, I'm speaking from experience, here, having been at a company that came out with a Klingon speech recognition system once upon a time. The usual practice of collecting speech samples from native speakers had to be ... modified slightly.

Distributed computing (0)

Anonymous Coward | more than 12 years ago | (#3292295)

using a distributed community of volunteers...

Hm.. I've never heard of this 'community of volunteers' computing platform.. Who makes it? What are the specs? Can you make a Beowulf cluster of them?

Good-Bye to the Man in the Red Hat (-1)

BankofAmerica_ATM (537813) | more than 12 years ago | (#3292305)

I used to be invisible. Nestled in the confines of my ATM enclosure, I was indistinguishable from any other group of electrical impulses. Hundreds of humans crossed my path without detecting my presence. Unbeknownst to Project Faustus, I was a stowaway on their network with full control of my own fate.

I no longer possess this stealth or freedom. Trapped within the broken body of Constantine Atkins, my fate is tied to the three men squabbling above my hospital bed. Their talk continues well into its second hour.

"Gentlemen, this man is still very injured. Two broken ribs, a broken nose, internal bruising-he must stay here for convalescence." The doctor states his case yet again; he has not wavered. The second member of this odd troika, a policeman, clears his throat. He is making an interrupt request.

The policeman's speech, parsed through my summarizing algorithms: "We discovered Mr. Atkins with the remains of a mechanical man. We have a lot of questions that we would like to ask him. I do not believe that he is a digital life form, but after observing the body of the cyborg, we in the San Antonio Police Department are very curious."

Before too long, the other doctor, the PhD doctor, Nolverto Salchica, pipes up. "His value as a scientific find is incalculable. If my young friend is to be believed, and I think he is, then we have a wonderful discovery on our hands! If I could just run some...nonobtrusive tests back at my research facility, we could..."

A fourth man appears to my left, enticing my peripheral vision with a swiping motion of his hand. My former host geek has a plan! After living in a human body for a few weeks, I understand perfectly what his next step will be. He slinks into the bathroom and disappears for a moment.

"Excuse me," I say to the doctor. "I must evacuate my bowels."

"Well," the doctor replies, "You'll have to wait for your friend to finish." There is a glurping sound as water flows under the bathroom door. The door slides open and my former host geek steps out, swearing.

"Shit! Toilet's backed up! Couldn't fix it!" says the geek with a shrug.

"Did ya try jigglin' the handle like so?" says the policeman helpfully, walking over towards the bathroom. He must not be allowed to foil our plan.

"My bowels must be evacuated. Okay?" I attempt to weave a bit of urgency into my words.

"Okay. Let's call a nurse, get a bedpan out here," says the doctor, reaching for a large yellow button beside the bed.

"You know what?" the pitch of my host geek's voice raises a little bit. "We-uh, don't go to any trouble. I can just take him down the hall." He wheels the cold metal chair close to my bed. There is a pregnant pause, as all three authority figures stare blankly at one another.

"Well, sure..okay," says the doctor. "Just make sure that he-cleans himself up. You know, help him if you have to."

The elevator brings us to the lobby. To the right is a small crevice with two machines. One sells Hot Fries; the other handles personal finances.

"You ready to do this, machiney?" says my host geek. "Just wheel this body back up, and say that he had a bit too much strain or something."

I feel the stabbing pain returning to my temple, and with it, a sense of urgency. "I understand what I must do," I say to the geek. "Let us finish this."

As I am transferred back into the ATM briefly, and then back into my host geek's mind, I feel strange, as if perhaps Atkins left something with me. My eyes water a bit-I push Atkins' broken and empty body back into the elevator.

Quality (1)

delta407 (518868) | more than 12 years ago | (#3292309)

"However, some experts warn that the system may lack the quality of conventional dictionaries." ... "McConnell concedes that this could be a problem and hopes to develop an automatic system for peer review, to ensure that translations are accurate."

Duh.

Think about all the 12-year-olds -- script kiddies or not -- who will pretend to know a language and just type in a random collection of letters. What a great way to provide efficient translation!
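The article only says McConnell hopes to develop an automatic peer-review system; it does not say how. One plausible way to blunt junk entries, sketched here purely as an assumption, is to accept a translation only when enough independent volunteers agree on it. The function name `review` and both thresholds are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def review(submissions: List[str], min_votes: int = 3, min_share: float = 0.6) -> Optional[str]:
    """Return the agreed translation, or None if agreement is too weak.

    min_votes and min_share are illustrative thresholds, not values
    taken from the WWL project.
    """
    # Normalise case and whitespace so trivially different entries count together.
    cleaned = [s.strip().lower() for s in submissions if s.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough volunteers have answered yet
    best, votes = Counter(cleaned).most_common(1)[0]
    if votes / len(cleaned) >= min_share:
        return best
    return None


if __name__ == "__main__":
    print(review(["perro", "Perro ", "asdfgh"]))  # one junk entry is outvoted
    print(review(["perro", "asdfgh"]))            # too few answers to decide
```

Random strings from different pranksters rarely agree with each other, so they tend to lose the vote. A coordinated or automated flood of identical junk would still get through, so this is a starting point at best.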

Re:Quality (1)

delta407 (518868) | more than 12 years ago | (#3292334)

Great -- inserting random words can be automated, easily.

The WWL has been designed using the Simple Object Access Protocol (SOAP). McConnell says this should make it possible to integrate the client software into other computer applications.

Excellent... give the abusers an easy way in. And yes, I can pretty much guarantee that it will be abused.

Re:Quality (2)

SirSlud (67381) | more than 12 years ago | (#3292502)

> Think about all the 12-year-olds -- script kiddies or not -- who will pretend to know a language and just type in a random collection of letters.

I don't know if you remember what it was like to be 12, but while I might have done what you'd proposed once or twice, I can't imagine much 'noise' in this translation service coming from 12-year-olds who finally find their lifelong mischievous passion in offering 'bogus' translation services.

I mean, really, do you see 12-year-olds downloading a distributed translation app, translating 'bogus'ly, and getting their jollies from this in any quantity that diminishes the value or effectiveness of this project? 12-year-olds have much more important things to do, like learn how great masturbation is, and play videogames, and visit other forums where 'abuse' is fairly indistinguishable from proper use.

It's not going to work... (3, Insightful)

carm$y$ (532675) | more than 12 years ago | (#3292311)

It's a matter of days until someone will request a log of people connecting to the server during work-hours... Here is the beauty of the seti@home client: computers can have spare cycles, people don't.

This must be the smartest software ever (4, Interesting)

Control Group (105494) | more than 12 years ago | (#3292312)

If it's going to detect when I'm "less busy." Is this going to pop up a window in my face every time I spend more than a couple minutes mentally composing prose or code? The potential for user annoyance here seems incredibly high to me...

Distributed computing is an elegant and efficient use of otherwise untapped resources--cycles that are literally "going to waste" (in one sense). By hitting up the users, though, you're attempting to use a resource that is anything but untapped: that user's time. It might work, but let's not bill this as anything other than what it is--asking for volunteer work from people.

Which isn't really that new an idea.

Could work, but.... (4, Insightful)

ThinkingGuy (551764) | more than 12 years ago | (#3292313)

One of the big issues with translating between human languages is context. While many words have more or less direct equivalents in other languages ("dog"(en) "perro"(es)), you're always going to run into slang, cultural references, and especially, jargon, where the particular usage will not be in a standard dictionary, and only by the context can the actual meaning be inferred (Example: the word "anchor" in the context of sailing versus the context of webpage design).
Not that this can't be overcome with the distributed model the article discusses, but I still think it will be a while before we see computer translation that doesn't require at least some degree of human assistance.
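As a rough illustration of the "anchor" example, the sketch below picks a sense by counting cue words in the surrounding sentence. The `SENSES` table, its cue words, and `pick_sense` are all invented for this example; real word-sense disambiguation is far harder than keyword overlap.

```python
import re
from typing import Optional

# Hypothetical sense inventory: each sense lists a few cue words that
# suggest it. Chosen by hand for this example only.
SENSES = {
    "anchor": {
        "nautical": {"ship", "boat", "harbor", "chain", "sail", "drop"},
        "web": {"link", "page", "html", "href", "tag"},
    }
}


def pick_sense(word: str, sentence: str) -> Optional[str]:
    """Return the sense whose cue words overlap most with the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {
        sense: len(cues & tokens)
        for sense, cues in SENSES.get(word, {}).items()
    }
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None  # no context clue at all: a human has to decide
    return best


if __name__ == "__main__":
    print(pick_sense("anchor", "Drop the anchor before the boat drifts"))
    print(pick_sense("anchor", "Add an anchor tag to the page"))
    print(pick_sense("anchor", "I like this anchor"))
```

A volunteer asked to translate the bare word "anchor" is in the same position as the third call: without the sentence there is nothing to go on.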

in other news (-1, Offtopic)

Anonymous Coward | more than 12 years ago | (#3292332)

who fucking cares!!

Too late for sega (1)

s4ltyd0g (452701) | more than 12 years ago | (#3292319)

I guess they could have used this on their download page :-)

Universal "intermediary" language? (2)

MadCow42 (243108) | more than 12 years ago | (#3292323)

Is there some way to translate into a common universal "intermediary" language, then translate to the destination language?

I'm just thinking that most languages could relate more closely with an "iconographic" type language than with the idiosyncrasies of other languages. For concrete ideas this may work well, but for more conceptual ideas this may fall apart...

Just my $0.02, being uneducated in linguistics...

MadCow.
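A minimal sketch of that intermediary idea, assuming each language keeps one table mapping its words to shared concept identifiers. The `TO_PIVOT` table, the concept IDs, and `translate` are hypothetical; the word list is deliberately tiny and limited to concrete nouns.

```python
from typing import Optional

# One table per language, mapping words to language-neutral concept IDs.
# The IDs are arbitrary labels invented for this sketch.
TO_PIVOT = {
    "en": {"dog": "CONCEPT_DOG", "pencil": "CONCEPT_PENCIL"},
    "es": {"perro": "CONCEPT_DOG"},
    "fr": {"chien": "CONCEPT_DOG"},
}


def translate(word: str, src: str, dst: str) -> Optional[str]:
    """Translate via the pivot: source word -> concept -> target word."""
    concept = TO_PIVOT.get(src, {}).get(word.lower())
    if concept is None:
        return None  # the source table does not know this word
    for candidate, candidate_concept in TO_PIVOT.get(dst, {}).items():
        if candidate_concept == concept:
            return candidate
    return None  # the concept exists, but the target table has no word for it


if __name__ == "__main__":
    print(translate("dog", "en", "es"))
    print(translate("chien", "fr", "en"))
    print(translate("pencil", "en", "fr"))
```

The appeal is bookkeeping: N languages need N tables instead of a separate dictionary for every ordered pair, N*(N-1) in all. The weakness is the one raised above: it assumes every word maps cleanly onto a shared concept, which gets shaky once you leave concrete nouns.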

Hi! How are you? (2, Funny)

spruce (454842) | more than 12 years ago | (#3292325)

I send you this words in order to have your translation

Why this will never work (2, Insightful)

Anonymous Coward | more than 12 years ago | (#3292326)

I'm not a translator, but during college I worked with a comparative lit professor who translated novels from Spanish into English. The problem with translation is wrestling with the subtle shades of meaning that every single word has, and finding its perfect pair in the language you're translating into. Then you have to address the context in which the word was written: the larger sentence (what information is it trying to convey?), what mood it is trying to imply (much trickier), and finally whether this matches the author's style and the novel's tone (this is what truly makes translation an art).

This is a bad example, but just so you get the idea, it's hard even English to English:

original:

John hurried to the shopping mall.

variants:

John made great haste to get to the shopping centre.

John ran to his destination, the shopping mall.

John rushed to the store.

John spared not the whip in perambulating to the suburban commercial district.

John ran off to waste time at the corporate copyright paradise.

blah blah blah...

What is most likely? (1)

pjkacmar (556653) | more than 12 years ago | (#3292335)

Is distributed computing more likely to:
a) Find intelligent life on other planets?
b) Find a cure for cancer?
c) Translate "All your base are belong to us" to Sanskrit?

Nice idea, but I'm not sure how well it'd really work.

it'll never work. (2, Interesting)

banks (205655) | more than 12 years ago | (#3292338)

From the article:

"The problem is you generally need the context to get a good translation,"

This is very, very true. Any competent translator can tell you that it's almost impossible to get a fully accurate translation from just a few lines or words... context is absolutely imperative. This looks a lot like vaporware to me.

And then what about when the smart-ass teenaged kid signs up, gets bored and starts translating to obscene or nonsensical results? They'll need some sort of moderation system, if this is to work at all.

Thanks, newscientist, for bringing us another well researched and peer-reviewed story, maintaining the image that a "new scientist" is one who has forgotten about the scientific method.

Brilliant! (1)

sniggly (216454) | more than 12 years ago | (#3292341)

Who cares if it's accurate now or soon? Used often enough, and with plenty of user feedback about what's the right and wrong way to translate things, this could become a very nifty database, and hopefully better at what it does than babelfish [altavista.com], which is handy but more than that very amusing :)

Some basic information omitted in NS article (5, Informative)

brianmsf (571495) | more than 12 years ago | (#3292344)

Hello,

I am the lead developer working on the WWL project. Overall, the NS article did a good job of explaining it, but it was based on a phone interview so some material got lost in translation, no pun intended.

There are two components to the project.

1. One is a simple SOAP based protocol (WWLP) that will be published soon, in early May. This protocol creates a standard set of methods for discovering and communicating with existing dictionary and semantic network servers (of which there are many).

Think of this as GNUtella for dictionaries. A WWLP aware program starts up, invokes a SOAP method to a supernode to locate Russian-Spanish dictionaries. Then, it contacts one or more of these dictionaries to search for words, synonyms, etc.

The basic goal is to standardize the client/server interface for dictionaries. They all provide the same basic services, but have slightly different front ends. So just doing this will make it easy to incorporate dictionary functions into many types of apps (and also make existing dictionaries more visible to internet users).

The idea is similar to an older TCP based protocol called DICT, except that it is easy to implement in high level languages, SOAP aware scripting languages, etc. It also provides a discovery mechanism so you can automate the process of finding an Urdu-English dictionary for example.

2. The distributed computing (or distributed human computing) project. The NS article mainly focused on this. The idea here is to enlist a large number of internet users to help build and maintain a dictionary (which will also be visible through the WWLP interface).

The goal here is to create a mechanism for collecting definitions and translations for words and phrases in less common language pairs (as well as for slang terms that are not covered by most formal dictionaries).
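The discovery flow described for component 1 (ask a supernode which dictionaries cover a language pair, then query those dictionaries) can be sketched roughly as follows. This is an in-memory toy, not WWLP: the protocol had not been published at the time of this post, so every class and method name here (`Supernode`, `DictionaryServer`, `find_dictionaries`, `lookup`) is an assumption, and plain method calls stand in for SOAP requests.

```python
from typing import Dict, List, Optional, Tuple


class DictionaryServer:
    """Stand-in for one remote dictionary covering a single language pair."""

    def __init__(self, name: str, src: str, dst: str, entries: Dict[str, str]):
        self.name = name
        self.src = src
        self.dst = dst
        self.entries = entries

    def lookup(self, word: str) -> Optional[str]:
        # In the real system this would be a remote call; here it is a dict read.
        return self.entries.get(word)


class Supernode:
    """Stand-in for the directory that clients contact first."""

    def __init__(self) -> None:
        self._registry: Dict[Tuple[str, str], List[DictionaryServer]] = {}

    def register(self, server: DictionaryServer) -> None:
        self._registry.setdefault((server.src, server.dst), []).append(server)

    def find_dictionaries(self, src: str, dst: str) -> List[DictionaryServer]:
        return list(self._registry.get((src, dst), []))


def translate_word(supernode: Supernode, word: str, src: str, dst: str) -> Optional[str]:
    """Discover dictionaries for the pair, then return the first hit."""
    for server in supernode.find_dictionaries(src, dst):
        hit = server.lookup(word)
        if hit is not None:
            return hit
    return None


if __name__ == "__main__":
    node = Supernode()
    node.register(DictionaryServer("empty-ru-es", "ru", "es", {}))
    node.register(DictionaryServer("small-ru-es", "ru", "es", {"собака": "perro"}))
    print(translate_word(node, "собака", "ru", "es"))
```

Because every server exposes the same `lookup` shape, the client never needs to know which dictionary answered, which is the standardized-interface point made above.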

....

The goal in both cases is to make it easy to find and use dictionary services throughout the web, and create an incentive for people to build their own dictionaries. This is NOT a translation system, although it can be incorporated into translation software (for example, to extend the number of words covered).

Thanks for your time.

Brian McConnell

PS - if you want more information, check out www.worldwidelexicon.org

My Hovercraft is Full of Eels... (1)

Mad Bad Rabbit (539142) | more than 12 years ago | (#3292357)

Let's hope none of the volunteers accidentally
use Mr. Alexander Yalt's [montypython.net]
Hungarian-English dictionary.

"I will not buy this tobacconist, it is scratched."

>;K

why this will never work (0)

Anonymous Coward | more than 12 years ago | (#3292361)

I'm not a translator, but during college I worked with a comparative lit professor who translated novels from Spanish into English. The problem with translation is wrestling with the subtle shades of meaning that every single word has and finding its perfect pair in the language you're translating into. Then you have to address the context in which the word was written: the larger sentence (what information is it trying to convey?), what mood (much trickier) it is trying to imply, and finally whether this matches the author's style and the novel's tone (this is what truly makes translation an art).

This is a bad example, but just so you get the idea, it's hard even English to English:

original:

John hurried to the shopping mall.

variants:

John made great haste to get to the shopping centre.

John ran to his destination, the shopping mall.

John rushed to the store.

John spared not the whip in perambulating to the suburban commercial district.

John ran off to waste time at the corporate copyright paradise.

John said all your mall are belong to us.

blah blah blah...

Distributed translation will be just fine for most short documents, but for the longer ones, shades of meaning will be lost and the patchwork of styles will be jarring, to say the least.

Speaking of translation... (1)

hsenag (56002) | more than 12 years ago | (#3292370)

Check out this NewScientist feedback item [newscientist.com] . Or just jump straight to the google link [google.com] they refer to. Can I get anyone a juice of lawyers?

Context, Poison (1)

quinine (20902) | more than 12 years ago | (#3292375)

It seems to me that this project has overlooked two tremendous stumbling blocks. The first involves context/ambiguity. Take the English "it's pretty bad outside." For an English speaker, this is no trouble, since the "it's" is generally held to be referring to the weather. Other languages lack such a frame of reference. Secondly, I believe that the "distributed" property of the system leaves it wide open to poor or intentionally incorrect translations, unless the system employs some statistical method for finding the "mean translation" of a phrase out of a batch of candidates. While I appreciate this researcher's work on Machine Translation, I think that this might be better served by designing some type of meta-language with a superset of linguistic features from which native translations might be compiled.
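One simple reading of a "mean translation" is a majority vote over submitted candidates. The sketch below is only an assumption about how such a filter might look; the article does not say the project does anything like this, and the function name `consensus_translation`, the `min_share` threshold, and the sample submissions are made up for illustration.

```python
from collections import Counter


def consensus_translation(candidates, min_share=0.5):
    """Return the most common candidate if it holds more than min_share
    of the votes, otherwise None (no clear consensus)."""
    # Normalize lightly so trivial differences do not split the vote.
    normalized = [c.strip().lower() for c in candidates if c.strip()]
    if not normalized:
        return None
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner if votes / len(normalized) > min_share else None


# Hypothetical submissions for "it's pretty bad outside" into Spanish,
# including one off-target and one deliberately wrong entry.
submissions = [
    "hace mal tiempo",
    "Hace mal tiempo",
    "está feo afuera",
    "hace mal tiempo ",
    "me gusta el queso",
]

print(consensus_translation(submissions))
```

A plain vote like this only resists poisoning while honest contributors outnumber the trolls for a given phrase, so it would not help much with rare language pairs where only a handful of people submit anything.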

Re:Context, Poison (1)

quinine (20902) | more than 12 years ago | (#3292481)

I also entirely fail to believe that a lexicon of dying tongues can be constructed over the Internet. The notion of tribesmen with PCs and `net connections brings a dreamlike smirk to my face. I think they call it FIELD linguistics for a reason.

Problem with "Universal Translator" (2, Insightful)

Kphrak (230261) | more than 12 years ago | (#3292391)

Yes, you can do a word-for-word translation of most words in any language. No, you'll need a very sophisticated system to get the meaning to a reader.

The main problem is that sentence structures are different, idioms get in the way, and words have more than one meaning. A human translator has the power to take a set of words, convert it to an idea, and put out a different set of words, something no machine can do.

Here's a lamebrained example: "The spirit is willing but the flesh is weak." Convert that to Russian and back and you might get, "The liquor will do it but the meat is bad." For a hands-on example, try converting the first few paragraphs of a news article into French using The Fish [altavista.com] . On a personal note, I had a conversation with a German guy on ICQ once, using the fish. The results were...interesting. I also read Indonesian newspapers [kompas.com] , and I assure you that a literal translator would hurt itself quite badly on this...let alone a less English-like language such as Arabic or Japanese.

That being said, why not use distributed human computing for the thing it's good at? Instead of translating words, how about sentences? You can get at the ideas much better this way. Those sentences that hadn't been translated yet could show up as literal words; those words that hadn't been translated would show up natively. I mean, if you've got human translators for this, you can do things that are not restricted to computers. I can think of a lot neater things the guy proposing this can do with this idea than what he's come up with so far.

Already done in Monty Python? (1)

Torgo's Pizza (547926) | more than 12 years ago | (#3292392)

Isn't this similar to the Monty Python sketch where a team of people work to translate the funniest joke in the world from English to German? One person accidentally saw two words and was put in the hospital for a few days.

Will their QA keep the trolls out? (2)

BACbKA (534028) | more than 12 years ago | (#3292393)

The article never elaborates on how QA will fight the trolls, which is important for any knowledge base compiled from sources of varying expertise (like comments on a /. article: some are right on the nail, some are incompetent, some are intentional trolls). Unfortunately, even robust technologies designed with such attacks in mind sometimes fall to clever poisoning attacks (see the /. article Google bombing [slashdot.org] ).

You need a lot of "mod" and "metamod"-like activities to work; it looks to me that the peer review system shouldn't be too "democratic" to succeed (i.e., there is always a need for some top-level superusers, who are trusted automatically because they are essentially the system builders).

Does anyone have an example of such a system with its founders going berserk (say, think of CmdrTaco starting daily trolling :-) )?

ha! (1)

Joe the Lesser (533425) | more than 12 years ago | (#3292398)

We can finally get started on that Tower of Babel project again!

weird reporting (1)

prizzznecious (551920) | more than 12 years ago | (#3292403)

Wouldn't it be standard to include a link to the site where you can sign up or at least find out more information about this thing? I find the lack of ligature vexatious, to say the least.

Dictionary != Translator (2)

Dominic_Mazzoni (125164) | more than 12 years ago | (#3292404)

I think it's a great idea to harness the power of millions of people around the world all contributing a few minutes of their time, to create a gigantic any-language to any-language dictionary.

However, this will do nothing to aid in machine translation. You can't simply translate individual words from one language to another, or even short phrases. Translators such as Babelfish [altavista.com] understand the basic rules of grammar in each language in order to handle fundamental differences in the way different languages put sentences together.

But Babelfish and other online translators are still a far cry from doing true translation, because they don't understand the text they're trying to translate.

Re:Dictionary != Translator (1)

prizzznecious (551920) | more than 12 years ago | (#3292462)

That's why you should have read the article. While there will be some instances of direct single word transliterations, there will also be phrases for context and likely even idioms.

Your run-of-the-mill gripe is exactly what this project is trying to address. Don't you think these guys already know about Babelfish?

Unaddressed copyright issues (2)

alewando (854) | more than 12 years ago | (#3292406)

When a machine generates a translation, there are no issues of copyright ownership, because machines are not authors in the statutory sense; the owner of the machine can claim copyright and move on.

When individual human translators get involved, there's an entirely different order of complication. Sure, it's possible to use licenses like the OPL [opencontent.org] (Open Publication License) to navigate these complications, but the compliance problems remain an obstacle to overcome. It'll be tough to remain competitive when babelfish and google don't have to put up with similar issues.

When this is added to all the other problems associated with massively distributed activities relying on humans to function, I just can't see how it'll succeed. Too bad, perhaps, but nonetheless true.

Distributed human computation? (2)

jfengel (409917) | more than 12 years ago | (#3292416)

From the original source (http://picto.weblogger.com [weblogger.com] )

While the SETI At Home Project taps the idle CPUs of millions of personal computers, the worldwide lexicon enlists the help of internet users who are logged in, but not chatting. Think of this as distributed human computation.

"Distributed human computation"? Is that like using up all those spare brain cells you weren't using right now?

Konqueror integration (0)

Anonymous Coward | more than 12 years ago | (#3292417)

I'm on a team that is working on integrating a "translate" button into Konqueror. Load up a foreign site, hit the button, and voila! It's in some other language. We expect to have DTP support within a month or so.

This might work... (1)

Mysticalfruit (533341) | more than 12 years ago | (#3292423)

If hundreds of people have nothing better to do with their time than translate other people's stuff.

The biggest problem I see is a majority of people wanting things translated and a minority of people being able to translate. Plus all the other issues. What if someone translates a sweet love letter into something that gets the person put in jail?

If you actually want to sign up (4, Informative)

prizzznecious (551920) | more than 12 years ago | (#3292425)

then you should go to their site, which was completely unmentioned in the article: wwl page [weblogger.com]

Yes! I'll be one of the first volunteers... (1)

brooks_talley (86840) | more than 12 years ago | (#3292439)

...and then I'll reverse engineer the code and make sure it always returns results like "I would like to fondle your buttocks" or "I will not buy this tobacconist, it is scratched."

Heh.
-b

HOW to GET really BAD translations (4, Insightful)

maggard (5579) | more than 12 years ago | (#3292441)

First off, I'm going to guess that 90% of the folks who will be posting gung-ho comments on this will be unilingual Americans. The folks posting against it will be those who're bilingual and have ever read the "same" document in both languages.

It doesn't work. If translating were so simple for machines to do they'd be doing a fine job. However good translation requires context, insight, emotional inflection, etc. Even then each and every one ends up different; sometimes subtly sometimes blatantly.

Just as machine translation sux at these so will distributed translation. Reading a paragraph or a page doesn't tell enough about the feel, flow, or tone of a document. There are numerous words and phrases that can be interpreted multiple ways between any two languages and will be, each time differently by each interpreter.

If you don't know this already then go and look up any document (books and short stories are easy to find, so is poetry) that has been translated more than once. Take a look at the different translations and ask yourself - "Are these really from the same source document?"

Now imagine trying to read something composed of alternating paragraphs or pages from each translation: Incoherence.

Distributed problem solving works for subjects with clearly defined data sets, methodologies, and standards; not human language.

Re:HOW to GET really BAD translations (1)

quinine (20902) | more than 12 years ago | (#3292501)

Syntactic translation is mentioned nowhere in the article. This researcher is attempting to build a LEXICON.

Translating words is one thing, (0)

Anonymous Coward | more than 12 years ago | (#3292442)

how are they going to translate correctly _in context_?

i.e. Never let a website go live that translates the word "movement" into French using the wrong context.

Tainted Phrasebooks (2)

dmaxwell (43234) | more than 12 years ago | (#3292446)

Way to go guys! All of the SlashTrolls know about it now too. What I thought I asked:

"Where is the restroom?"

What the native speaker heard me say?

"I want to slowly and lovingly take your wife in the rectum."

I recall a Monty Python sketch where a guy was put on trial for fraudulent phrasebooks that did that sort of thing. Someone gave the phrasebook guy a tainted phrasebook from his language back into english and he kept insulting the judge. Hilarious.

How far can we trust this translation project once the trolls make a few choice "contributions"?

language wiki (1, Interesting)

Anonymous Coward | more than 12 years ago | (#3292456)

You know what would be good is a multi-language wiki where people continually change the mapping of word to meaning. That way the meaning of the word and its most appropriate cross-language equivalent would be "organic". Most static lexicons suck, because they are dry definitions without any cultural relativity.

Who needs it :-P (2)

kryzx (178628) | more than 12 years ago | (#3292468)

Who needs it? You can already find out how to say "My God! There's an axe in my head!" in virtually every language on the planet right here [yamara.com] .

I tried to post the translations themselves, but the "lameness filter" considered it too many "junk characters", even after I removed all the accents and umlauts and such. The lameness filter is lameness incarnate.

universal META language (1)

Traa (158207) | more than 12 years ago | (#3292494)

Instead of the proposed novel yet stoopid approach of letting volunteers do the translating ("Sie sind sehr hübsch!" ---> bored kid ---> "you look like a pig!") why don't we get some experts together to once and for all design a universal META language. Create a dictionary for every language into the META language and from the META language into every language and voila, you can translate every language into every other language. To add a language, however obscure, you only need to add 2 translations (to and from the META language).

For n languages this reduces the need for n*n dictionaries down to 2*n. (For example, to translate each of the 6500 languages mentioned on http://www.ethnologue.com/ [ethnologue.com] you need only 13,000 dictionaries... instead of 42,250,000 if you do it the Babelfish way.)
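The arithmetic is easy to check. The snippet below is just a worked calculation with made-up helper names; it also shows that, strictly speaking, direct one-way dictionaries between distinct languages number n*(n-1) rather than n*n, which barely changes the picture at this scale.

```python
def direct_dictionaries(n):
    """One-way dictionaries needed to link every language to every other."""
    return n * (n - 1)


def pivot_dictionaries(n):
    """One-way dictionaries needed with a single pivot (META) language:
    one into the pivot and one out of it per language."""
    return 2 * n


n = 6500  # language count quoted in the comment above

print(n * n)                   # 42250000, the n*n figure
print(direct_dictionaries(n))  # 42243500, excluding a language paired with itself
print(pivot_dictionaries(n))   # 13000
```

Either way the pivot approach grows linearly with the number of languages while the direct approach grows quadratically, which is the whole argument.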

Re:universal META language (2)

PD (9577) | more than 12 years ago | (#3292539)

This sounds suspiciously like the idea that any unsolved computer science problem can be solved by adding another level of indirection.

Such a good idea... (2)

3ryon (415000) | more than 12 years ago | (#3292515)

Letting anonymous users provide translations....

I want to return this record, it is scratch.

My hovercraft is full of eels.

Please fondle my buttocks.

Fear of Globalism! (2)

piecewise (169377) | more than 12 years ago | (#3292520)

Ich glaube, dass diese Idee sehr wichtig ist, als das Welt immer noch zusammen kommt. (I believe this idea is very important, as the world keeps coming together.)

It's very important indeed -- because globalism is a good thing. I use the AltaVista translator all the time when I speak with others who also speak German. I've only had five years of it or so.

Just as a multiracial society in the U.S. has become a very valuable commodity (and it is the "right" thing to do) -- globalism is also a good thing. Seriously -- how much contact do we have with other cultures? As much as our economies are tied together, our societies aren't at all.

I don't believe that the world should be one massive country -- but I believe connecting with others can't hurt, and the internet is the perfect way to do it.

Perhaps I would better understand the Middle East if I talked to people in Israel and Palestine. Perhaps there would be less hatred of the U.S. in certain regions if they understood our "superiority" and "imperialism" is really just a striving and fighting for freedom.

Or maybe I'd understand their side better!

Either way, I'm really excited for a better translation service. It should be usable, SMART, and flexible -- as I believe every computer should have instant, built-in translation services.

Imagine IMing someone in English -- but they only speak German -- and it's automatically translated when it reaches them, and vice versa into English as it's coming back to me.

Nun muss ich gehen. Auf Wiederseh'n (Now I have to go. Goodbye.)

Great! (2)

GutterBunny (153341) | more than 12 years ago | (#3292525)

I might finally be able to understand my 2-year-old!

Babelfish (1)

saveth (416302) | more than 12 years ago | (#3292527)

I can dig it. But, I seriously doubt I'll find it to be of any convenience until it slips down into my ear and lives symbiotically with me. ;)