IBM Strives For 'Superhuman' Speech Tech

ScuttleMonkey posted more than 7 years ago | from the fansubbing-in-jeopardy dept.

Software 289

robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."

289 comments

Which ... (3, Interesting)

spiny (87740) | more than 7 years ago | (#14555797)

Which witch blew the blue candle out ?

Re:Which ... (1)

lahvak (69490) | more than 7 years ago | (#14555853)

Not really a problem. Machine translation can already handle many words that are spelled the same but have different meanings (homographs), based on context and position in the sentence. With speech recognition you just have more of those, because you have to throw in homophones, too.

As a simple example, blue in "the blue candle" cannot be a verb.
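A toy sketch of that positional argument, in Python. The tiny lexicon, the tag names and the two hand-written rules are all made up for illustration; a real recogniser would more likely use a statistical language model than rules like these.

```python
# Toy homophone disambiguation by sentence position.
# Lexicon, tags and rules are illustrative assumptions, not a real system.

# One heard sound, several candidate spellings with a coarse part of speech.
HOMOPHONES = {"bloo": [("blew", "VERB"), ("blue", "ADJ")]}

# Unambiguous words and their coarse part of speech.
LEXICON = {"the": "DET", "witch": "NOUN", "candle": "NOUN", "out": "PART"}


def pick_spelling(sound, prev_tag, next_tag):
    """Choose a (spelling, tag) pair for an ambiguous sound from its neighbours."""
    candidates = HOMOPHONES[sound]
    for word, tag in candidates:
        # Between a determiner and a noun, only an adjective fits.
        if tag == "ADJ" and prev_tag == "DET" and next_tag == "NOUN":
            return word, tag
        # Directly after a noun, prefer a verb.
        if tag == "VERB" and prev_tag == "NOUN":
            return word, tag
    return candidates[0]  # no rule fired: fall back to the first candidate


def transcribe(sounds):
    """Turn a list of heard tokens into a spelled-out sentence."""
    words, tags = [], []
    for i, sound in enumerate(sounds):
        if sound in HOMOPHONES:
            prev_tag = tags[-1] if tags else None
            next_sound = sounds[i + 1] if i + 1 < len(sounds) else None
            next_tag = LEXICON.get(next_sound)
            word, tag = pick_spelling(sound, prev_tag, next_tag)
        else:
            word, tag = sound, LEXICON.get(sound, "UNK")
        words.append(word)
        tags.append(tag)
    return " ".join(words)


print(transcribe("the witch bloo the bloo candle out".split()))
```

Both occurrences of the same sound get different spellings purely from what sits next to them. Telling "which" from "witch" would need more rules than this sketch has.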

Re:Which ... (1)

paedobear (808689) | more than 7 years ago | (#14555996)

As someone who works in his non-native language in the world of high-tech, I'd love to see the miracle context-aware machine translation software you speak of.

Re:Which ... (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#14556122)

DUDE! You totally said homo!

Re:Which ... (5, Interesting)

jcupitt65 (68879) | more than 7 years ago | (#14556023)

Or I can wreck a nice beach versus I can recognise speech.

Sometimes you need rather a large context to disambiguate: is this sentence part of a discussion on shore-front management, or spoken language understanding?

Just what we need... (0)

MichaelSmith (789609) | more than 7 years ago | (#14555806)

One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles.

...More opportunities for Arabic speaking people to misinterpret western media.

Yes, I know that this is meant to be better speech recognition, but how about on-the-fly translation?

Re:Just what we need... (4, Insightful)

pubjames (468013) | more than 7 years ago | (#14555869)

More opportunities for Arabic speaking people to misinterpret western media.

I think you've got it the wrong way round, haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?

Re:Just what we need... (1)

MichaelSmith (789609) | more than 7 years ago | (#14555966)

Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?

Yeah, that too.

Re:Just what we need... (0)

Anonymous Coward | more than 7 years ago | (#14555936)

All your base, are belong to us.

Re:Just what we need... (0)

Anonymous Coward | more than 7 years ago | (#14555973)

it's quite obvious Arabic was an intentional choice and not because they wanted to start on languages in alphabetic order ... you can't say IBM's not being patriotic (unlike during WWII?)

Re:Just what we need... (4, Insightful)

user9918277462 (834092) | more than 7 years ago | (#14556116)

There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.

Coherency? (4, Insightful)

PrinceAshitaka (562972) | more than 7 years ago | (#14555810)

From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added."

Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.

I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real time translation for the following reason. Sentences in different languages have different structures. While in English the verb is usually the second part, in other languages (German, for example) the verb often comes last. For the translator to get the second word of a sentence, it would have to wait till the end of what could be a long sentence. This necessarily adds delay.

Re:Coherency? (3, Interesting)

Yahweh Doesn't Exist (906833) | more than 7 years ago | (#14555849)

yes, there will always be delay for the reason you state. but that's true even with human translators, yet no-one claims real-time meetings between people via translators is a waste of time.

since even "live" broadcasts are usually delayed several minutes for technical and legal reasons anyway, if this technology can get to the state where you're just one or two sentences behind real life, it will be effectively real-time for almost all practical purposes.

Re:Coherency? (1)

grimJester (890090) | more than 7 years ago | (#14555948)

In what cases is a four minute delay noticeable if the picture and sound are delayed four minutes too? I'd love this for watching movies that are currently completely incomprehensible to me.

For the 80% part, it's good enough to get the gist of what is said. It won't compete with professional human translators, but it will make translation easily available for those who don't have access to a translator.

Re:Coherency? (2, Funny)

sumdumass (711423) | more than 7 years ago | (#14555987)

I'm wondering if this was used during the lead-up to Iraq? "I'm unclear if there are bombs here" ends up getting translated into "there are nuclear bombs here".

Re:Coherency? (2, Informative)

wizrd_nml (661928) | more than 7 years ago | (#14556014)

For the translator to get the second word of a sentence, it would have to wait till the end, of what could be a long sentence. This necessarily adds delay.

Not necessarily. An on-the-fly translator could translate words as it hears them filling in the translated words in the correct location in the sentence. In other words, the sentence doesn't have to be completed in order. It can dynamically expand to fit in new words.

If you listen to human translators doing on-the-fly translation you'll see this is how they work.

Re:Coherency? (3, Interesting)

dancallaghan (890674) | more than 7 years ago | (#14556021)

but I personally don't think there could ever be real time translation for the following reason. [German]

You are going to have that problem whether it's a machine doing the translating or a human. As I understand it, interpreters of German get around this by some quick-thinking restructuring of the translated sentence, or they simply lag a half-sentence or so behind.

The real problem for machine translation is, and always has been, determining the sense of a word from context (indeed I recall a recent Slashdot article about some guy who suggests this is the separating factor between computers and animal intelligence). Most languages have a great many homonyms whose meaning a listener can determine only from the surrounding context and, often, general background knowledge of the language or topic at hand.

And German is an easy one (4, Informative)

Ogemaniac (841129) | more than 7 years ago | (#14556030)

It is as close to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.

Re:Coherency? (1)

MaxiumMahem (933757) | more than 7 years ago | (#14556070)

I don't think 20% inaccuracy will be a problem. One of the great capacities of the human mind is to develop correct inferences from limited information. Of course the developers should always strive to do better, but being able to understand 4 out of every 5 words is probably enough for someone to grasp the meaning of a phrase, especially within a larger context of information.

And it's not as if people achieve 100% accuracy, even in their own languages. We constantly misread and mishear things, yet somehow communication manages to function. Indeed, I suffer from mild dyslexia and often misread words in books, usually without even realising it. I would guess I read at probably only 90-95% accuracy, but comprehend at close to 100% anyway; it would be interesting to test this (any psychologist present?). Human translators obviously do much worse than a person in their native language does. So the computer may be coming pretty close.

Lastly, grammar is probably the least of the issues for the program. Languages have their own code, complete with important bits of meta-information like conjugations and articles to tell what bit means what. People have trouble dealing with new/different rule concepts, probably due to the ingrained way we learn languages, but they are easy for machines. Translating from one set of grammar rules to another is pretty easy mechanically. The bigger issue for them is to actually understand what is said.

Re:Coherency? (1)

Urza9814 (883915) | more than 7 years ago | (#14556181)

The verb only comes at the end in German if you have more than one verb...the first verb is always the second piece of the sentence.

Re:Coherency? (1)

Fruit (31966) | more than 7 years ago | (#14556210)

Actually the verb comes last in relative clauses, which also happens to be the base word order for German. In main clauses a bit of movement causes the verb to end up as the second word of the sentence.

Re:Coherency? (1)

kklein (900361) | more than 7 years ago | (#14556272)

This is abysmal. Dr. Paul Nation and others have found that even if a non-native speaker understands 95% of the words in a text, he cannot accurately guess the remaining 5% and comprehension will suffer greatly. This isn't hard to imagine, given that even at 95% text coverage, 1 in every 20 words will be unknown. Even if they bump this up to 80%, that is still useless. That's 1 in 5 words!

Granted, I'm pulling from my research in a different field (second language acquisition), but the concepts are the same. What is even worse is if this is WRONG 40% of the time! Then that isn't just MISSING information, it is INCORRECT information! This is utterly useless and probably represents more of an impediment to coherence than an aid.

I'm a geek, and I know we'll get computers to speak fairly authentic language in my lifetime, but as a linguist moving into cognitive science, I'm telling you all it's gonna be a while. We barely know how WE do it, let alone how to make it happen with a completely different system architecture!
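For anyone who wants to check the "1 in 20" and "1 in 5" figures, here is the arithmetic as a small Python sketch. The function name is made up, and it assumes, as the comment does, that "coverage" (or "accuracy") simply means the percentage of words that come through usable.

```python
from fractions import Fraction


def unknown_word_rate(coverage_percent):
    """Share of words that are NOT covered, as an exact fraction."""
    return Fraction(100 - coverage_percent, 100)


# 95% is the text-coverage figure from the comment; 80% and 60% are the
# accuracy figures quoted from the article.
for coverage in (95, 80, 60):
    rate = unknown_word_rate(coverage)
    print(f"{coverage}% -> {rate.numerator} in {rate.denominator} words missed")
```

This only counts how many words are missed. It says nothing about which ones, which is the part that matters for comprehension.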

first? (5, Funny)

Anonymous Coward | more than 7 years ago | (#14555811)

however the researchers stated "We still can't figure out what Bob Dylan is saying"

A great advance in technology! (2)

themysteryman73 (771100) | more than 7 years ago | (#14555816)

Reminds me of a Simpsons episode "Hello Homer, it's me, KITT from Knight Rider"

Seriously though, this is a great advance in technology, but will it still be as funny to listen to? It's always fun typing in words into speech recognition programs and listening to the unexpected results!

Re:A great advance in technology! (0)

Anonymous Coward | more than 7 years ago | (#14556143)

I think you have misunderstood the meaning of the phrase "speech recognition" :-)

Words to learn for .. (-1, Troll)

Anonymous Coward | more than 7 years ago | (#14555822)

..arabic TV

Death, America, To.

NSA Babelfish (2, Funny)

Elixon (832904) | more than 7 years ago | (#14555837)

I cannot wait until I buty the first eBabelfish gadget that I will put in my ear so I can understand the spoken language of my Russian colleagues... ;-) :-) I hope that somebody will not consider it as "important technology for the national security" and will not restrict it by any means...

(I'm sure that this eBabelfish is already installed - not in my ear - but on the telecommunication centers...)

Re:NSA Babelfish (1)

sumdumass (711423) | more than 7 years ago | (#14555999)

You don't want to understand what they are saying. I have heard them and just trust me on this.

BTW nice 'buty'

Re:NSA Babelfish (1)

Elixon (832904) | more than 7 years ago | (#14556155)

It makes me even more curious... But from now on they should take care with what they say... they will never know whose headphones are connected to an iPod and whose are connected to the iPod with an "IBM Babelfish Inside" logo stuck on... :-)

( s/buty/buy/ => Sorry, I'm sometimes fighting with my notebook's keyboard...)

Opensource? (1, Interesting)

Anonymous Coward | more than 7 years ago | (#14555841)

Will IBM make this technology public or will it be proprietary?

Re:Opensource? (3, Insightful)

omeg (907329) | more than 7 years ago | (#14555968)

Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.

sheesh (0)

Anonymous Coward | more than 7 years ago | (#14556072)

go on, take a wild guess.

Foreign languages are complex... (5, Insightful)

pubjames (468013) | more than 7 years ago | (#14555857)

I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.

This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.

I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.

Re:Foreign languages are complex... (2, Insightful)

Viol8 (599362) | more than 7 years ago | (#14555890)

"It's not until you learn another foreign language that you realise how complex languages are, and how subtle."

And how weird sometimes. English for example loves to use the word "up" in all sorts of unsuitable places:

give up
shut up
fed up
wash up
fuck up
laid up
muck up
turn up
free up
look up
make up
put up
screw up
hang up
wrap up
hold up
grow up

Wtf?

And how come we say "didn't he..." but in longhand it's "did he not..."? Shouldn't it be "did not he"? Why does the "not" shift to the other side of the pronoun? But then all languages have similar weird, illogical syntax.

Re:Foreign languages are complex... (4, Funny)

MPHellwig (847067) | more than 7 years ago | (#14555921)

And of course: "Up yours!" ;-)

Re:Foreign languages are complex... (1)

bogado (25959) | more than 7 years ago | (#14556167)

And of course: "Up yours!" ;-)

Well, in this particular case doesn't "up" mean what the word is supposed to mean?

Re:Foreign languages are complex... (5, Funny)

Splab (574204) | more than 7 years ago | (#14556166)

From The Boondock Saints:
Rocco: Fucking... What the fuck. Who the fuck fucked this fucking... How did you two fucking fucks...
[shouts]
Rocco: fuck!
Connor: Well, that certainly illustrates the diversity of the word.

Think that just about covers it...

Re:Foreign languages are complex... (3, Interesting)

Mushdot (943219) | more than 7 years ago | (#14555946)

I have a friend who works in Japan and he tells me the same. He often goes to watch English films that are subtitled in Japanese and tells me that they completely mistranslate most of the jokes and miss subtle nuances of speech. One example he gave was a scene from 'The Full Monty' (I'm doing this from distant memory so it might not be quite right - in fact, a bad translation :-) ).

One of the characters is shouting up to someone in their bedroom window. They don't respond to the shouting and the character says "He obviously can't hear me because of his triple glazing".

This is a sarcastic comment relating to the house owner's supposed wealth but in Japanese it was translated as:

"He has thick windows"

Perhaps in this case there was no easy way to translate - but I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translation only.

Re:Foreign languages are complex... (1)

pubjames (468013) | more than 7 years ago | (#14555965)

Another example of this I saw in a French film recently. A character was overhearing a conversation about a ship being under quarantine. He said "Is it the captain's birthday?" Makes no sense at all in English but in French it is a play on words and a (feeble) joke. Impossible to translate.

Re:Foreign languages are complex... (1)

Red Alastor (742410) | more than 7 years ago | (#14556188)

Do you remember the joke? I speak French and I can't figure out what it originally was.

Re:Foreign languages are complex... (1)

pubjames (468013) | more than 7 years ago | (#14556216)

The (stupid) character assumed that the captain was having a fortieth birthday party - forty being "quarante" in French, so a "quarantaine" sounds a bit like a word for a fortieth birthday party. I said it was feeble. But it is an example of a joke that's impossible to translate.

Re:Foreign languages are complex... (0)

Anonymous Coward | more than 7 years ago | (#14556234)

Won't it be something like "quarante" = 40 and quarantine sounding a bit like 40th?

Captain's 40th birthday?

Dunno, 'twas a guess.

Re:Foreign languages are complex... (1)

pubjames (468013) | more than 7 years ago | (#14555983)

I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translation only

I think it is more to do with the fact that they have to write the subtitles so that they can be read at the speed of the speech. And so they cannot go into subtleties. In fact often when there is fast dialogue they will miss whole phrases out.

Japanese and English are quite different (2, Insightful)

Ogemaniac (841129) | more than 7 years ago | (#14556044)

and it is usually extremely difficult to translate jokes. The senses of humor are quite different as well. I think this is part of the charm of anime, actually - we are laughing at things Japanese aren't always intended to find funny, while missing half of the jokes that are supposed to be there.

Re:Foreign languages are complex... (2, Insightful)

virtualsid (250885) | more than 7 years ago | (#14556032)

I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

I'm not quite sure what you mean here not bother because of this technology?

I can't see anyone not wanting to bother learning a language because of this technology. Not unless it was a babelfish/universal translator type technology - i.e. basically invisible. In which case, what's the issue? ;-)

What are you going to do:
a) Walk around with a little device which translates with 60-80% accuracy when you're in a country where people speak a language you do not understand.
b) Try to learn the language so you don't have to rely on a gadget?

I think I know which one I'd choose - not that I can speak anything other than English, but I do try.

Once devices get to 100% accuracy, my argument disappears. I'd love for that to happen too :-)

Sid

Re:Foreign languages are complex... (1)

pubjames (468013) | more than 7 years ago | (#14556060)

I'm not quite sure what you mean here not bother because of this technology?

Perhaps you a not like most people... I often hear English only speaking people say there is no point in learning another language because everyone learns English these days. This just gives them another excuse.

Re:Foreign languages are complex... (1)

virtualsid (250885) | more than 7 years ago | (#14556093)

I wrote:
I'm not quite sure what you mean here not bother because of this technology?

(I also can't write sense!)

You wrote:
Perhaps you a not like most people...

Perhaps you're right, perhaps I'm not like most people. In any case, this technology is not yet the kind that is useful to most people I believe.

I do think it's cool technology, but not really a cause for concern with languages.

Re:Foreign languages are complex... (2, Insightful)

anum (799950) | more than 7 years ago | (#14556077)

Learning a foreign language is a net good and the only way to really understand another culture is to experience it. That said, there are a large number of languages and an even larger number of cultures. Do you intend to learn/experience them all?

Can you see no good in a rough translation for some purposes?

Calculators have largely eliminated the need (and in some cases the ability) for people to do basic math. Therefore we should eliminate calculators before these people start believing that they completely understand cube roots when they just know how to push buttons.

Oh yeah, that reminds me...Cartoons aren't real.

Good luck IBM and I hope this stuff becomes viable soon.

Re:Foreign languages are complex... (1)

pubjames (468013) | more than 7 years ago | (#14556083)

Can you see no good in a rough translation for some purposes?

Of course.

But from the description I think this is being developed for military or intelligence work. In those fields, mistranslations can cause death. And unfortunately I think the current administration is unsophisticated enough to think that machine translation is better than (more expensive) human translation.

Re:Foreign languages are complex... (2, Interesting)

anum (799950) | more than 7 years ago | (#14556174)

Ya, I got ya'.

I almost added "I just hope GWB doesn't decide to fire all his intel linguists based on this post" but it seemed kind of like bashing the Prez and I would never do that...

Cheers

Re:Foreign languages are complex... (1)

polar red (215081) | more than 7 years ago | (#14556081)

This is one of the (many) reasons I hate dubbing, and I am lucky to live in a land where subtitling is only abandoned in children-targeted TV programmes.

Re:Foreign languages are complex... (0)

Anonymous Coward | more than 7 years ago | (#14556266)

And film subtitles are done by professional (let's not exaggerate) translators.
Who mostly have no idea what they are talking about... they may be good at translating common language, but when it comes to specific stuff like scientific or technical terms, they usually just translate them literally and it ends up just like a machine translation.

Ghee... (4, Insightful)

Anonymous Coward | more than 7 years ago | (#14555864)

Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to..?

If they REALLY want to test it properly... (4, Funny)

Viol8 (599362) | more than 7 years ago | (#14555872)

...they should send it to Glasgow on a Saturday night just after the pubs have closed.

"Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll shoo ye!"

Re:If they REALLY want to test it properly... (1)

LiquidCoooled (634315) | more than 7 years ago | (#14555898)

Clippy: "It looks like you're having a seizure, would you like me to call an ambulance?"

Available with old version of Mandrake Linux (1)

yamum (893083) | more than 7 years ago | (#14555879)

ViaVoice was shipped with an older version of Mandrake Linux.

Anyone know where I can get this from?

On-The-Fly (4, Informative)

Trurl's Machine (651488) | more than 7 years ago | (#14555901)

They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?

I have read a lot of auto-translated documents and they are always good for a laugh, a real "crapslation cabaret". So far, there is no technology that can auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into a good context. A "fly" translated as an insect would be counted as a success!
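To make the difference between the two ways of counting concrete, here is a small Python sketch. Whether IBM's figure was really computed as a vocabulary hit rate is only the claim made in this comment, not something the article confirms; the token data, sense labels and function names below are invented for illustration.

```python
# Each token: (source word, sense the system chose, sense the speaker meant).
# All of this data is hypothetical.
tokens = [
    ("translate", "convert-language", "convert-language"),
    ("on", "preposition", "preposition"),
    ("the", "article", "article"),
    ("fly", "insect", "without-stopping"),  # found in the dictionary, wrong sense
]
vocabulary = {"translate", "on", "the", "fly"}


def vocabulary_hit_rate(tokens, vocabulary):
    """Fraction of source words that were found in the vocabulary at all."""
    found = sum(1 for word, _, _ in tokens if word in vocabulary)
    return found / len(tokens)


def sense_accuracy(tokens):
    """Fraction of words whose chosen sense matches the intended one."""
    correct = sum(1 for _, chosen, intended in tokens if chosen == intended)
    return correct / len(tokens)


print("vocabulary hit rate:", vocabulary_hit_rate(tokens, vocabulary))
print("sense accuracy:     ", sense_accuracy(tokens))
```

With this toy data every word is "found", so the first number is perfect, while the second drops because "fly" was given the wrong sense.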

Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to make a ridiculous faux pas. Polish and Czech, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...

Re:On-The-Fly (1)

coofercat (719737) | more than 7 years ago | (#14556019)

Doubtless what they're saying is over-stretched hype. However, the application of speech recognition to translation to natural language processing makes for some interesting stuff.

The problems you outline happen in English -> Canadian (and probably American too), let alone more complex translations (try calling a Canadian a 'native' - doesn't tend to go down well, but it's normal fare in the UK).

However, add in "domain knowledge" and you're in some interesting territory. I think this is essentially what Google did - they fed in oodles of texts in the various languages so that the system could statistically match phrases. At a simple level, you could have a lookup table of common colloquialisms (eg. 'he's kicked the bucket'(English/UK) == 'he broke his pipe' (French/FR)).
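A minimal sketch of that lookup-table idea in Python. The locale codes, table layout and function name are assumptions for illustration only; this is not how Google's or IBM's systems are known to work. The French entry uses the real idiom "casser sa pipe".

```python
# Idiom table keyed by (source locale, target locale). Entries are illustrative.
IDIOMS = {
    ("en-GB", "fr-FR"): {
        "he's kicked the bucket": "il a cassé sa pipe",
    },
}


def translate_idiom(sentence, source, target):
    """Return the idiomatic equivalent, or None so the caller can fall back
    to ordinary (literal or statistical) translation."""
    table = IDIOMS.get((source, target), {})
    return table.get(sentence.strip().lower())


print(translate_idiom("He's kicked the bucket", "en-GB", "fr-FR"))
print(translate_idiom("He's kicked the bucket", "en-GB", "fr-CA"))
```

Keying the table on the exact locale pair means an entry for fr-FR is not silently reused for another variety of French; the second call finds nothing and returns None.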

Of course, the only way to really get this going is with natural language processing. At the moment though, computers can (AFAIK) only understand things they're expecting, as opposed to understanding anything and then reproducing it in another language. A way to go there, but I'm sure IBM are on it already... Natural language processing has to evolve with the language, so it's always a bit of a moving target, and hard to do, because the kids keep inventing new versions of the language ("naar wot I'm sayin'?").

What are the dangers of seeing this in the wild anytime soon? Very slim, I'd say. Of course, they may release the raw speech-to-text engine as a binary, but the rest of it is experimental at best, and currently has enormous amounts of R&D budget absorbed into it (and NL will probably be on subscription). You may be able to buy it as a service sometime though, I guess...?

Re:On-The-Fly (2, Insightful)

Red Alastor (742410) | more than 7 years ago | (#14556211)

However, add in "domain knowledge" and you're in some interesting territory. I think this is essentially what Google did - they fed in oodles of texts in the various languages so that the system could statistically match phrases. At a simple level, you could have a lookup table of common colloquialisms (eg. 'he's kicked the bucket'(English/UK) == 'he broke his pipe' (French/FR)).
The problem is that while French/FR people will understand the expression, others like French/CA won't. And even if they did special lookup tables, you'd still miss subtlety. For instance, if I want to use the expression you gave as an example as a warning to someone in French/CA, I could say "You'll break your neck." which would carry the same meaning. But if I say that someone broke his neck, then it should be understood literally.

Re:On-The-Fly (1)

Aceticon (140883) | more than 7 years ago | (#14556179)

Portuguese is spoken in both Portugal and Brazil.

Still, for example, the slang word used in Portugal for "traffic jam" (bicha) is the slang word in Brazil for "gay".

Talking about the congestion on the streets of Lisbon takes on a whole new meaning in Brazil.

Re:On-The-Fly (1)

blackest_k (761565) | more than 7 years ago | (#14556286)

Machine translation is ropey, admittedly, but one of the best for Polish-English translation is
English Translator3 www.techland.pl
Earlier versions didn't know the difference between a shower of rain and taking a shower, for instance, although you still need to take care with Polish and polish: the capital P makes a difference.
It does provide alternative translations, so you can do a basic translation and then apply a more appropriate one.
It's getting old now, so perhaps there has been an update.

Superhuman speech? (-1, Troll)

Council (514577) | more than 7 years ago | (#14555933)

IBM Strives For 'Superhuman' Speech Tech

The project was originally conceived to translate, into flawless English, the slurred speech of Christopher Reeve.

. . . *rim shot*

Christopher Reeve... (0)

Anonymous Coward | more than 7 years ago | (#14556219)

He's dead, you insensitive clod!

IBM and Google cooperation to come? (2, Interesting)

Mostly a lurker (634878) | more than 7 years ago | (#14555934)

IBM has been one of the pioneers in speech recognition for a long time. However, indications are that Google (in the lab) [slashdot.org] has been making tremendous progress in translation. While the two companies are bound to be fierce competitors, it would seem they would both have much to gain from cooperation in the area of language recognition and translation.

This won't make speech recognition mainstream (4, Interesting)

thbb (200684) | more than 7 years ago | (#14555949)

As has been the case for the past thirty years, descriptions of the system's prowess are still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...

Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition [umd.edu].

One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planning too. Hence, you can't plan what to do next as well when you speak, which is a big hurdle in the type of intellectual activities one carries out with a computer.

Awful default TTS (3, Insightful)

Council (514577) | more than 7 years ago | (#14555957)

Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.

What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.

This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.

Re:Awful default TTS (1)

nogginthenog (582552) | more than 7 years ago | (#14556036)

I know what you mean. I remember the speech functionality that came with my Amiga in 1989 was superior.

Re:Awful default TTS (1)

mrjb (547783) | more than 7 years ago | (#14556228)

Amiga? In 1982, the TI-99/4a with Terminal Emulator II and speech synthesizer already did what XP's tin man does nowadays. Pity that machine was crippleware, you had to buy all kinds of add-ons for it to get some power from it.

Re:Awful default TTS (2, Informative)

wfWebber (715881) | more than 7 years ago | (#14556046)

Then again, if they supplied a version that produced awesome quality voices, they'd be accused of trying to kill their TTS competition.

That said, in Microsoft Windows Vista (ETA 2019), the default TTS engine will be replaced by a new one sporting Anna [wikipedia.org] . Have heard her in the preview and I have to say, it's one hell of an improvement.

Re:Awful default TTS (1)

Viol8 (599362) | more than 7 years ago | (#14556048)

Probably BECAUSE speech is a niche market, MS don't want to spend the money on making it any better. So long as it sort-of works then the marketing droids have something apparently bleeding edge to waffle on about in the sales pitch knowing full well very few people will use it and discover how crap it is, and the ones who do are such a small percentage anyway that they won't care.

ViaVoice (1)

TheRealDamion (209415) | more than 7 years ago | (#14555979)

The xvoice team have failed to get IBM to recompile newer ViaVoice libraries, or even the same code against a more modern libc, ld.so and gcc environment making it quite hard to keep it working on newer distributions. It's also limited to ia32. They certainly don't seem likely to release the source code.

So I'm surprised to see an announcement like this one.

American or English? (2, Interesting)

squoozer (730327) | more than 7 years ago | (#14555989)

I realize that Americans and British (English at least ;o)) speak essentially the same language but I have yet to find any speech recognition software that can get more than roughly 85% of what I say correct. I have a fairly soft neutral English accent with pretty good enunciation so I would have expected to be getting a recognition rate in the high 90%s. I'm wondering if, as most of this software is developed in the US, it is tuned specifically to pick up on English with a US accent? I realize that you train the software for your voice but AIUI all you are doing is tuning a basic speech model. Has anyone else had this problem or is it just me?
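For anyone who wants to put a number on "85% of what I say correct": one common way to score a recognizer is word error rate (WER), the word-level edit distance between what was said and what was transcribed, divided by the number of words said. The sketch below is only an illustration; the function name and the sample sentences are made up, and there is no claim that any particular product reports its accuracy this way.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from any speech product): counts the
    substitutions, insertions and deletions needed to turn the
    transcript into what was actually said.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution in a five-word sentence gives a WER of 0.2,
# i.e. roughly "80% of the words correct".
print(word_error_rate("please open the front door", "please open the front store"))
```

Note that WER can exceed 1.0 when the transcript contains many extra words, so "accuracy = 1 - WER" is only a rough reading.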

Tip (0)

Anonymous Coward | more than 7 years ago | (#14556109)

Pronounce any word ending with 'ing' as if it ended in 'in'.

That should boost your accuracy.

Re:Tip (1)

squoozer (730327) | more than 7 years ago | (#14556274)

I gave up on speech recognition as anything but a toy a while ago but your tip could lead to some interesting mistakes. Take for instance the sentence fragment "Running to the door". If it is pronounced as you suggest it could easily be misunderstood by the machine to be "run in to the door" which could have nasty consequences.

Re:American or English? (2, Funny)

Vengeance (46019) | more than 7 years ago | (#14556145)

I'm sorry, what?!?!?

I cannot understand a word you're saying. What's with that accent?

Re:American or English? (1)

IamTheRealMike (537420) | more than 7 years ago | (#14556157)

Existing speech recognition engines rely on statistical approaches to disambiguate sounds and words, just like this "miracle" product does, and yes, about 80% accuracy sounds right. Of course this is too low when competing against a keyboard: even though speaking could be a lot faster, by the time you've corrected all the mistakes it works out slower, which is why it's only used in limited applications.
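A back-of-the-envelope sketch of that tradeoff, with loudly assumed numbers: a speaking rate of 150 words per minute and 5 seconds to fix each misrecognized word are guesses for illustration, not measurements of any product.

```python
def effective_wpm(raw_wpm: float, accuracy: float, seconds_per_fix: float) -> float:
    """Words per minute once the time spent correcting errors is included.

    All three inputs are assumptions supplied by the caller:
    raw_wpm         - dictation speed before any corrections
    accuracy        - fraction of words recognized correctly (0.0 to 1.0)
    seconds_per_fix - time to notice and repair one wrong word
    """
    errors_per_minute = raw_wpm * (1.0 - accuracy)
    # One minute of dictation plus the time needed to repair its errors.
    total_seconds = 60.0 + errors_per_minute * seconds_per_fix
    return raw_wpm * 60.0 / total_seconds


# Assumed figures: 150 wpm speech, 5 s per correction.
for accuracy in (0.80, 0.90, 0.98):
    print(f"{accuracy:.0%} accurate: {effective_wpm(150, accuracy, 5):.1f} wpm")
```

Under these assumptions, 80% accuracy works out to about 43 effective words per minute, which a reasonably quick typist beats; at 98% the same arithmetic gives 120. Change the assumed correction time and the break-even point moves accordingly.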

I have virtually no accent at all, except for very mild British overtones, yet speech recognition has never worked well for me either.

Oh oh oh. (3, Funny)

Anonymous Coward | more than 7 years ago | (#14556003)

I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.

So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.

Not _that_ amazing (2, Interesting)

johndoe42 (179131) | more than 7 years ago | (#14556026)

It's been well-known among language researchers that both speech recognition and parsing/comprehension are much easier when applied to a small problem domain. SRI in Palo Alto and CSLI at Stanford, for example, have a number of very impressive speech recognition packages that understand, for example, medicine-related sentences. The dashboard controls just sound like a logical progression of this to faster computers and an even smaller problem domain. They're cool nonetheless.
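A toy illustration of why a small domain helps, assuming a made-up list of five dashboard commands: when only a handful of utterances are legal, even a badly misheard phrase can be snapped to the nearest valid command. This uses Python's standard difflib string matching as a stand-in; real recognizers constrain the acoustic search with a grammar rather than patching up text afterwards, so treat it purely as a sketch.

```python
import difflib

# Hypothetical command set for an in-car system (invented for this example).
DASHBOARD_COMMANDS = [
    "radio on",
    "radio off",
    "volume up",
    "volume down",
    "navigate home",
]


def snap_to_command(heard: str, cutoff: float = 0.6):
    """Return the closest legal command to a noisy transcript, or None.

    cutoff is the minimum similarity (0.0 to 1.0) required to accept a match.
    """
    matches = difflib.get_close_matches(
        heard.lower(), DASHBOARD_COMMANDS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


print(snap_to_command("volume dawn"))  # misheard, but only one command is close
print(snap_to_command("xyzzy"))        # nothing plausible, so no command is chosen
```

With unrestricted dictation there is no such short list to fall back on, which is part of why the open-domain numbers look so much worse.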

The translation, on the other hand, sounds damned impressive. For unrestricted content, especially with an untrained voice (I imagine that IBM isn't individually training to each Al Jazeera talking head), 70% recognition sounds quite good. 70% accuracy post-translation ought to be quite a bit better than what's currently out there. The description of MASTOR, however, is useless -- it could easily describe anything that isn't word-for-word translation.

Re:Not _that_ amazing (1)

dchaley (949272) | more than 7 years ago | (#14556106)

Since you mentioned CSLI at Stanford, they are in fact already working on a speech-driven (human to system and system to human) in-car radio and navigation system [stanford.edu] in collaboration with Bosch. The prototypes are very impressive, but unfortunately not many details are available on the public web.

So yes, this is cool stuff, but as you say, not _that_ cool.

Buyer beware (4, Insightful)

99luftballon (838486) | more than 7 years ago | (#14556085)

Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?

Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a computer side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.

So don't rush to buy. Let the labs check it out first.

Trusted Computing (1)

The New Andy (873493) | more than 7 years ago | (#14556120)

This is one of those things that won't be possible with trusted computing. With encrypted audio+video streams for everything, all these cool technologies won't be able to be made. Hopefully, someone makes a program like this which goes mainstream - that ought to educate people about trusted computing as soon as they try to sneak it in.

I'll just be happy if (1)

el_womble (779715) | more than 7 years ago | (#14556147)

it does what the current generation of speech recognition claims to do. I have yet to find any dictation software that is even remotely accurate, and the voice command software has been pap, at least for me. There is something about my accent that really upsets speech recognition software.

Nintendogs: I've stopped trying to train my dog, it's never going to happen.
Apple Speech: Only works if I use a terrible Californian accent. Not worth the embarrassment.
Nokia: Even with just one voice command, my girlfriend's name, it still can't match my voice.

If this can translate foreign languages into American (sic) then it definitely sounds like it could stand a chance at translating English into text and command.

tech support (0)

Anonymous Coward | more than 7 years ago | (#14556190)

so, can it do a Bangalore accent? maybe it can call itself for support when it gets into difficulties. but then if it's real time will the onscreen subtitles just say "your call is important to us"? ouch, all those poor call centers

learn the language ? (0, Flamebait)

Anonymous Coward | more than 7 years ago | (#14556217)

why ? those c**ksuckers in US government agencies can't learn Arabic ?

WHAT ABOUT BROKEN ENGLISH/HALF ENGLISH? (0)

Anonymous Coward | more than 7 years ago | (#14556229)

Huh, kAKO TO TRANSLATE?

funny this subject should come up... (2, Interesting)

dafragsta (577711) | more than 7 years ago | (#14556257)

I've actually never used any speech recognition software before today. That said, today just happens to be the day. That said, I tried out Dragon NaturallySpeaking for the first time, and it is a complete coincidence that this topic should come up. I'm actually dictating this post with Dragon, as we speak. ha ha

the training process definitely has its ups and downs. The more you work with it however, the more it becomes attenuated to your own speech patterns and moreover, the quirky words we use every day. If you can get past the first two or three hours, you'll see that it is totally worth the effort, especially if this IBM tech isn't available to end-users for some time. There is also an aspect of the software training you, while you train the software. At the present time, I can dictate to slightly slower than I can probably type.

In the end, I can see where this would make a writing e-mails and other such time-consuming tasks, which involve spellchecking, grammar, and other proof reading significantly quicker. When you really hit your stride, it's easy to write at the speed of thought, which is really appealing. There are caveats, however. it's very easy to dictate several sentences worth of tax and taken for granted that it to everything down the way you attendedselect tax select select tax undo

Re:funny this subject should come up... (1)

dafragsta (577711) | more than 7 years ago | (#14556277)

A good case-in-point example of a pitfall is that I totally forgot that some of the text that I dictated from within the Slashdot window was mangled to hell and gone. That said, within Notepad, the results are very acceptable to say the least. I'm definitely getting closer to being able to write at the speed of thought. When the application is hitting on all eight cylinders. Another thing I forgot to mention is that occasionally it will get confused. If it happens to get confused with regard to the built-in commands. You will want to straighten that out in a hurry, because it was easily the most frustrating part of the training process. In the best case scenario, you'll be using these built-in commands to make the training process go faster.

I am surprised (1)

Shar-Kali-Sharri (890290) | more than 7 years ago | (#14556275)

... how critical people have been in their replies 'til now. I mean sure there are bound to be problems with this tech, but I think what's really interesting is the implications of a mostly successful on-the-fly translation - babblefish anyone... Supposedly with fast enough computers and advanced enough programs - imagine being able to communicate with everyone in the whole **cking world.... This would have enormous consequences for everything... humanity unite - (or probably bloody warfare ...). It might be true that this would probably remove some people's motivation for learning other languages... but if you look at the world today, there are quite a lot of bi-lingual people, but how many tri-lingual and in extreme consequence of this tech - 500-lingual.... You could potentially communicate with bloody QuEthc-indians..... This is what I think is the real issue here - not that some subtitles might miss a joke....