The Future of Speech Technologies

Zonk posted more than 8 years ago | from the you-want-some-toast? dept.

Technology 101

prostoalex writes "PC Magazine is running an interview with two of the research leaders in IBM's speech recognition group, Dr. David Nahamoo, manager of Human Language Technologies, and Dr. Roberto Sicconi, manager of Multimodal Conversational Solutions. They mainly discuss the status quo of speech technologies, which prototypes exist in IBM Labs today, and where the industry is headed." From the article: "There has to be a good reason to use speech, maybe you're hands are full [like in the case of driving a car]. ... Speech has to be important enough to justify the adoption. I'd like to go back to one of your original questions. You were saying, 'What's wrong with speech recognition today?' One of the things I see missing is feedback. In most cases, conversations are one-way. When you talk to a device, it's like talking to a 1 or 2 year old child. He can't tell you what's wrong, and you just wait for the time when he can tell you what he wants or what he needs."

101 comments

Solution to "one-way" problem (3, Funny)

blair1q (305137) | more than 8 years ago | (#14589089)

I have a solution to the "one-way" communication problem.

More popups.

Audio popups!

Heads-up display popups!

Holy blackberries! Get me my patent attorney!

Oh no! (5, Funny)

Ardeocalidus (947463) | more than 8 years ago | (#14589125)

"Car, brake"

"I'm sorry, Dave. I'm afraid I can't do that"

Re:Oh no! (4, Funny)

lukewarmfusion (726141) | more than 8 years ago | (#14589299)

That's because the car thought you said "break." You should speak more clearly.

Prostitute Schedule for Jan. 28 at the MBOT in SF (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#14589350)

Folks, check out the updated prostitute schedule [fuckedcompany.com] for January 28 at the Mitchell Brother's O'Farrell Theater (MBOT) in San Francisco. The MBOT is the most convenient way for you to buy a blow job, a hand job, and full service (i.e. vaginal sexual intercourse).

I kid you not.

Please establish a hypertext link to this message. Spread the word!

Re:Prostitute Schedule for Jan. 28 at the MBOT in (1)

Zantetsuken (935350) | more than 8 years ago | (#14591477)

god, are you seriously spamming ads for an online prostitute service on /.? I sure hope /. captured your IP and stored it in the database that holds the post you made so they can ban your spammin ass from posting... you're lucky they probably won't find it since they aren't likely to go read every post for shit like that...

Re:Oh no! (1)

michelcultivo (524114) | more than 8 years ago | (#14589395)

- Car, STOP! - I did it again...

Re:Oh no! (1)

6*7 (193752) | more than 8 years ago | (#14590430)

It's CARR and Michael.

the footer offs peach take no allergy (2, Informative)

backslashdot (95548) | more than 8 years ago | (#14589155)

mast and the stand can't aches.

(the future of speech technology must understand context)

Re:the footer offs peach take no allergy (3, Informative)

knipknap (769880) | more than 8 years ago | (#14589233)

The present of speech technology already does, and did so for years. One problem is that you don't have a huge enough word corpus for training that technology (the knowledge of context is always limited to the domain that you have been training it against).
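To make the context point concrete, here is a minimal sketch of how a recognizer might use word context to choose between two acoustically similar transcriptions. It is a toy bigram model with add-one smoothing; the tiny corpus, the function names, and the smoothing choice are all assumptions made for illustration, not a description of any shipping product.

```python
from collections import Counter

# Toy training text (hypothetical). A real system would train on a large,
# domain-specific corpus, which is exactly the limitation described above.
CORPUS = (
    "the future of speech technology must understand context . "
    "speech technology is the future . "
    "the footer of the page has a link . "
    "i ate a peach . "
).split()

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(UNIGRAMS)


def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB_SIZE)


def score(words):
    """Product of the bigram probabilities along the word sequence."""
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p


def pick(candidates):
    """Return the candidate transcription the toy model finds most likely."""
    return max(candidates, key=score)


# Two same-length candidates that sound alike.
candidates = [
    "the future of speech".split(),
    "the footer of peach".split(),
]
best = pick(candidates)
```

With this corpus the model prefers the first candidate, because "of speech" and "the future" were seen in training while "of peach" was not. Swap in a corpus about web page layout and fruit, and the preference could easily flip, which is the domain-dependence problem in miniature.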

its been a while (3, Insightful)

joe 155 (937621) | more than 8 years ago | (#14589158)

I've been waiting for years for speech recognition technology to get to an acceptable standard, and over that time I've used a couple. The one I got lately (dragonsoft I think) was ok, but they need to come quite a bit further before I'll be adopting all the way.

I'm looking forward to when I can say "computer, open openoffice for me mate" and it'll go "sure"... That'll be sweet.

Re:its been a while (1)

Fus (809178) | more than 8 years ago | (#14589379)

Ever try Nitrous VoiceFlux? http://www.voiceflux.net/ [voiceflux.net]

Re:its been a while (4, Interesting)

SoSueMe (263478) | more than 8 years ago | (#14589438)

Dragon Naturally Speaking from Nuance [nuance.com] is about 75-80% accurate out-of-the-box. It is the other 20-25% that you have to invest the time in to get it to your liking. Even after a few months, you will probably still only reach up to 95% accuracy.
Using it when you have a cold, sore throat or when you have been indulging in your favorite alcoholic beverage can corrupt your voice profile and set you back considerably.

Never let someone else use it under your voice profile.

Will voice rec systems ever be 100% accurate and speaker-independent? Maybe, but I don't expect to see it for a long time.
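For anyone wondering how an accuracy figure like these could be measured: a common metric is word error rate (WER), the word-level edit distance between what was said and what was recognized, divided by the number of words said. The sketch below is an assumption about method, since the thread does not say how the quoted percentages were obtained, and the function name is made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # reference word missing from hypothesis
                cur[j - 1] + 1,          # extra word in hypothesis
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


wer = word_error_rate("car brake now please", "car break now please")
accuracy = 1.0 - wer
```

Here one substituted word out of four gives a WER of 0.25, i.e. 75% word accuracy by this measure.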

Re:its been a while (1)

Grimboy (948054) | more than 8 years ago | (#14591224)

This is something that needs to be built in. Are you [user]? Have you been drinking? Do you have a cold? For the latter two you'd probably then apply some kind of clever filter or make the profile looser, and not write changes to the profile.

Re:its been a while (2, Interesting)

eam (192101) | more than 8 years ago | (#14592046)

We use Dragon in a digital dictation system for the radiology department where I work. We moved to the system about 6 years ago.

We have all the problems mentioned (except drinking). There are also some others that you might not consider. For example:

As the day wears on, the radiologist will get tired, and the recognition will become worse.

Also:

A radiologist who started at 6:30AM will see the sound characteristics of the room change dramatically as more people begin working and activity in the reading room increases. Even environmental systems cycling on & off can affect the recognition.

Despite this, when we receive a complaint about the voice recognition and we observe the user in action, they usually achieve 90-95% accuracy. That is really the most the vendor ever claimed was possible.

It is my understanding that for radiology practices in which the doctors share the profits, the voice recognition systems are a hit. You can see why when you look at the numbers. When we adopted the system, we had been using transcriptionists at a cost of about $600,000/year. After the change the annual cost of the speech recognition system was about $100,000. That doesn't take into account the greatly decreased turn-around time. Now we could have your report emailed to your doctor before you get your pants back on.

For Immediate Release: (-1, Troll)

Anonymous Coward | more than 8 years ago | (#14589165)

GNAA Announces Full Cybermilitary Support of the German Government
Mikhail Borovsky (GNAP) - Moscow, Russia - GNAA President timecop and Vice-President jesuitx held a press conference live via satellite from GNAA US HQ in Tarzana, CA where they announced full cybermilitary support of the German government following the German injunction against Wikipedia. From the German Wikipedia site at www.wikipedia.de,

"Liebe Freunde Freien Wissens, durch eine vor dem Amtsgericht Berlin-Charlottenburg am 17. Januar 2006 erwirkte einstweilige Verfugung wurde dem Verein Wikimedia Deutschland - Gesellschaft zur Forderung Freien Wissens e.V. untersagt, von dieser Domain auf die deutschsprachige Ausgabe der freien Enzyklopadie Wikipedia (wikipedia.org) weiterzuleiten."

This roughly translates as, "Dear friends and comrades, Wikipedia has been shut down as of January 17th, 2006 due to a court injunction by the government of Germany, due to extensive support by Wikipedia for the Jews and the state of Israel".

This type of support was made illegal in Germany in 1939 by the Berlin Pact, signed by Adolf Hitler and Josef Stalin. Angela Merkel, chancellor of Germany has announced that this injunction will not be lifted until Wikipedia stops supporting "Die Juden".

"We also feel this injunction came in due time, as Wikipedia is being overrun by articles pertaining to non-notable blogs with completely useless information (or "blogs [wikipedia.org] "), which are also illegal in the Great Republic of Germany. We are pleased to receive the support of the Gay Niggers, as they have already declared war on the blogs [wikipedia.org] , and know how to defeat this communist ideal before it can become a threat to freedom," said Mrs. Merkel.

About Germany

"Bundesrepublik Deutschland" was founded before the middle ages by the Visigoths. The government was non-notable per above until the late 1930's, when Germany underwent an extremely positive and successful cultural revolution. Today, Germany is a beacon for free economy and a land without Jews.


About GNAA:
GNAA (GAY NIGGER ASSOCIATION OF AMERICA) is the first organization which gathers GAY NIGGERS from all over America and abroad for one common goal - being GAY NIGGERS.

Are you GAY [klerck.org] ?
Are you a NIGGER [mugshots.org] ?
Are you a GAY NIGGER [gay-sex-access.com] ?

If you answered "Yes" to all of the above questions, then GNAA (GAY NIGGER ASSOCIATION OF AMERICA) might be exactly what you've been looking for!
Join GNAA (GAY NIGGER ASSOCIATION OF AMERICA) today, and enjoy all the benefits of being a full-time GNAA member.
GNAA (GAY NIGGER ASSOCIATION OF AMERICA) is the fastest-growing GAY NIGGER community with THOUSANDS of members all over United States of America and the World! You, too, can be a part of GNAA if you join today!

Why not? It's quick and easy - only 3 simple steps!
  • First, you have to obtain a copy of GAYNIGGERS FROM OUTER SPACE THE MOVIE [imdb.com] and watch it. You can download the movie [idge.net] (~130mb) using BitTorrent.
  • Second, you need to succeed in posting a GNAA First Post [wikipedia.org] on slashdot.org [slashdot.org] , a popular "news for trolls" website.
  • Third, you need to join the official GNAA irc channel #GNAA on irc.gnaa.us, and apply for membership.
Talk to one of the ops or any of the other members in the channel to sign up today! Upon submitting your application, you will be required to submit links to your successful First Post, and you will be tested on your knowledge of GAYNIGGERS FROM OUTER SPACE.

If you are having trouble locating #GNAA, the official GAY NIGGER ASSOCIATION OF AMERICA irc channel, you might be on a wrong irc network. The correct network is NiggerNET, and you can connect to irc.gnaa.us as our official server. Follow this link [irc] if you are using an irc client such as mIRC.

If you have mod points and would like to support GNAA, please moderate this post up.

.________________________________________________.
| ______________________________________._a,____ | Press contact:
| _______a_._______a_______aj#0s_____aWY!400.___ | Gary Niger
| __ad#7!!*P____a.d#0a____#!-_#0i___.#!__W#0#___ | gary_niger@gnaa.us [mailto]
| _j#'_.00#,___4#dP_"#,__j#,__0#Wi___*00P!_"#L,_ | GNAA Corporate Headquarters
| _"#ga#9!01___"#01__40,_"4Lj#!_4#g_________"01_ | 143 Rolloffle Avenue
| ________"#,___*@`__-N#____`___-!^_____________ | Tarzana, California 91356
| _________#1__________?________________________ |
| _________j1___________________________________ | All other inquiries:
| ____a,___jk_GAY_NIGGER_ASSOCIATION_OF_AMERICA_ | Enid Al-Punjabi
| ____!4yaa#l___________________________________ | enid_al_punjabi@gnaa.us [mailto]
| ______-"!^____________________________________ | GNAA World Headquarters
` _______________________________________________' 160-0023 Japan Tokyo-to Shinjuku-ku Nishi-Shinjuku 3-20-2

Copyright (c) 2003-2006 Gay Nigger Association of America [www.gnaa.us]

You mean now Clippy speaks?! (0)

Anonymous Coward | more than 8 years ago | (#14589198)

"Hey, it looks like you're trying to write a letter! Let me..."

*BANG*

*monitor explodes*

Me (covered in monitor pieces): So, Dirty Harry was right after all! A bullet from a .357 does blow Clippy's head "clean off"!

.44 Magnum... (0)

Anonymous Coward | more than 8 years ago | (#14589329)

PUNK!

Re:You mean now Clippy speaks?! (1)

AndroidCat (229562) | more than 8 years ago | (#14589485)

Nah, you'd have to hack the clippit.acs file to get him to talk. And that would be evil. [primus.ca] Bwahahaha!

What's wrong with speech? (5, Funny)

Nuclear Elephant (700938) | more than 8 years ago | (#14589207)

What's wrong with speech recognition today?

I took a brief poll, and nobody seems to have a problem:

Bruce: I sure like being inside this fancy computer.
Vicki: Isn't it nice to have a computer that will talk to you?
Agnes: Isn't it nice to have a computer that will talk to you?
Kathy: Isn't it nice to have a computer that will talk to you?

Except the trinoids, who complained:
We can not communicate with these carbon units.

I wasn't sure which Carbon they were talking about.

MOD PARENT UP (1)

dch24 (904899) | more than 8 years ago | (#14589322)

Apple has had speech technology for years!

Re:MOD PARENT UP (2, Informative)

penguin-collective (932038) | more than 8 years ago | (#14589586)

Yes, and Apple's speech recognition technology is many years behind the state of the art. IBM and others had better speech recognition and speech synthesis a decade ago than Apple has today.

And where exactly is new speech technology supposed to come from inside Apple anyway? They fired all the people who knew anything about speech in the 90's and shut down the labs.

Re:MOD PARENT UP (1, Interesting)

Anonymous Coward | more than 8 years ago | (#14589768)

That is subjective. IMO, Apple's approach was right. The only commercially "successful" approaches so far have been for dictation. I'd say that's still a niche market. The rate of error for transcription is way beyond the rate of even the most bubble-headed secretary, and that's not even considering specialty terms for particular fields of industry.

I don't think too many people realized how much more useful Apple's speech technology became when S.I. was teamed with AppleScript. I'm sure IBM's technology is/was superior, but so was Token Ring. Sometimes the better approach is start at 'cheap and appealing' before jumping to advanced.

Re:MOD PARENT UP (0)

Anonymous Coward | more than 8 years ago | (#14593724)

"Yes, and Apple's speech recognition technology is many years behind the state of the art."

That is subjective.


No, it's not "subjective", it's objective: their technology is outdated. It's outdated in terms of how it works, in terms of what it does, and in terms of how well it does it.

IMO, Apple's approach was right.

That's besides the question. The fact remains that Apple did not invent speech recognition, contributed little to it, has nobody working on advancing the state of the art, and is shipping technology that's years out of date. In fact, Apple is doing today what they used to accuse Microsoft of: they aren't investing in research themselves, they are just taking other people's technologies and claiming them as their own.

Re:What's wrong with speech? (0)

Anonymous Coward | more than 8 years ago | (#14590604)

Hmm, odd. When I took the poll, my samples were WAAAAAAY offtopic:

Deranged: "I need to go on a really long vacation."
Junior: "My favorite food is pizza."
Albert: "I have a frog in my throat."

I had to stop midway through Good News - damn, that kid doesn't know when to quit.

Re:What's wrong with speech? (0)

Anonymous Coward | more than 8 years ago | (#14592349)

Ah, references to the good ole chips in the Amiga..... Haven't heard those names for a while. I'm not sure about the trinoids though...

Speech is the future! (1)

mokolabs (530326) | more than 8 years ago | (#14589245)

"In this 10-year time frame, I believe that we'll not only be using the keyboard and the mouse to interact, but during that time we will have perfected speech recognition and speech output well enough that those will become a standard part of the interface."

Re:Speech is the future! (1)

vertinox (846076) | more than 8 years ago | (#14590679)

Heh. I think it would be more useful to have eye-movement-controlled cursors.

Re:Speech is the future! (2, Insightful)

Grimboy (948054) | more than 8 years ago | (#14591426)

I think mouse and keyboard with screen is far faster than audio recognition/feedback will ever be.

Re:Speech is the future! (1)

CastrTroy (595695) | more than 8 years ago | (#14591899)

And keyboard only is much faster than mouse+keyboard if the system is designed correctly. Except in the case where you are required to point at spots on the screen, such as for editing images, I would rather not use a mouse at all. Word Perfect 5.1 was the king of word processors. Everything could be done as a combination of alt/ctrl/shift and the F keys.

your (0, Flamebait)

Anonymous Coward | more than 8 years ago | (#14589250)

"There has to be a good reason to use speech, maybe you're hands are full"

That's 'your', not 'you're'

When can online 'journalists' stop making mistakes like this?

Re:your (2, Funny)

legalize.ganja.now. (923280) | more than 8 years ago | (#14589302)

blame their speech recognition software

Re:"maybe you're hands are full" (1)

product byproduct (628318) | more than 8 years ago | (#14589314)

Robyn Peterson has the same problem as the speech recognition systems he's reporting on -- he can't hear the difference between "you're" and "your".

Re:"maybe you're hands are full" (2, Funny)

Rotund Prickpull (818980) | more than 8 years ago | (#14589364)

Neither can I, so I have to cheat and use these things called "context" and "rules of English grammar".

Re:your (1)

kurzweilfreak (829276) | more than 8 years ago | (#14593850)

This just goes to prove the point yet again that a technology won't take off until it's used by the porn industry.

Language Acquisition... (5, Interesting)

GnomeChompsky (950296) | more than 8 years ago | (#14589257)

I'm a linguist, and it seems to me that Speech Recognition would be incredibly, incredibly useful in the research that's going on right now into Language Acquisition.

You see, the problem right now is that there's really not much data that's in the public domain for linguists/psychologists/what-have-you to study, because it's incredibly, incredibly laborious to do longitudinal studies of children's utterances, or of input to the child. People spend hours and hours and hours transcribing 20 minutes of tape. They're understandably reticent to just share their data out of the goodness of their hearts. Even when they do, it's never a large sampling of children-and-their-interlocutors from-birth-to-age-X, it's usually just one child and maybe his or her parents from age 8 months to 3 years.

So we have arguments about whether or not kids hear certain forms of input (Have you used passive voice with your child recently? Where's your child going to learn subjacency?) that go back and forth between psychologists and linguists, and people perform corpus studies on 3 children and feel that that's representative -- never mind the fact that these three kids were all harvested from the MIT daycare centre, and were the children of grad students or faculty members, and thus may not be representative of the population at large.

Speech recognition would make it much, much easier to amass large corpora of data for larger samples of the population. It'd make it much more likely for people to share their data. And, what's more, it'd likely be possible to have a phonetic and syntactic-word-stub (for lack of a better word) transcription made from the same recording. We'd have a better idea of how the input determines how language is acquired by children, and what sorts of stages children go through.

Re:Language Acquisition... (2, Interesting)

QRDeNameland (873957) | more than 8 years ago | (#14589442)

Very interesting. Since you're a linguist, I wonder if you might address a concern I've had about speech recognition technology in general.

I've dabbled a bit with Dragon Naturally Speaking in the past (v.7) and frankly found it still too immature to be of much use to me. I find it still far easier to deal with an accurate yet artificial interface (keyboard and mouse) than an inaccurate but more "organic" interface (speech recognition).

But one of the things that stood out from the experience was the way in which I found myself quickly (if frustratedly) adapting my speech patterns to comply with the machine ability to interpret me.

Is anyone out there considering the consequences of speech recognition technology on the evolution of human speech? It seems to me that any speech technology is going to be imperfect to some extent, but the better it gets, more people are going to use it and those people will inevitably end up adapting their speech patterns to the machine.

Could this technology end up homogenizing human speech patterns to fit the computer's speech recognition model? Is this even a valid concern in your opinion, and if so, is anyone in the linguistics field considering these implications?

Re:Language Acquisition... (1)

Drakonite (523948) | more than 8 years ago | (#14589737)

You are concerned that people will adapt their speech patterns to be more clear and easier to understand, and that it will catch on?

I have a problem seeing how we should be worried...

Re:Language Acquisition... (1)

QRDeNameland (873957) | more than 8 years ago | (#14589947)

If such clarity and ease of understanding comes at the cost of restricting the range of human expression or the future evolution of language, then yes, I would say that's a reason to be concerned. I happen to believe that linguistic idiosyncrasies like slang are an important part of human expression and the ongoing evolution of language, and I suspect the vast majority of linguists would agree.

Re:Language Acquisition... (1)

knipknap (769880) | more than 8 years ago | (#14589810)

I'm a linguist, and it seems to me that Speech Recognition would be incredibly, incredibly useful in the research that's going on right now into Language Acquisition.

Err... I'm confused, isn't this research going on exactly to provide speech recognition/transition systems with the data? So a perfect speech recognition system would make further acquisition unnecessary. What else would you want to collect the data for?

Re:Language Acquisition... (1)

sseaman (931799) | more than 8 years ago | (#14590851)

Language acquisition is a lot more than speech recognition, of course. Speech production and speech comprehension are a big part, and likely much more complex. The poster is claiming that the development of a speech to text tool would aid language acquisition research because it would aid in transcription, which is a big part of the research. Those transcriptions may be used to train neural networks, but those neural networks are very far from what anyone would call an accurate model of language acquisition or human speech recognition. The transcriptions are also used as qualitative and quantitative data for linguists and psychologists interested in cognitive development.

Additionally, just because a useful algorithm is developed to parse a speech signal into symbols, that doesn't mean that's the way humans do it. Deep Blue might seem to play chess like a human, but it doesn't really play chess like a human, does it?

Re:Language Acquisition... (1)

kklein (900361) | more than 8 years ago | (#14592607)

Human language acquisition, not machine language acquisition.

Also, as another linguist, let me add the part the parent forgot: In the literature, the term "language acquisition" is usually distinct from "language learning." Language acquisition is usually the term used to describe the process by which children acquire their "native" language(s). It appears to be very different from language learning, which is what you do if you start studying a foreign language after you have left the so-called "critical period," usually thought to be around 7 years old.

Re:Language Acquisition... (1)

John Muir (912474) | more than 8 years ago | (#14589824)

I'm of the mind that humans have an innate understanding of certain linguistic building blocks, which we then play around with more as we grow up. An inherited pre-existing structure which our minds expect to experience around us and from which all our languages are derived.

The linguistic parallel to the collective unconscious.

If developing artificial speech and hearing with computers takes us closer to this, then I think the results should be extraordinary.

But it's just my two cents obviously!

Re:Language Acquisition... (2, Informative)

Yellow5 (519383) | more than 8 years ago | (#14589844)

I work with speech recognition and to me, your comments sound a little misleading. When "people spend hours and hours and hours transcribing 20 minutes of tape" they usually aren't simply transcribing to text. The time is consumed by transcribing all the additional features in the text (i.e., time alignment of words and phonemes, prosody, and additional syntactic information such as parse structure or part-of-speech tags). There are, of course, automatic processes for each of these annotations, but some work much better than others. My opinion is that over the next 10 to 15 years, each piece of the speech recognition puzzle will come together to create ASR systems that will be comparable to human transcribers (you only have to be 95% correct to transcribe in a court room).

Re:Language Acquisition... (0)

Anonymous Coward | more than 8 years ago | (#14589891)

I'm a linguist

Maybe you are, but you aren't so cunning at it, are you ?

Re:Language Acquisition... (1)

stapedium (228055) | more than 8 years ago | (#14590304)

I'm not sure if you are aware of it, but several speech databases are available to researchers. Some have licences with a yearly fee, but some are free of charge.

Try http://www.cavs.msstate.edu/hse/ies/projects/speech/databases/ [msstate.edu] for a list of some of these, including the CMU kid speech database.

Re:Language Acquisition... (2, Informative)

GnomeChompsky (950296) | more than 8 years ago | (#14592017)

Yes. I am aware; it's just that there isn't as much data available as there needs to be in order to be able to say with any confidence that, yes, this is what speech to children looks like, and this is what speech spoken by children looks like. Because like it or not, you have to get your grad students transcribing things for hours in order to get anything out of it. You want to research bilingual acquisition? Fine, but you're probably going to have to do years of legwork to get data for even three children learning the same two languages at the same time. Speech recognition would cut down significantly on the amount of time it took to take down utterances on either end. Which would be an enormous plus.

Re:Language Acquisition... (1)

mungojelly (853032) | more than 8 years ago | (#14591829)

All sorts of scientific research is going to get fantastically easier as we approach the Singularity. If you have a really tremendous amount of data available, then instead of having to go out and collect data in order to answer questions that occur to you, all you need to do is extract your query from the records. You might say to your computer (your computer can talk, naturally, since it's in this thread): "How many times had Joanna been exposed to the subjunctive before she made this utterance?" or "When this model of chip has failed in the past, what was the average ambient temperature?"

Scientific progress is going to go "boink."

<3

Re:Language Acquisition... (1)

johnny maelstrom (171040) | more than 8 years ago | (#14592743)

You could establish a speech-recognition@home type of application, in the style of SETI@Home et al. If you create an application that can do some useful, but well defined set of tasks using speech recognition, you could build up a very useful open (as in useable by all) and free (as in beer) data base of everything you just mentioned to aid a linguistic study.

Have the application do something useful for the user who downloads and installs it, as an incentive, and define the task tightly enough that it covers a set of language small enough to be finite and studyable. As users use the application, they get the system to learn the language (multiple users together would be useful too, to be analogous to a child's multiple sources of learning), and as the system learns you build data, which is shared and accrued from all the distributed installations of the application.

How about a Firefox plug-in/extension to do this. Is Firefox a simple enough application to start this with? It could be both fun for most users and practical for those using accessibility functions with their computer.

IBM Speech - Needs Superhuman sales to survive? (5, Interesting)

Anonymous Coward | more than 8 years ago | (#14589259)

On the other hand, IBM is not actually selling much speech technology.

Scansoft, who earlier all but cornered the market for Optical Character Recognition (OCR) technology, did the same with speech recognition by acquiring the largest players in this space, SpeechWorks and Nuance. Scansoft changed their name to Nuance as a part of that last acquisition.

IBM, meanwhile, has been struggling to find a market for their "Superhuman" (sneer) speech reco technology. A few years ago, they sold distribution of their retail desktop product, ViaVoice, to (wait for it) Scansoft. Their commercial product was RS/6000-AIX-only until a couple of years ago, when they ported it to more platforms, including Windows and Linux, and integrated it more tightly with their Rational and WebSphere marketing platforms.

The current enterprise product sounds really sexy, at least for Rational-WebSphere shops. You can develop your WebSphere VXML application in Eclipse and leverage all those groovy WebSphere services you've built. No (or not much) special skill required!

The problem is that their target market is Telecom Managers, who face a choice between IBM, with a few hundred ports installed, and Nuance (-ScanSoft-SpeechWorks), with tens- or hundreds-of-thousands of installed speech reco ports. Telecom Managers live in a world where their clients expect six-sigma/five-nines reliability. This is a hard sell to make.

The question is, how long can IBM keep pouring money into speech R&D and product development in the face of dismal sales? Some in the industry expect the answer is, "Not too much longer." And that, of course, makes nervous enterprise buyers even more nervous and less likely to buy.

It's not about product, it's marketing IBM Research (0)

Anonymous Coward | more than 8 years ago | (#14589517)

This has nothing to do with voice recognition.

IBM wants money from every business for its patents portfolio, but nobody knows what they've invented in the software business, it just seems like they have a lot of failed products.

So today's PR is Voice Recognition and yesterday's was explaining how Linux memory management works (as though it came from them), and tomorrow's will be something different.

And when they lobby for software patents, they will try to look like the good guy, an inventive company and not the biggest technology leech of the lot.

Re:IBM Speech - Needs Superhuman sales to survive? (1)

AndroidCat (229562) | more than 8 years ago | (#14589548)

I want technology that'll run on a cheap single end-user or SOHO box. Too bad there's no money for companies to develop for that.

Re:IBM Speech - Needs Superhuman sales to survive? (1, Informative)

Anonymous Coward | more than 8 years ago | (#14589690)

I want technology that'll run on a cheap single end-user or SOHO box.

As I said, Nuance (Scansoft) bought them all up; not just SpeechWorks and Nuance, but Dragon, Lernout & Hauspie, etc. They still sell a bunch of (Windoze) retail SOHO packages for a hundred bucks or two.

Microsoft has some crappy .NET-based stuff, but I'd give it a pass, if I were you. It's neither SOHO nor enterprise. Not sure what it is...

It's not really soup yet, but there is also a free solution. See http://www.speech.cs.cmu.edu/ [cmu.edu] . At least one commercial vendor has taken the source, hacked it up and is using it in a commercial product. At least it runs on Linux and (I think) *BSDs.

- The AC OP

Re:IBM Speech - Needs Superhuman sales to survive? (1)

AndroidCat (229562) | more than 8 years ago | (#14593365)

I thought that the .NET stuff was just a wrapper on top of SAPI 5? Free, but MS hasn't really done anything with it since 1998.

Re:IBM Speech - Needs Superhuman sales to survive? (1)

MrBandersnatch (544818) | more than 8 years ago | (#14590162)

Actually I think the question with speech R&D is can IBM afford NOT to keep pouring money in?

At some point within the next 10-50 years, *someone* is going to develop SR technology that CAN act as a totally natural HCI. The potential profits from this exceed those of MS, possibly on patents alone! At the very least IBM is going to want the patent leverage to be able to take advantage of that technology if they are not the ones who develop it.

I am often surprised that MS/Apple aren't making a significant investment in R&D in this field, because when it happens I can foresee a paradigm shift as large as that between the text interface and the GUI. Possibly enough to totally displace both from the market...perhaps IBM can see the possibility too!

Re:IBM Speech - Needs Superhuman sales to survive? (1)

LordMyren (15499) | more than 8 years ago | (#14590990)

patents are only useful if they can be leveraged without the use of someone else's patent. in this case, scansoft owns the entire book of speech patents... IBM already even sold them their last couple chapters.

so even if someone does build the complete natural-linguistic speech recognition, it'd be worthless since scansoft (they've changed names a # of times now) owns a couple of the stages in the stack. you can try to sell it to them or try to buy the rights, but you're just some schmoe with cool technology and they're the guys enjoying selling multi-thousand dollar software packages to enterprises with no alternative....

scansoft, as true capitalists, needs only do everything it can to keep anyone else from innovating. all so they can keep selling ridiculously overpriced software to big business.

innovation is un-capitalistic; why invest resources and cause disruption when you're making more money selling stuff you can make for free?

Re:IBM Speech - Needs Superhuman sales to survive? (2, Insightful)

Kadin2048 (468275) | more than 8 years ago | (#14592128)

I know nothing about the particular details of this deal, but wouldn't it make sense if IBM's sale of the patents also included a reciprocal agreement, that Scansoft would not sue IBM in the future for use of its IP?

It just seems like IBM, seemingly a company obsessed with creating and preserving intellectual capital, wouldn't so hastily sell off patents that they might ever be able to use / need, unless there was a catch, like they got access to Scansoft's portfolio as part of the bargain?

Just speculation, based on what I've read about how Big Blue operates.

integration (3, Interesting)

caffeinemessiah (918089) | more than 8 years ago | (#14589280)

personally, i can't wait till they take speech recognition and couple it with natural language processing as a standard part of the desktop interface. it should be quite feasible now that we're seeing affordable 64-bit computing with fast memory and bus speeds. imagine excel with a speech-recognition interface, so instead of typing and filling formulae you would just tell it to "sum the row labeled timing, but only include values greater than 10". ok, back to work...

Re:integration (2, Funny)

Bloke down the pub (861787) | more than 8 years ago | (#14589396)

There's enough wittering going on in the office already, thanks.

Of course, it seems you'll have the advantage of not having to tell it to switch to uppercase no i meant put the letters in uppercase not the word quote uppercase quote shift shift er fuck hey Joe what is it for uppercase huh was that caps lock YOU SAID OK THANKS NO DELETE DELETE THAT.

Re:integration (3, Insightful)

cagle_.25 (715952) | more than 8 years ago | (#14589497)

Spot on. Many interfaces today make it difficult to get from user's idea to computer's execution. Because we are much more facile at using spoken language to be precise than we are at using mouse+keyboard to be precise, a "G+AUI" (graphical+audio user interface) should, in principle, be much more powerful than a GUI.

Dragon Naturally Speaking is a baby step in that direction, but it is pretty much limited to single nouns or verbs.

Re:integration (1)

munpfazy (694689) | more than 8 years ago | (#14592361)

Because we are much more facile at using spoken language to be precise than we are at using mouse+keyboard to be precise, a "G+AUI" (graphical+audio user interface) should, in principle, be much more powerful than a GUI.


I'm not convinced that spoken language is more precise than any other form of interface. In fact, I'd suggest just the opposite.

When one wishes to communicate anything with precision, writing it down is likely to lead to far better results. For the really demanding material, diagrams, equations, and structured text generally accomplish the task much more easily than prose. Spoken language, on the other hand, is ideal for transmitting large amounts of imprecise information with little effort. (Well, that and poetry, which is by no means a trivial aspect of language, but one that seems largely unrelated to computer interfaces.)

Re:integration (1)

cagle_.25 (715952) | more than 8 years ago | (#14594369)

This is a reply to both you and the child below.

The key is in the difference between the words "facile" and "precise." You are absolutely right that written language is more precise, and written language with diagrams even more so, than spoken language. The problem is facility. The time it took me to write this and think about my choice of words is about 10x the time it would have taken for me to explain it verbally.

In an interface situation in which the computer provides me with reasonable feedback so that I can judge the correctness of the results, being facile is important because it allows me to keep up with the speed of my thought. The result is that I can be more efficient.

Re:integration (1)

asuffield (111848) | more than 8 years ago | (#14592800)

Because we are much more facile at using spoken language to be precise than we are at using mouse+keyboard to be precise

Wow. Where did you get that idea? Most of the non-engineers I have encountered require an interpreter ('consultant') to translate their spoken words into something which is sufficiently precise to enter into a computer. Anybody who's been involved in the analysis/specification stage of a development project will know what I mean.

They aren't any better at doing it with a keyboard, but they sure can't do it when speaking.

Re:integration (1)

CastrTroy (595695) | more than 8 years ago | (#14591932)

The problem is, even if we get speech recognition, the computer might know which words you are saying, but not what they mean. Assuming the computer understood "sum the row labeled timing, but only include values greater than 10", then your idea of speech recognition would work great. But since computers don't understand that, and it will be a while before they understand arbitrary commands, speech recognition will only be for those who are too lazy to learn how to type.

Re:integration (1)

caffeinemessiah (918089) | more than 8 years ago | (#14591974)

you just bashed the whole field of natural language processing! while it's true that computers probably won't be "understanding" words for quite a while (cue the AI discussion), it's quite within the realm of NLP to reasonably accurately tag the parts of speech in your sentence, and then possibly use some heuristic to reason about what was implied. of course, we're talking a very restricted subset of english as usable. you won't be able to say "hey ol' computer boy, howz about.....". simple imperative sentences ("do this...") shouldn't be too hard to tackle.
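To make the "restricted subset of english" idea concrete, here is a minimal sketch in Python. It assumes the recognizer has already produced text, and it handles exactly one imperative pattern with a regular expression rather than real part-of-speech tagging. The names COMMAND, run_command and the sample rows dictionary are made up for illustration and do not come from any product discussed in this thread.

```python
import re

# Hypothetical one-sentence grammar (an assumption, not a real product's syntax):
#   "sum the row labeled <label>[, but only include values greater than <number>]"
COMMAND = re.compile(
    r"^sum the row labeled (?P<label>\w+)"
    r"(?:,? but only include values greater than (?P<threshold>-?\d+(?:\.\d+)?))?$",
    re.IGNORECASE,
)


def run_command(text, rows):
    """Run one recognized command against a dict of label -> list of numbers."""
    match = COMMAND.match(text.strip())
    if match is None:
        # Anything outside the tiny grammar is rejected instead of guessed at.
        raise ValueError(f"command not understood: {text!r}")
    label = match.group("label").lower()
    if label not in rows:
        raise KeyError(f"no row labeled {label!r}")
    values = rows[label]
    threshold = match.group("threshold")
    if threshold is not None:
        # Keep only the values strictly greater than the spoken threshold.
        values = [v for v in values if v > float(threshold)]
    return sum(values)


rows = {"timing": [4, 12, 9, 30]}
print(run_command(
    "sum the row labeled timing, but only include values greater than 10", rows))
```

Only 12 and 30 pass the filter, so this prints 42. Anything phrased differently ("hey ol' computer boy, howz about...") is rejected outright, which is the trade-off of a restricted grammar: predictable behavior in exchange for making the user learn the phrasing.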

Re:integration (1)

CastrTroy (595695) | more than 8 years ago | (#14592894)

Computers aren't that good at understanding natural language in order to follow commands. This is why we have programming languages, scripting languages, and macros. If computers were anywhere close to being able to understand natural language, then we would have no need for programming. The only thing I've seen work on a computer as far as voice commands go is using your voice to navigate the menus. I realize the usefulness of these technologies for those without the ability to type or use a mouse, but beyond that, I think that people should just learn to use the tools, and stop trying to create a solution to a problem that isn't there.

In Soviet Russia (0, Redundant)

LameJokesGuy (950298) | more than 8 years ago | (#14589363)

In Soviet Russia, speach recognizes you!

Re:In Soviet Russia (0)

Anonymous Coward | more than 8 years ago | (#14590388)

speach? c'mon, that's piss poor even for here!

Re:In Soviet Russia (1)

yoprst (944706) | more than 8 years ago | (#14590541)

That's what happens because of unhealthy drinking habits.

can it replace court reporters? (3, Interesting)

RussP (247375) | more than 8 years ago | (#14589400)

A few years ago my wife was thinking about studying to become a court reporter. The training is very demanding, and I heard the dropout rate is about 95%, but the pay is good if not great.

In any case, I warned her about the potential for voice recognition technology to render court reporters obsolete. It probably won't happen, but the mere prospect tipped her in the direction of foregoing the opportunity. Was that a mistake?

The same concern applies also to medical transcription.

Re:can it replace court reporters? (2, Informative)

Anonymous Coward | more than 8 years ago | (#14589569)

Being a court reporter, I'd say no. A computer doesn't say "What?" when it doesn't understand the words, and it doesn't tell people not to talk at the same time so that the record's clear. Some courts try video, some try just audio recorders, but so far the results haven't been so good. You need people to operate the machine, people to catalog the recordings, people to transcribe the recordings if necessary. It's just better to have a court reporter there to do all that (and often cheaper).

The problem with the field is that with fewer reporters to meet an increasing demand, the lack of capable court reporters is forcing more electronic recording -- good results or not.

Now, for medical transcription, it's a great product. After about six months of use, the doctor (or anyone that dictates a lot) has gotten the computer trained to his voice and can go at a pretty good clip (150 words per minute or more). But this is one voice and a limited, task-specific vocabulary.

So why is voice input in decline? (3, Interesting)

Animats (122034) | more than 8 years ago | (#14589511)

Several good mainstream voice applications are on the way out. Wildfire [savewildfire.com] is gone. TellMe [tellme.com] is laying off people and no longer promoting their public services. These are good systems; you could get quite a bit done on the phone with them, and they had good speaker independent voice recognition. Yet they're gone, or going.

Try TellMe. Call 1-800-555-TELL. It's a voice portal. Buy movie tickets. Get driving directions. News, weather, stock quotes, and sports. All without looking at the phone. So what's the problem?

Re:So why is voice input in decline? (3, Informative)

mikeylebeau (68519) | more than 8 years ago | (#14589863)

You're mistaken about Tellme laying people off; they are doing quite well and are growing. You're right that the voice portal idea is no longer emphasized, but Tellme's making great money selling voice services to enterprise customers.

Re:So why is voice input in decline? (1)

DigiShaman (671371) | more than 8 years ago | (#14591747)

I'll tell you why.

The problem is rooted in human psychology. For example, when I'm ready to compose my thoughts and ideas to written format, I don't want to be talking aloud in thin-air. I find the prospect of eavesdroppers to be unnerving. Flat out, it makes me feel insecure. As such, I like to keep my thoughts and ideas private on paper via typing in an office environment or out in the public. When I'm ready to be heard, I will send the text document via e-mail or printed format. If asked, I will hold a public speech.

Under no circumstances do I want people questioning or having thoughts about my rough drafts as I'm speaking them out loud.

Actually... (2, Informative)

ijablokov (225328) | more than 8 years ago | (#14589529)

...the point of our multimodal work is that you can have a two way dialog with the device, as well as have visual feedback to the interaction. See http://ibm.com/pvc/multimodal [ibm.com] for some examples.

Don't question my intelligence, it's fake. (1, Interesting)

Anonymous Coward | more than 8 years ago | (#14589579)

My name is Dr. Sbaitso. I am here to help you. Say whatever is in your mind freely, our conversation will be kept in strict confidence. Memory contents will be wiped off after you leave. So, tell me about your problems.

Re:Don't question my intelligence, it's fake. (1)

virtcert (512973) | more than 8 years ago | (#14600469)

Wowwww... flashbacks...

Unrecognizable grunts (2, Funny)

renrutal (872592) | more than 8 years ago | (#14589592)

You know this technology will be a big hit in the porn industry when the big man of the area says

"There has to be a good reason to use speech, maybe your hands are full"

Now, what if the mouth is full too? Ventriloquism?

Re:Unrecognizable grunts (1)

blue trane (110704) | more than 8 years ago | (#14590137)

mindreading!

Screw speech recognition (2, Interesting)

Anonymous Coward | more than 8 years ago | (#14589617)

One great thing about keyboards and typing is that it's relatively private. Like phone menus. I hate when they ask me to speak my choice or answer a question or recite my account number. Just let me freakin' type.

Babblin' all over the place is dumb.

Instead of speech recognition let's work on better speech synthesis. Here we are in 2006 and the average synthesized voice sounds hardly better than my freakin' Phasor card I had for my Apple // in 1988.

Re:Screw speech recognition (1)

justsomebody (525308) | more than 8 years ago | (#14590114)

Babblin' all over the place is dumb.

On the other hand, it is a joke killer. Star Trek producers would probably sue IBM if this went mainstream. Nobody will laugh in the scene where Scotty talks to the PC mouse anymore. IBM would ruin their best scene ever.

Now, if they could make my computer make coffee and a beautiful babe ready to do anything out of nothing,... that would be something. It is something I would be proud to call progress.

Re:Screw speech recognition (1)

Zantetsuken (935350) | more than 8 years ago | (#14591523)

This was one of the biggest things I thought this article would be about, and it would be nice if somebody put some decent work into speech synthesis, so the text to speech synthesiser on my PC gaming clan's ventrilo server wouldn't sound so crappy pronouncing stuff - in some cases funny though, like when it tries to pronounce urls...

BTW: ya I know the synthesiser is part of the client, usually uses MS Sam I think...

Doctors are going to use speech recognition (3, Informative)

Aggrajag (716041) | more than 8 years ago | (#14589644)

Doctors in Finland are starting to use speech recognition to update patient records. I think it is in testing at the moment, check the following link for details.

http://www.tietoenator.com/default.asp?path=1;93;16080;163;9862 [tietoenator.com]

GUI gets in the way (1)

smart elik (533036) | more than 8 years ago | (#14589700)

XP Pro (W/AT&T voices, Office language Bar W/Word, firefox W/Foxy Voice) and OSX are a bit more polished. But I had a similar voice recognition/TTS setup in 1993. And what I concluded was that it is far simpler to interact physically (double click) with a GUI than to tell the computer to double click. What is needed is a different type of interface for speech to take off mainstream. However it is sad that Windows will not read dialog boxes. And that's a pretty obvious useful feature that the Mac has enjoyed since system 7 or 8! Windows is the norm with the largest desktop penetration. And the norm blows in this case. Just like the m-i-c-r-o-s-oft s-a-m voice. There were better TTS voices available to System 6.0.8! Ouch! This is one of the few areas that the Mac people truly and justly get to laugh at the PC.

I see the future (1)

dangitman (862676) | more than 8 years ago | (#14589992)

It's Daleks all the way down.

It's not the tech, it's the applications once more (2, Insightful)

redzebra (238754) | more than 8 years ago | (#14590489)

I'm convinced speech technologies have a fantastic future when they are used for improving human communications like providing for an electronic Babel fish. However it looks like most are concentrating on using speech as a way to interact with machines.

Which is so terribly inefficient and cumbersome. You really don't want to spend the time to socially interact with your coffeemachine at 7am.
Unless it's able to go to the shop, put in exactly the right amount of coffee and is able to turn itself to on once it hears you stumbling out of bed. It's next to useless if the only added value is to switch itself to on after you grunted "on" to it.

more of the same (1)

wwmedia (950346) | more than 8 years ago | (#14590587)

they have been promising good speech recognition software for years! i'm still waiting...

Audio search & instant report (1)

marcosdumay (620877) | more than 8 years ago | (#14590895)

Good speech recognition would be great for searching audio. We could index webcastings, not only text. It would also be great for reporting meetings and conferences.

Re:Audio search & instant report (1)

mikeylebeau (68519) | more than 8 years ago | (#14591614)

Marcos, check out www.podzinger.com. It does just that sort of audio search for podcasts, really useful.

patents (1)

LordMyren (15499) | more than 8 years ago | (#14590960)

the company keeps changing, but what was once scansoft (dragon dictate) had a bunch of really big patents. it's my understanding that they did what any true capitalist should do once they gain complete monopoly over something; they sat on it and milked the big fat tit they'd engineered themselves. and that's what they're doing today. just think of the god damned margins on something like that...

and that's why speech recognition 2006 is the exact same as speech recognition 1997.

FUCK YOU CAPITALISM. FUCK YOU.

Re:patents (1)

Grimboy (948054) | more than 8 years ago | (#14591468)

Heh, like those corporate buyouts that are only to prevent competition.

Re:patents (0)

Anonymous Coward | more than 8 years ago | (#14592190)

What does capitalism have to do with patents? The two are completely independent.

Re:patents (0)

Anonymous Coward | more than 8 years ago | (#14593888)

Uh, you ARE aware that patents and copyrights are decidedly anti-capitalist systems, right? They are extremely socialistic, almost to the point of being communistic. We tolerate them because we assume that no one would create and distribute art and invention unless they were given a government sponsored monopoly over them.

Under a proper market-driven system, there simply wouldn't be anything like intellectual property. Anyone would be able to use technological achievements and distribute art as they wished. The assumed problem with this, however, is that technological progress would slow to a crawl as no one would be willing to invest the money and resources into improving systems out of fear that a competitor will take said innovation and use it to run them into the ground. The best, the argument goes, we could hope for would be that every company would have a huge list of trade secrets, leading to lots of reinventing the wheel and wasted resources.

Artificial Intelligence (1)

caller9 (764851) | more than 8 years ago | (#14591492)

Bring on the system that learns language in a similar way that a human does...of course it would come out of the box with a reasonable starting point. Then the ultimate backend would be a HAL-like system (2010 not 2001), hopefully not a skynet-like, borg, VGER, or the trapper keeper from southpark. VGER wouldn't be too bad once it knew about carbon based infestations.

Anyone know of a project to simulate human life starting at a fertilized egg? That would be sweet once we understood all of the chemical processes that govern cell growth etc, couldn't that be simulated? In a crude way, just create a detailed physics simulation and put the right virtual ingredients in the right places. Grow it, teach it, then lock that sucker up in a space ship and point it toward the closest known rock/ice planet in hibernation mode with a decent stock of terraforming DNA and a robot body to do the manual labor on arrival and teach the babies. Bam! instant SCI-FI novel. Probably already written though.

When your hands are full.. (0, Troll)

4D6963 (933028) | more than 8 years ago | (#14591529)

"There has to be a good reason to use speech, maybe you're hands are full [like in the case of driving a car]"

When they said "maybe you're hands are full" (btw, noticed the you're/your typo?) I admit that the first example that went through my mind wasn't the case of driving a car.

Many people out here must know how it can be inconvenient to type with one hand, mostly when it's the left hand, and as for the car example, what would you need speech recognition for anyways, doing word processing while driving, or driving while you have both of your arms broken?

Speech recognition is for people who are alone (2, Insightful)

renfrow (232180) | more than 8 years ago | (#14592047)

Something that has not been mentioned, because, evidently, no one has actually worked with it, is that it is seriously annoying to work in the proximity of someone USING speech recognition. I worked with a fellow that had speech recognition on his machine who used it for programming. YOU try working on YOUR own code when someone is droning in the background: "for left paren int i equals zero semi-colon i less than mumble mumble delete word delete word ..." ALL DAY LONG! Even with head phones on it sometimes seemed like he was asking a question and I'd remove the head phones and say "What was that?" "Nothing delete word". ARGGHHH. Leave me the heck away from people with speech recognition.

Tom.

Open source speech recognition engines (3, Informative)

mandreiana (591979) | more than 8 years ago | (#14592440)

speech recognition
http://www.speech.cs.cmu.edu/sphinx/ [cmu.edu]

image+speech recognition
http://sourceforge.net/projects/opencvlibrary/ [sourceforge.net]

Desktop voice commands
http://perlbox.sourceforge.net/ [sourceforge.net]

Others
http://www.tldp.org/HOWTO/Speech-Recognition-HOWTO/software.html [tldp.org]
http://www.cavs.msstate.edu/hse/ies/projects/speech/software/ [msstate.edu]

Do you know about other usable open source speech solutions?

Ambient Noise... (1)

Paraplex (786149) | more than 8 years ago | (#14592863)

...is the *biggest* problem with speech recognition. I used it extensively for a good period of time, but it's not reliable. Someone walks into the room/some music plays, etc. Speech recognition would greatly benefit from either the computer getting an audio & visual input to determine the source, or better yet, adopting the military throat microphones that only pick up vibrations directly from the skin (even whispers).

Hands are full? (1)

Dasch (832632) | more than 8 years ago | (#14593131)

"There has to be a good reason to use speech, maybe you're hands are full..."

"Computer, play video!"
"Hmm, too much talk..."
"Computer, fast forward!"
"Wow, nice!"
"Computer, resume normal play!"
"Mmmm"
"Computer, play that scene again..."

(Girlfriend comes home)

"Computer, stop playback! Stop! Shut down!"

Talking to 1 or 2 year olds is .... (1)

chawly (750383) | more than 8 years ago | (#14593668)

Very relaxing.

ughh (1)

zobier (585066) | more than 8 years ago | (#14595751)

Imagine trying to code this way:

I en tee space main open-parenthesis i en tee space a ar gee cee comma cee aitch a ar asterisk space a ar gee vee open-bracket close-bracket close-parenthesis open-curly-bracket...
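For what it's worth, turning that kind of dictation back into source text is the easy half of the problem. Here is a rough Python sketch; the SPOKEN table of letter names and punctuation words is a made-up assumption for illustration, not the vocabulary of any real dictation product, and any token it doesn't know is passed through unchanged.

```python
# Hypothetical table mapping spoken tokens to characters (illustration only).
SPOKEN = {
    "en": "n", "tee": "t", "ar": "r", "gee": "g",
    "cee": "c", "vee": "v", "aitch": "h",
    "space": " ", "comma": ",", "asterisk": "*",
    "open-parenthesis": "(", "close-parenthesis": ")",
    "open-bracket": "[", "close-bracket": "]",
    "open-curly-bracket": "{", "close-curly-bracket": "}",
}


def transcribe(utterance):
    """Join spoken tokens into code; unknown tokens (e.g. 'main') pass through."""
    tokens = utterance.lower().split()
    return "".join(SPOKEN.get(token, token) for token in tokens)


dictated = (
    "I en tee space main open-parenthesis i en tee space a ar gee cee comma "
    "cee aitch a ar asterisk space a ar gee vee open-bracket close-bracket "
    "close-parenthesis open-curly-bracket"
)
print(transcribe(dictated))
```

This prints int main(int argc,char* argv[]){ for the line dictated above. The lookup is trivial; the pain is in having to say all of it, and in having the people around you listen to it.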
