Is Speech Recognition Finally 'Good Enough'?

Zonk posted more than 7 years ago | from the why-do-salty-snacks-keep-coming-up-freedom-fries dept.

313

jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
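
A rough way to sanity-check the "twice as fast" figure in the summary is to estimate net words per minute once correction time is counted. The sketch below does that; the `net_wpm` helper and the 8-second cost per error are assumptions for illustration, not numbers from the article, and the same fix cost is assumed for typos and misrecognitions.

```python
def net_wpm(raw_wpm, accuracy, fix_seconds):
    """Estimate usable words per minute after fixing errors.

    raw_wpm     -- input speed before corrections (from the summary)
    accuracy    -- fraction of words that come out right (from the summary)
    fix_seconds -- ASSUMED time to notice and repair one wrong word
    """
    words = 100.0                                   # work in batches of 100 words
    entry_minutes = words / raw_wpm                 # time to get the words in
    errors = words * (1.0 - accuracy)               # wrong words in the batch
    fix_minutes = errors * fix_seconds / 60.0       # time spent correcting them
    return words / (entry_minutes + fix_minutes)


FIX_SECONDS = 8  # assumption, not a measured value

typing = net_wpm(60, 0.93, FIX_SECONDS)     # Wood's typing figures
dictation = net_wpm(160, 0.95, FIX_SECONDS) # speech figures from the summary
print(f"typing: {typing:.1f} wpm, dictation: {dictation:.1f} wpm, "
      f"ratio: {dictation / typing:.2f}")
```

With that assumed fix cost, dictation comes out at roughly double the typing rate; a higher cost per fix narrows the gap, a lower one widens it.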

313 comments

Hmmm.... (5, Funny)

DoofusOfDeath (636671) | more than 7 years ago | (#19184747)

Is Speech Recognition Finally 'Good Enough'?

Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?

Re:Hmmm.... (1, Redundant)

ThunkDifferent.com (1095229) | more than 7 years ago | (#19184819)

i think speech recognition IS good enough for a lot of things. i'm not sure for what yet, but HAL 9000 was way ahead of its time in the movie 2001, so that means that we're only a few.. hmm... years past that. i'm sure it is good enough for deep space exploration by now.

Re:Hmmm.... (2, Funny)

creimer (824291) | more than 7 years ago | (#19184977)

The funny thing is why haven't Microsoft mastered this technology yet? You would think with the billions of dollars they spend on R&D that they could come up with better speech recognition. And funky AIs shouldn't be too far behind.

Re:Hmmm.... (1)

Rei (128717) | more than 7 years ago | (#19185239)

Getting that last 5% -- like the "your analysis/urinalysis" issue -- is doable. There's a translation technology that I read about a while back which should be applicable to voice recognition. It's a technique to figure out how to properly translate words with multiple meanings. You build up a database of a great amount of writings of all kinds and compile statistical information about word associations from it. So, for our example case, it would find that "admire" comes before "analysis" and "your" a lot more often than it comes before "urinalysis", so it would choose "your analysis". I think that the technique was to check eight words around the word in question (both directions).
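
A minimal sketch of the word-association idea described above, assuming a tiny made-up corpus in place of the "great amount of writings". The function names and the scoring rule (average co-occurrence count per candidate word) are illustrative choices, not the actual translation technique the comment recalls; the window of eight words follows the comment's recollection.

```python
from collections import Counter, defaultdict

# Toy stand-in for a large text database (made-up sentences).
CORPUS = [
    "i really admire your analysis of the market",
    "critics admire your careful analysis",
    "the doctor ordered a urinalysis for the patient",
    "the urinalysis results came back from the lab",
]


def build_cooccurrence(sentences, window=8):
    """Count how often each word appears within `window` words of another."""
    cooc = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[word][tokens[j]] += 1
    return cooc


def score(candidate, context, cooc):
    """Average association between the candidate's words and the context."""
    words = candidate.split()
    total = sum(cooc[w][c] for w in words for c in context)
    return total / len(words)


def pick(candidates, context, cooc):
    """Choose the candidate that is most associated with the context words."""
    return max(candidates, key=lambda cand: score(cand, context, cooc))


cooc = build_cooccurrence(CORPUS)
print(pick(["your analysis", "urinalysis"], ["i", "really", "admire"], cooc))
print(pick(["your analysis", "urinalysis"], ["the", "doctor", "ordered"], cooc))
```

On this toy data the first call prefers "your analysis" and the second prefers "urinalysis", because the surrounding words differ. A real system would need far more text and some normalisation for word frequency.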

Re:Hmmm.... (3, Funny)

value_added (719364) | more than 7 years ago | (#19184857)

What the hell are you talking about?

Maybe he meant speech wreck ignition?

Can your computer... (1)

onkelonkel (560274) | more than 7 years ago | (#19184945)

wreck a nice beach??

Ted "Chug-a-lug" Kennedy (-1, Flamebait)

Anonymous Coward | more than 7 years ago | (#19184875)

Why do you folks in Massachusetts keep voting for this guy? Is there really that much of a shortage of murdering blowhards in your state that could justify such action? Or is it simply voter apathy? Really, if he wasn't a Kennedy he would still be in jail instead of skating free on a manslaughter charge. Poor Mary Jo. When will she receive her justice?

Re:Ted "Chug-a-lug" Kennedy (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#19185147)

Perhaps the person who modded me "-1 Flamebait" would care to provide a coherent answer to my question instead of acting in the typical liberal fashion of silencing all opposing ideas? Did Ted Kennedy not commit manslaughter? How does someone who commits manslaughter rise to the level of a senator of the United States? Does Mary Jo Kopechne and her family not deserve justice? http://en.wikipedia.org/wiki/Mary_Jo_Kopechne [wikipedia.org]

Oh well. I guess letting the woman you were going to fuck behind your pregnant wife's back drown in a sinking car that you drove into the water while in a drunken stupor is child's play compared to blowing your accountant's head off. RIP Vince Foster.

Re:Ted "Chug-a-lug" Kennedy (-1, Troll)

Anonymous Coward | more than 7 years ago | (#19185327)

No answer is required, fuckhead. This post is flamebait, too.

"New Directions" (5, Funny)

parvenu74 (310712) | more than 7 years ago | (#19184881)

I used to work for a company that has the words "new directions" in their name. When I told people where I worked I would make a rather long pause between the "new" and "directions" so as not to sound like I was saying something else. I wonder how this software would render it...

Re:"New Directions" (3, Funny)

sd_diamond (839492) | more than 7 years ago | (#19185169)

I used to work for a company that has the words "new directions" in their name.

Please tell me the first two words in the name weren't "Coming From".

Re:"New Directions" (-1, Troll)

Anonymous Coward | more than 7 years ago | (#19185287)

I used to work for a company that has the words "new directions" in their name.

Please tell me the first two words in the name weren't "Coming From".
That joke is so juvenile I want to gag...

Re:"New Directions" (4, Funny)

TrippTDF (513419) | more than 7 years ago | (#19185341)

Reminds me of when the company "Pen Island" or "Mole Station Nursery" set up their domain names...

Re:"New Directions" (3, Funny)

Anonymous Coward | more than 7 years ago | (#19185423)

And let's not forget the Italian energy company Powergen Italia... their name makes for a wonderful .com address!

Re:"New Directions" (4, Funny)

houghi (78078) | more than 7 years ago | (#19185385)

You thought you had problems with "new directions"
Can you imagine telling the software to go to this site?
haatch tee tee pee double point slash slash slash dot dot org.
http:///..org not found

Re:"New Directions" (0)

Anonymous Coward | more than 7 years ago | (#19185415)

HA! I finally get it.

Lame.

Re:Hmmm.... (1)

inviolet (797804) | more than 7 years ago | (#19184937)

Is Speech Recognition Finally 'Good Enough'?

Funny, when I dictated this sentence to my computer today, it came out "Is Slashdot's Shameless Plug Recognition Finally 'Good Enough'?"

Today somebody at Dragon got moved to a corner office.

Re:Hmmm.... (3, Insightful)

bearinboots (743355) | more than 7 years ago | (#19185305)

Dragon is no more... and hasn't been for a long time.

NaturallySpeaking has been sold a few times to various companies.

(I keep track because I worked on V1.0)

Re:Hmmm.... (2, Insightful)

Mahjub Sa'aden (1100387) | more than 7 years ago | (#19185127)

I'll be honest with you, Vista is way better at coming up with hilarious new Madlibs than you are.

Certain industries already make heavy use of this (1)

artemis67 (93453) | more than 7 years ago | (#19185187)

Several years ago, I saw a court reporter using a speech recognition system with his laptop. The microphone actually looked like some sort of breathing apparatus, as it fit snugly over his mouth and nose with the wires in a tube running down to the laptop.

Problems (5, Insightful)

Tribbin (565963) | more than 7 years ago | (#19184773)

As a foreigner it is really hard to get the pronunciation right enough.

Also command execution by others in the room is a problem.

How about listening to music, or TV, and having the computer interpreting it.

Re:Problems (4, Informative)

Sciros (986030) | more than 7 years ago | (#19184887)

It all depends what sort of corpus the SR system is trained on. So yeah, foreigners will have problems because a system trained for, say, British English will not perform well with American English. For this same reason an SR system trained for "normal" speech will do very poorly with lyrics in music.

As for stuff like "i really admire your analysis" being interpreted as "i really admire urinalysis," that stuff can easily be ironed out by an n-gram based system that "ranks" English sentences based on probability. What is the chance that "urinalysis" will follow "your" which follows "admire"? Such things can be estimated well enough if you use a large corpus to train your n-gram system (as long as the corpus you're using for this is the same "kind" as whatever speech the SR system is interpreting -- that is, newswire, business meeting, etc.)
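
To make the n-gram ranking concrete, here is a small sketch of a bigram model with add-k smoothing that scores the two competing transcriptions. The corpus is invented, the names `train_bigrams` and `log_prob` and the smoothing constant are assumptions for illustration, and nothing here reflects how any commercial recognizer actually does it.

```python
import math
from collections import Counter

# Invented training sentences standing in for a large corpus.
CORPUS = [
    "i really admire your analysis",
    "we admire your work",
    "your analysis is thorough",
    "the lab ordered a urinalysis",
    "the urinalysis came back normal",
]


def train_bigrams(sentences):
    """Collect context counts, bigram counts and the vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size, k=0.1):
    """Log-probability of a sentence under the add-k smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + k
        denominator = contexts[prev] + k * vocab_size
        total += math.log(numerator / denominator)
    return total


model = train_bigrams(CORPUS)
candidates = ["i really admire your analysis", "i really admire urinalysis"]
best = max(candidates, key=lambda s: log_prob(s, *model))
print(best)
```

On this toy corpus "admire urinalysis" was never seen, so the smoothed model ranks "i really admire your analysis" higher even though it is the longer word sequence. As the comment says, the result is only as good as the match between the training corpus and the speech being recognized.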

Re:Problems (1)

Sciros (986030) | more than 7 years ago | (#19185473)

By the way, what I described is referred to as the "Language Model" component of a natural language processing system. I'm sure Nuance uses one, so whatever errors it makes are probably a result of data sparseness during training.

Re:Problems (1)

lawpoop (604919) | more than 7 years ago | (#19184961)

Also command execution by others in the room is a problem.

How about listening to music, or TV, and having the computer interpreting it.
I think a noise canceling microphone would take care of those problems.

Re:Problems (1)

Drooling Iguana (61479) | more than 7 years ago | (#19185177)

Wouldn't a noise canceling microphone filter out pretty much all current music and TV automatically?

This comment written by MS speech recognition (4, Funny)

TodMinuit (1026042) | more than 7 years ago | (#19184783)

Dear aunt, let's set so double the killer delete select all.

Re:This comment written by MS speech recognition (1)

k1980pc (942645) | more than 7 years ago | (#19184959)

This bug is reportedly fixed: http://blogs.msdn.com/larryosterman/archive/2006/07/31/684327.aspx [msdn.com]

I play with speech recognition on my Mac and it is pretty cool... but I cannot say it is productive, possibly because I am not a native English speaker.

Love to impress my mates with the knock-knock jokes feature in mac speech recognition.. :)

It works! (0)

Anonymous Coward | more than 7 years ago | (#19184787)

I'm using it now so double delete the killer select all.

Sure (2, Funny)

springbox (853816) | more than 7 years ago | (#19184789)

In fact, I'm using it to write this Dear aunt, let's set so double the killer delete select all

Depends on what you use it for (3, Insightful)

orclevegam (940336) | more than 7 years ago | (#19184791)

Is Speech Recognition Finally 'Good Enough'?

For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

Maybe the question should be... (5, Insightful)

Mahjub Sa'aden (1100387) | more than 7 years ago | (#19185079)

Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?

Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.

Re:Maybe the question should be... (1)

EvanED (569694) | more than 7 years ago | (#19185343)

I would like a good speech recognition program. I've been meaning to give Dragon a try at some point, but I'd need a mic too (unless it comes with one which it might...). I do enough writing and stuff that it could come in handy to reduce wrist strain.

A lot is coding, but I could still be speaking this /. post to the computer instead of typing it.

Re:Maybe the question should be... (1)

Mahjub Sa'aden (1100387) | more than 7 years ago | (#19185377)

My boss uses Dragon NaturallySpeaking in his office. It's quite a nice product once fully trained; out of the box it's pretty spotty. It's also quite a resource hog, but that's pretty much to be expected with that sort of software.

My point is this, however. While it may be fine for my boss, a touch typer and not much of a speller, in his office, alone, it's not much use in a public or semi-public space. I'm not much of a visionary, but it seems pretty obvious that sooner or later, computers will be everywhere and we'll have to be inputting stuff everywhere as well. I'm not sure how speech recognition will scale in those cases.

Not to mention that at this point, as far as I know (and feel free to correct me on this), speech recognition is not good enough out of the box to recognise all sorts of voices. Not everyone has clear natural diction.

Re:Depends on what you use it for (2, Insightful)

GustoGaiden (1080739) | more than 7 years ago | (#19185153)

Programming with voice recognition just seems stupid to me. The idea behind voice recognition is to make it easier to write natural speech, such as email, or an essay, or anything else that follows normal speech patterns. Programming is writing so a computer can understand what you want it to do. It involves TONS of punctuation, oddly named keywords and variables (var, int, _InitBlockPosX). Hell, I can barely read my code aloud to someone else without confusing MYSELF, much less confusing the other human. Case in point, if you're trying to use your voice recognition software to write code, you're using the wrong tool for the wrong job.

Re:Depends on what you use it for (1)

parvenu74 (310712) | more than 7 years ago | (#19185157)

For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.
Probably because computer languages aren't designed for dictation. It would be interesting, however, if a language were designed for spoken programming rather than typing. What would that look like -- errr, sound like? Code-reviews might get a little wacky though (I'm hearing voices in the computer!).

Re:Depends on what you use it for (0)

Anonymous Coward | more than 7 years ago | (#19185361)

Oh yeah? How about Lisp?

Re:Depends on what you use it for (1)

Movi (1005625) | more than 7 years ago | (#19185477)

> Probably because computer languages aren't designed for dictation. It would be interesting, however, if a language were designed for spoken programming rather than typing.

Like Applescript? http://www.apple.com/macosx/features/applescript/ [apple.com]

Re:Depends on what you use it for (1)

Richard McBeef (1092673) | more than 7 years ago | (#19185271)

But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

Why would you even think of using it for that? That's completely retarded. Will it ever be faster to say 'if open paren x equals equals y close paren' than to type 'if (x==y)'? The answer is return apostrophe no comma it will not period apostrophe semi colon.
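
For the curious, the spoken-to-typed mapping in that example can be sketched in a few lines. The phrase table, the `tokenize` and `render` helpers, and the spacing rule are all made up for illustration; this is not how any dictation product handles code.

```python
# Hypothetical table of spoken phrases and the symbols they stand for.
SPOKEN = {
    ("open", "paren"): "(",
    ("close", "paren"): ")",
    ("semi", "colon"): ";",
    ("equals",): "=",
    ("comma",): ",",
    ("period",): ".",
    ("apostrophe",): "'",
}
MAX_PHRASE = max(len(phrase) for phrase in SPOKEN)


def tokenize(utterance):
    """Turn spoken words into (text, is_symbol) pairs, longest phrase first."""
    words = utterance.lower().split()
    tokens, i = [], 0
    while i < len(words):
        for n in range(MAX_PHRASE, 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in SPOKEN:
                tokens.append((SPOKEN[phrase], True))
                i += n
                break
        else:
            tokens.append((words[i], False))    # plain word: keyword or name
            i += 1
    return tokens


def render(tokens):
    """Join tokens, spacing only word-word pairs and a word before '('."""
    text, prev_is_symbol = "", True
    for tok, is_symbol in tokens:
        if not prev_is_symbol and (not is_symbol or tok == "("):
            text += " "
        text += tok
        prev_is_symbol = is_symbol
    return text


spoken = "if open paren x equals equals y close paren"
typed = render(tokenize(spoken))
print(typed, "|", len(spoken.split()), "spoken words vs", len(typed), "keystrokes")
```

The nine spoken words collapse to the nine characters of `if (x==y)`, which is the comment's point: each punctuation mark costs a whole word or two when dictated.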

Re:Depends on what you use it for (1)

QRDeNameland (873957) | more than 7 years ago | (#19185469)

Imagine someone with a lisp coding LISP via speech recognition...

"cwothe pawenthethith, cwothe pawenthethith, cwothe pawenthethith, cwothe pawenthethith, cwothe pawenthethith, ...."

(My apologies for any insensitivity to those with speech impediments.)

Re:Depends on what you use it for (1)

Poromenos1 (830658) | more than 7 years ago | (#19185395)

Why would you want to? I spend more time thinking about it than typing it anyway. It's not like speech, where you don't think about the words. I'm sure I'd hate being like "def getstr... no, getvaria... erm, gettype".

No. (5, Funny)

Caspian (99221) | more than 7 years ago | (#19184795)

Speech recognition, handwriting recognition, species recognition... all of these suck, and will CONTINUE to suck, until strong AI is developed.

And by that time, there will be a lot more important problems to worry about than making a computer understand Bubba Sixpack who can't type-- such as keeping the robots from taking over the planet in a bloody war.

Of course it's good enough (5, Funny)

ral315 (741081) | more than 7 years ago | (#19184797)

I use it myself. It's wonder full. delete that. delete that. delete that. double the killer delete select all

Re:Of course it's good enough (1, Redundant)

Morky (577776) | more than 7 years ago | (#19184995)

In Soviet Russia, double the killer select all deletes you!

Voice recognition sucks. (1)

grub (11606) | more than 7 years ago | (#19184803)


Try it sometime.

right slash ass turd is mane dot see this will print hello oh whirl ass trick slash print f open parenthesis quote hell oh whirl backslash end close parent he says semi clothed close curly


Not Useful for Coders (1)

Hoi Polloi (522990) | more than 7 years ago | (#19184815)

"Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."

Re:Not Useful for Coders (4, Funny)

Tackhead (54550) | more than 7 years ago | (#19185019)

> "Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."

Yeah, but if you put a beat to it, you've got something.

{ } . ! /
& ; ^ # -
< > @ \
{ } _ SYSTEM HALTED

"Left titty, right titty, dot bang slash.
Ampersand semicolon, caret pound dash.
Less than greater than, at back slash,
left titty, right titty, under score crash!"

* # ! ! (
~ & | )
' " . . DEL
# ^G ! ! working... done.

"Star pound bang bang, open-paren.
Tilde and pipe, close-paren.
One quote, two quote, dot dot delete,
pound bell, bang bang, process complete!"

Google's USENET archive dates it back to 1990, but it predates the 1990 post ("Stuck Shift Key Poetry") to rec.humor.funny by several years.

You haven't lived until you've seen a dozen drunken geeks trying to sing "Waka Waka", or the entirety of "Hatless Atlas", while seeing only one character at a time. Well, maybe you have, but this is Slashdot.

I'd say so.... (1)

zappepcs (820751) | more than 7 years ago | (#19184831)

With some of the stuff that I see on the Internet (websites and blogs etc.) I'd have to say that the urinalysis gaffe isn't really all that bad.

The only place that speech recognition really annoys me is phone answering systems. They are not competent enough to let you concatenate menu item options and make an intelligent choice as to which phone queue to put you in. For example:

"I have trouble with my cable modem dropping packets" is a statement that 'SHOULD' get you put through to the second tier support line... but no, you have to go through 3 or more menu choices and still only get to talk to the scripted low wage 1st tier support.

Re:I'd say so.... (2, Insightful)

RingDev (879105) | more than 7 years ago | (#19185007)

To be fair, that's a problem with the IVR coder, not the voice recognition engine.

-Rick

Welcome to the new AT&T! (3, Funny)

poptones (653660) | more than 7 years ago | (#19185017)

Press or say one to speak with a representative in English...

One

When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new service, say new customer. If you are...

Billing

I'm sorry, that is not an option. When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new...

Billing!

I'm sorry, that is not an option. When you hear the option...

Billing billing billing!

I'm sorry, that is not an option. When you...

Fuck you! Give me a human! Human human human!

I'm sorry, that is not an option. When you hear the option...

Re:Welcome to the new AT&T! (1)

Mattintosh (758112) | more than 7 years ago | (#19185099)

Unless I call a number where I expect an automated system, the first thing I do is press and hold the 0 button for about 10 seconds.

I'm usually talking to a real person within a minute or so.

Until (1)

geekoid (135745) | more than 7 years ago | (#19185449)

some jackass tells non-tech people to use it to get tier 2 help.

Probably the same jackass that told people about the Internet.

speech for programmers (1)

VirexEye (572399) | more than 7 years ago | (#19184839)

For those of us with serious RSI and who program/sys admin for a living, are there any serious attempts at voice recognition out there? Specifically, have there been any breakthroughs with speech -> symbol names or obscure shell commands?

Re:speech for programmers (0)

Anonymous Coward | more than 7 years ago | (#19185061)

Good enough for what? (4, Insightful)

traindirector (1001483) | more than 7 years ago | (#19184841)

TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice

  • when you need to enter hand-written documents into a computer
  • for transcripts of a single speaker
  • informal free-thought when not surrounded by other people
  • when you have horrible typing skills

For the majority of office tasks, it just isn't a good fit.

So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.

Re:Good enough for what? (3, Insightful)

L. VeGas (580015) | more than 7 years ago | (#19184909)

These are some good points. I don't know what I would use speech recognition for, and I'm someone that writes a lot.

Seeing words laid out as text helps me think. I can compose things better, more coherently.

I'll write an email in an instant, but make me leave a voice mail, and I'll usually hang up first.

Re:Good enough for what? (2, Insightful)

RingDev (879105) | more than 7 years ago | (#19185039)

I would love it for a graphics editor. Being able to swap tools, zoom, bring up palettes, etc... without having to go digging through menus or trying to remember hot keys. I think VR in desktop software has a place, but it is in augmentation, a fringe benefit, not the core functionality.

-Rick

Mod parent up! (3, Insightful)

Doctor Memory (6336) | more than 7 years ago | (#19185243)

Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be similar to letting my four-year-old help make pancakes — yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using just voice commands. Is your WP going to be smart enough you can tell it "find all occurrences of 'scum-sucking bottom feeders' and replace it with 'esteemed colleagues'". Or are you going to have to say "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all." Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.

I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...

Re:Good enough for what? (2, Interesting)

QRDeNameland (873957) | more than 7 years ago | (#19185273)

Excellent points. One only need consider how much computer usage is done in cubicle farms, and then picture everyone chattering "Scratch that!" at their workstation, and the utility of speech recognition as a primary form of input becomes very limited regardless of its accuracy. I have a copy of Dragon, and its accuracy is really quite impressive, but past the novelty I have almost never used it. Other than the fact that it requires virtual silence (aside from your voice) to operate, unless I already know *exactly* what I want to say, it is easier for me to compose text by keyboard and construct my wording as I go along. The only time I could see it being of much use is for dictating a handwritten or badly printed document where OCR wouldn't work.

Depends on what for... (1)

Actually, I do RTFA (1058596) | more than 7 years ago | (#19184855)

I remember using M$'s speech recognition engine (the version that comes with Office 2k3) to prototype a training program. It was designed to teach radio protocol. And actually, it worked very well. It helped that we had a very limited vocabulary, and even more constricted sentence construction.

oblig. (0)

Anonymous Coward | more than 7 years ago | (#19184859)

O'RLY?

Is it really faster, once you factor in checking? (0)

Anonymous Coward | more than 7 years ago | (#19184869)

I type pretty fast: somewhere around 60 WPM. I do tend to mistype, lowering my speed, but at the same time when I mistype I know I mistype: I can "feel" that my fingers are not moving as they should. With speech recognition, you'll have to be looking at the screen to find mistypes, and then you'll have to do something to retype them, but it'll probably take a while. And because of the lag, people will tend to talk slower so that it can "keep up" and they can prevent the words on the screen from getting too out of sync with their train of thought.

Speech Recognition: It's probably good enough for an IM conversation, but a copyeditor's nightmare.

Speech recognition IS good enough (4, Informative)

rinkjustice (24156) | more than 7 years ago | (#19184885)

I'm using Dragon NaturallySpeaking. Right now, as I write this calm it, comet, post, and it sure as hacking beats typing.

Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way. I have bilateral tendinitis, and the software has been a godsend for me. I was even able to finish writing my book, a task that was becoming just too painful typing manually.

Oh, and you are probably wondering how long it takes to train the software? About a half an hour, and I find the accuracy at around 95%.

Re:Speech recognition IS good enough (2, Informative)

Sciros (986030) | more than 7 years ago | (#19184943)

Yeah, Nuance makes good stuff. Well, they've bought up everyone worth anything afaik, so I guess it's only to be expected.

Re:Speech recognition IS good enough (1)

ddhuyvet (718443) | more than 7 years ago | (#19185427)

Coincidentally, the trial against Lernout & Hauspie [wikipedia.org] begins Monday. I don't know if they are known outside of Belgium, but in the late nineties they gave Flanders (the Dutch-speaking north of Belgium) the dream that it could have a leading role in speech technology. L&H even formed the centre of a "Flanders Language Valley".

Unfortunately L&H made some wrong investments and became the centre of a major financial scandal after Robert Smithson of the Wall Street Journal discovered fictitious transactions in Korea and shady accounting techniques. As a result L&H went bankrupt in 2001. It's around this scandal that a court case starts this Monday. It's big news here in Belgium, as a lot of people invested money in L&H and are hoping to get some of it back.

I was wondering if L&H were actually on the right track; Jo Lernout still believes in the technology today. I was thinking he was wrong, but this news item might prove him right.

It was actually L&H that bought the then-faltering Dragon Systems in 2000. After its bankruptcy, L&H was bought by ScanSoft (for very little money). ScanSoft bought Nuance Communications and changed its name to Nuance. And now they seem to be getting successful with the NaturallySpeaking software, so it probably was a good acquisition by L&H back then. And ScanSoft (now Nuance) was in turn smart in buying them up.

Re:Speech recognition IS good enough (1)

DragonWriter (970822) | more than 7 years ago | (#19185053)

Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way.


What does "inserting grammar" mean?

Re:Speech recognition IS good enough (1)

rinkjustice (24156) | more than 7 years ago | (#19185319)

What does "inserting grammar" mean?

It means adding commas and periods as you speak to make the text read more natural.

It's good enough for commercial applications (1)

sentrido (1104183) | more than 7 years ago | (#19184897)

It's good enough for commercial warehouse applications, e.g. the Vocollects and Voxwares of the world.

IVR vs VoIP (1)

RingDev (879105) | more than 7 years ago | (#19184901)

I work on IVR systems for clinical research and medical screening (along with a huge variety of other things we make these systems do). And it's pretty good. We do a lot of work massaging the Grammars to make the system more accurate though, and we have a lot of extra logic built in for situations where we can predict values and assign weights to different words. But the one thing that rather annoys me is that I quite often have issues with Skype's quality just being a bit too low for the system to pull off. I use Skype to dial in so I don't have to take my hands off the keyboard/mouse for testing (or deal with the phone in general). I would guess about 1 in 5 questions I have to repeat or wait for a reprompt because of an audible glitch from the VoIP connection.

All in all though, I'm rather impressed with the functionality and accuracy we do have. I'm not sure it will take over in many places though because of the error rate on free-form text and the volume levels. My old cube-farm was noisy enough with everyone typing; I can't even imagine it with everyone trying to talk to their computers and hoping the noise filters would pick out their voice correctly. I've got a nice closed-off office to work in now, so no one has to hear me yell "Invalid selection my ^%#!" at my computer ;)

-Rick
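
The "predict values and assign weights" idea above can be sketched roughly as follows. This is a hypothetical illustration only: the `rerank` function, the confidence numbers, and the weight table are invented, and it does not represent the poster's system or the API of any real IVR platform.

```python
def rerank(hypotheses, expected_weights, default_weight=1.0):
    """Pick the recognizer hypothesis with the best weighted confidence.

    hypotheses       -- list of (text, confidence) pairs, confidence in 0..1
    expected_weights -- boost for answers the current prompt expects
    default_weight   -- weight for anything not listed (assumed neutral)
    """
    def weighted(hypothesis):
        text, confidence = hypothesis
        return confidence * expected_weights.get(text, default_weight)

    return max(hypotheses, key=weighted)[0]


# Hypothetical prompt that expects a number, so digit words get a boost.
digit_weights = {word: 2.0 for word in
                 ["one", "two", "three", "four", "five",
                  "six", "seven", "eight", "nine", "ten"]}

# Made-up recognizer output: acoustically "ate" edges out "eight".
heard = [("ate", 0.55), ("eight", 0.45)]
print(rerank(heard, digit_weights))
```

Here the boost for expected digit words lets "eight" beat the acoustically stronger "ate". Without a weight table the raw confidences decide.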

English is stupid! (0, Flamebait)

drinkypoo (153816) | more than 7 years ago | (#19184913)

will it still render 'I really admire your analysis' as 'I really admire urinalysis'?

English is the only language I speak and I still think it's stupid. But if you pronounce 'your' correctly it doesn't sound like "yur", which is what the beginning of urinalysis sounds like. 99% of the time the problem is improper pronunciation.

And no, accents are no fucking excuse. I'm sorry you grew up around people who can't pronounce words properly... But you should really learn to pronounce the words correctly so that people outside of your inbred birthplace will understand you.

Once I had a Texan share an anecdote with me about an even sillier-sounding Texan who pronounced "oil wells" as "owl whales". I don't think speech recognition software will figure that out, either.

pronouncing words "properly" (1)

Bearpaw (13080) | more than 7 years ago | (#19185131)

Everybody has an accent. (Ask a linguist.) Basically, it sounds like you just want everybody to have the same accent that you do. Good luck.

Re:pronouncing words "properly" (1)

drinkypoo (153816) | more than 7 years ago | (#19185207)

You can have an accent and still pronounce words in such a way that you can properly distinguish between them. The pronunciations in the dictionary are there for a reason, and until people learn to use them, we will still have problems like this. And on the topic of everyone having my accent, with notable exceptions the people on the West coast of the US speak English closest to the intended pronunciation, and in fact are closer to it than denizens of England, who have actually gone so far as to change some of their common spellings many years ago to differentiate them from the way we spell them here across the pond, and to deny the desire of the individual who named Aluminum (note the absence of additional letters) as to how it should be spelled. So arguably, everyone's accent should be closer to mine :)

Re:pronouncing words "properly" (1)

DragonWriter (970822) | more than 7 years ago | (#19185277)

You can have an accent and still pronounce words in such a way that you can properly distinguish between them.


Which words can be properly distinguished by sound alone (rather than context) varies by accent.

The pronunciations in the dictionary are there for a reason, and until people learn to use them, we will still have problems like this.


Except in languages where there is an official prescriptive authority, they exist to document actual usage, and often document several variations which can be ambiguous with other words or combinations of words.

People learning to use them will not change the fact that spoken language, even "proper" spoken language, by any definition, contains ambiguities that cannot be deterministically resolved with 100% accuracy.

did you know (1)

way2trivial (601132) | more than 7 years ago | (#19185323)

a lot of dictionaries have NO pronunciation guide.. they just aren't english dictionaries

that is because, in many languages, a certain order of letters is always pronounced the same way.

Russian is one example..

Language Hat (0)

Anonymous Coward | more than 7 years ago | (#19185137)

I play the language hat card! [wolkenvelden.com]

Re:English is stupid! (0)

Anonymous Coward | more than 7 years ago | (#19185175)

I defy you to define "proper" pronunciation without invoking a bunch of dead people who wrote down their unfounded ideas of perfection in books.

Re:English is stupid! (1)

compro01 (777531) | more than 7 years ago | (#19185333)

it is easier to reprogram computers than it is to reprogram humans.

For that matter, how do you define the "correct pronunciation" of any given word? The King's English? The President's English? The Prime-Minister's English? The MLA's take on it? Your opinion?

opinions are like assholes. everyone has one and a lot of people are assholes about their opinions. you try messing with people's ideas of language and they will tend to hate you.

Pretty good (5, Funny)

Richard McBeef (1092673) | more than 7 years ago | (#19184917)

95 percent is pretty good, only one word in twenty. I wouldn't have a problem with a 5% error ate.

Re:Pretty good (0)

Anonymous Coward | more than 7 years ago | (#19185185)

mod parent up -- most elegant post I've seen in a long time

Re:Pretty good (2, Insightful)

Rei (128717) | more than 7 years ago | (#19185353)

5% could be the difference between "The report confirmed that Iraq has WMDs" and "The report confirmed that Iraq had WMDs." It could be the difference between "Tell Mrs. Smith to take 20mg of neurontin" and "Tell Mrs. Smidt to take 20mg of neurontin." It could be the difference between "The magnet should not be exposed to a field greater than fifteen teslas" and "The magnet should not be exposed to a field greater than fifty teslas." And on, and on.

Small wording changes can make a big difference -- generally much bigger than typos, which I can assure you happen far less often than 5%. Additionally, typos are generally recognizable as the intended word, and often aren't even noticed by the reader.
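
To put a number on how often a 5% word error rate bites, here is a back-of-the-envelope sketch. It assumes every word is recognized independently with the same accuracy, which real recognizers do not guarantee, so treat the figures as rough; the function names are made up for illustration.

```python
def sentence_error_probability(word_accuracy, words_per_sentence):
    """Chance that a sentence contains at least one wrong word,
    assuming each word is recognized independently (a simplification)."""
    return 1.0 - word_accuracy ** words_per_sentence


def expected_errors(word_accuracy, total_words):
    """Average number of wrong words in a document of total_words words."""
    return (1.0 - word_accuracy) * total_words


for n in (10, 20, 30):
    p = sentence_error_probability(0.95, n)
    print(f"{n}-word sentence: {p:.0%} chance of at least one error")

print(f"1000-word report: about {expected_errors(0.95, 1000):.0f} wrong words")
```

Under that assumption a 20-word sentence has roughly a 64% chance of containing at least one wrong word, and a 1000-word report averages about 50 of them.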

Re:Pretty good (1)

Derek Pomery (2028) | more than 7 years ago | (#19185413)

That is precisely the problem with Dragon - the algorithm by its very nature will not create typos - it is matching speech against known words. So it is helpless with new vocab (although you can train it) and it makes for devilish subtle typos that take longer to pick out than it would have to run a spell check after a typist finished with their 93% accuracy.
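
A small sketch of why a spell check does not help here: a dictionary lookup flags a typist's slips because they are not real words, but it passes a recognizer's slip because every word is valid. The six-word dictionary and the `misspelled` helper are stand-ins invented for this example, not how any real spell checker or Dragon itself works.

```python
# Tiny stand-in dictionary; a real spell checker would use a full word list.
DICTIONARY = {
    "i", "really", "admire", "your", "analysis", "urinalysis",
}


def misspelled(text):
    """Return the words a naive dictionary-lookup spell checker would flag."""
    return [w for w in text.lower().split() if w not in DICTIONARY]


typed = "I realy admire your analisys"      # typist's slips: not real words
dictated = "I really admire urinalysis"     # recognizer's slip: all real words

print(misspelled(typed))     # the two typos are flagged
print(misspelled(dictated))  # nothing is flagged, yet the sentence is wrong
```

Catching the second kind of error needs a check on meaning or context, which is why proofreading dictated text takes longer than running a spell checker over typed text.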

We use it. (2, Interesting)

Organic Brain Damage (863655) | more than 7 years ago | (#19184923)

For command control of a system where we need both hands free. It's pretty good, much better than stopping and typing, clicking or pressing buttons during a repetitive manual process.

We're using an older version of Microsoft's product and it seems the microphone quality is important.

Re:We use it. (1)

cs02rm0 (654673) | more than 7 years ago | (#19185121)

Likewise, for our main product we've integrated Dragon for command and control. It's faultless there, even without training. It's 'good' in general use, but that doesn't really cut it for anyone who can touch type.
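
One way to see why command and control is so much easier than dictation: with only a handful of valid commands, almost anything the recognizer hears can be snapped to the nearest one or rejected. The sketch below does this on text using Python's standard `difflib`; real products constrain the recognizer itself rather than fuzzy-matching a transcript afterwards, and the command list and `match_command` helper are hypothetical.

```python
import difflib

# Hypothetical command vocabulary for a hands-free application.
COMMANDS = ["stop", "pause", "skip", "increase"]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Map whatever the recognizer heard onto the closest known command,
    or return None if nothing is similar enough."""
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for heard in ("stopp", "PAUSE", "urinalysis"):
    print(heard, "->", match_command(heard))
```

Raising `cutoff` makes the matcher stricter, trading more rejected commands for fewer wrong ones.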

Yes and no.. (1)

msimm (580077) | more than 7 years ago | (#19184989)

For some reason every time this topic comes up the focus seems to shift to word-processor type use.

What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel? I'd like to see the basic functionality of the remote control mirrored in speech recognition. Things like stop/pause/increase/skip.

I'd imagine once this kind of simple recognition became common over-all speech recognition would (more) rapidly evolve.

Re:Yes and no.. (1)

DragonWriter (970822) | more than 7 years ago | (#19185203)

What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel?


Zero.

A very few may require taking a hand off the steering wheel, though well designed newer cars tend to solve even that by putting controls on the wheel.

Though to solve the major "hands off the wheel" problem I've seen in other drivers, I'm not sure how voice control would work, anyhow: are you proposing a voice-controlled makeup application system?

No (1)

Threni (635302) | more than 7 years ago | (#19185005)

They use it on TV all the time for subtitles, and practically every sentence has a mistake. It's finally "usable" or "worth taking seriously", but "good enough" implies, to me, that no further improvements are required, and I don't agree with that.

It would be nice.. (1)

Wicko (977078) | more than 7 years ago | (#19185047)

..to see the software discern between two different voices when typing up a document.

The only problem I see here is people becoming too dependent on the software. Terms like urinalysis might become something we will automatically associate with your analysis, people will get lazier and lazier, as if we aren't enough already.

FailWare (0)

Anonymous Coward | more than 7 years ago | (#19185071)

FailWare. Heh, I just thought of that term. Google only returned 77 hits, so I guess I almost coined it.

Re:FailWare (0)

Anonymous Coward | more than 7 years ago | (#19185425)

Better patent it

Well, if speech recognition gets common... (1)

Kjella (173770) | more than 7 years ago | (#19185173)

...so everyone will talk all the time, half the work population will go postal and the other half will get offices. Also one thing that I notice is that I rarely get everything right the first time, I go back to add a sentence or use copy-paste quite a bit. It's really much easier to do that with your fingers without losing the "verbal" line of thought. And then there are all the applications where it makes much more sense to use the UI than to try to talk your way through commands. Voice commands get a bit like the command line: you have to memorize a lot to use it at a decent pace. That limits its use to a very few select situations for me, hardly enough to be worth it.

Speech Reco Software Consolidation (4, Informative)

TheGreatDonkey (779189) | more than 7 years ago | (#19185183)

I am presently a financial customer of an enterprise speech recognition product that Nuance offers. For several years now, the speech recognition software industry has been under consolidation, with Nuance buying a few different competitors and technologies. Most recently, this dance has continued with Nuance being acquired by ScanSoft, a company known for specializing in type recognition.

Nuance support is marginal at best, and through all the consolidations, understanding even within their own company of how the product works is quite lacking. We have found our own developers often times educating the Nuance support folks in various aspects of how the product is working, and then inquiring as to whether this is intended behavior or not. Crickets can often be heard finishing these types of conversations. We normally would have moved to another product under these conditions, but simply put - Nuance acquired what little was left, and now has no competition in the market. Competition is what spurs innovation, and so with the continued consolidation, it is hard to see significant advances in the technology without free help from academia.

If you think the Microsoft monopoly is bad, imagine if they absorbed Apple and somehow took over Linux leaving you with a few "choices", but all under the Microsoft moniker. The technology is very neat and the enterprise level products do some basic things quite well, but there is still some glaring room for innovation that I don't expect anytime soon under present industry conditions.

not even close (1)

trwww (545291) | more than 7 years ago | (#19185221)

Judging by this [youtube.com] and this [youtube.com], I would say it's not even close.

Looks like it makes for good jokes.

open source speech recognition (1)

biscon (942763) | more than 7 years ago | (#19185297)

is there any open source speech recognition worth checking out? (as a coder I would love to have a library to play around with).

Yeah I could use google, but then you wouldn't have a chance of making the lists of links and get modded +5.

relying on karma whores since '07

The lesson of speech recognition software (1)

macraig (621737) | more than 7 years ago | (#19185311)

The message that people *should* be learning from the less-than-perfect transcription of speech recognition software, such as misunderstanding "I really admire your analysis" as "I really admire urinalysis", is that it's finally time for people to learn to SPEAK as well as write proper English, as opposed to speaking in ebonics or text-speak or some other hard-to-transcribe dialect. "Your" pronounced as "ur" is pretty damned difficult to interpret, without resorting to contextual analysis... which of course is the ONLY reason we humans can still understand each other at all. Does the story of the Tower of Babel ring a bell?

There's no comparison (1)

Trailer Trash (60756) | more than 7 years ago | (#19185329)

This is really apples & oranges. The typist with 93% accuracy will produce a document with some typos, and I can tell you from years of reading /. that typos are easily "corrected" by the reader if the typist doesn't catch them. Even at that, spell checkers catch quite a few of them, too.

That's very different from "your analysis" turning into "urinalysis". Here, the spelling is correct but the words are completely wrong, and trying to figure out what is really meant will take a much longer reading of it.

To answer the question, it's not ready.

Its hard to wreck a nice beach (1)

Intron (870560) | more than 7 years ago | (#19185335)

About 5 years ago some manufacturer announced chips for under $5 that would do speaker-independent, limited vocabulary recognition and I predicted that there would be products appearing all over the place that would get rid of the crappy buttons and use speech as the interface. The only place I see it is in cell phones, and I always turn it off, because I don't want my cell phone surreptitiously calling someone while I am talking ABOUT them. Anyway, why hasn't the toy and gadget market latched onto speech input? It seems like those back massagers ought to be able to understand "Harder, ooh, harder, harder".

too anstwer you question. (3, Funny)

geekoid (135745) | more than 7 years ago | (#19185345)

Yeth.

It is on my BlackBerry (1)

BigCheese (47608) | more than 7 years ago | (#19185375)

I use the speech recognition on my BlackBerry Perl^H^Hrl^H^Harl all the time and it's "good enough".

One example... (1)

NIN1385 (760712) | more than 7 years ago | (#19185383)

This is a very good example of successful voice recognition:

Google 411 [google.com]

Very intelligent, but isn't everything Google does?

Holding mouse to mouth (1)

fishthegeek (943099) | more than 7 years ago | (#19185451)

and trying out my best Scottish accent... Computer..... Computer......

Has ANYONE gotten this to work on System 6 for the Mac yet?