Rest In Peas — the Death of Speech Recognition

Soulskill posted more than 4 years ago | from the yale-in-ox-boom-i-crows-off dept.

Software 342

An anonymous reader writes "Speech recognition accuracy flatlined years ago. It works great for small vocabularies on your cell phone, but basically, computers still can't understand language. Prospects for AI are dimmed, and we seem to need AI for computers to make progress in this area. Time to rewrite the story of the future. From the article: 'The language universe is large, Google's trillion words a mere scrawl on its surface. One estimate puts the number of possible sentences at 10^570. Through constant talking and writing, more of the possibilities of language enter into our possession. But plenty of unanticipated combinations remain, which force speech recognizers into risky guesses. Even where data are lush, picking what's most likely can be a mistake because meaning often pools in a key word or two. Recognition systems, by going with the "best" bet, are prone to interpret the meaning-rich terms as more common but similar-sounding words, draining sense from the sentence.'"

Buffalo buffalo (5, Insightful)

Anonymous Coward | more than 4 years ago | (#32077194)

Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.

Mod parent up (2, Informative)

idiot900 (166952) | more than 4 years ago | (#32077338)

Would that I had mod points today.

The above is a valid English sentence and a poignant example of how difficult it is to parse language without knowledge of semantics.

Focus, Dammit. (1)

Jeremiah Cornelius (137) | more than 4 years ago | (#32077378)

"What, all of us?"

Re:Focus, Dammit. (1)

Philip K Dickhead (906971) | more than 4 years ago | (#32077832)

The sixth sheik's sixth sheep's sick.

[so, say said sentence sextuply...]

Re:Mod parent up (4, Interesting)

x2A (858210) | more than 4 years ago | (#32077448)

There's nothing special about computers though; people have to do that with other people... let's not kid ourselves into thinking that humans are immune to misunderstandings. No, the more you get to know someone, the way they think and express themselves, the better you can become at communicating with them. Different words to different people have different connotations. It can take a lot of work to get all these down, and it'd be no different with a computer. For effective communication, you'd train and build up a common language with it, that might seem nonsense to outsiders... and I, for one, welcome this.

Re:Mod parent up (2, Insightful)

ground.zero.612 (1563557) | more than 4 years ago | (#32077464)

What about the simple fact that conversation itself is a learning process?

You learn the extent of your audience's comprehension among other things. How can a computer be programmed to recognize everything when we lack a sufficient model to base it on?

There is a point in conversation when a sensible human being will recognize they are not getting their ideas through, and simply give up and say "never mind".

Re:Mod parent up (1, Interesting)

RockoTDF (1042780) | more than 4 years ago | (#32077960)

You raise a good point about learning. A problem with AI researchers is that they are scared of neural networks for reasons that have been solved since the 1980s, and are stuck with expert systems or other "symbol manipulating" programs. The problem with these programs is that they *suck* at learning. I really think that if the AI community looked at neural nets more often they would get closer to figuring this language thing out. With billions and billions of sentences it is hard to create a good system using the aforementioned techniques.

Re:Mod parent up (1)

gyrogeerloose (849181) | more than 4 years ago | (#32077612)

The above is a valid English sentence and a poignant example of how difficult it is to parse language without knowledge of semantics.

Although it's either lacking in punctuation or using non-standard capitalization.

Then again, maybe he's invoking both the large mammal and the eponymous city in New York?

Re:Mod parent up (1, Informative)

Anonymous Coward | more than 4 years ago | (#32077676)

Hence why some of the words are capitalized.

Re:Buffalo buffalo (3, Funny)

Anonymous Coward | more than 4 years ago | (#32077366)

This rest ponds was and turd you sings peach recon nation soft where

Re:Buffalo buffalo (5, Funny)

CecilPL (1258010) | more than 4 years ago | (#32077394)

That comma is just out of place and makes the sentence hard to parse.

Re:Buffalo buffalo (4, Insightful)

liquiddark (719647) | more than 4 years ago | (#32077490)

What human can parse this without an expert to tear apart the context? I don't see the point in trying to serve up a sentence that simply isn't a sentence to most speakers of the language.

Re:Buffalo buffalo (1)

u38cg (607297) | more than 4 years ago | (#32077984)

The point is not that it is a useful sentence; the point is that it is a sentence. What's even more remarkable is that you can add arbitrary repetitions of buffalo to it and still get a grammatical, meaningful sentence. The meaningful is important. Colourless green ideas sleep furiously. That parses grammatically, but it means absolutely nothing.

Badger badgers badger Badger badgers (1)

tepples (727027) | more than 4 years ago | (#32077496)

Buffalo buffalo

Likewise, Badger badgers Badger badgers badger, badger Badger badgers. (UW taxideans harassed by UW taxideans harass other UW taxideans.) Oh, and mushroom mushroom [badgerbadgerbadger.com] .

Re:Badger badgers badger Badger badgers (5, Funny)

Anonymous Coward | more than 4 years ago | (#32077782)

snaaaaaaake!

Re:Buffalo buffalo (1, Interesting)

Anonymous Coward | more than 4 years ago | (#32077558)

Has anyone really been far even as decided to use even go want to do look more like?

Re:Buffalo buffalo (2)

Hylandr (813770) | more than 4 years ago | (#32077728)

Braincells are jumping to their deaths from my ears...

Re:Buffalo buffalo (5, Informative)

hoggoth (414195) | more than 4 years ago | (#32077634)

Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison.

Re:Buffalo buffalo (5, Informative)

Anonymous Coward | more than 4 years ago | (#32077648)

For those that don't know:
http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo

  'Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison'.

Re:Buffalo buffalo (1)

Hylandr (813770) | more than 4 years ago | (#32077696)

Dood, Dood! Doooood. dood! DOOD!! Dewd...

Re:Buffalo buffalo (1)

blair1q (305137) | more than 4 years ago | (#32077698)

Your marklar is well marklar.

well (0)

Anonymous Coward | more than 4 years ago | (#32077218)

I the method which comes to make probably, them who see the work of the speech recognition software which is honest is suitable and language translation asserts! where

I hope.. (0)

Anonymous Coward | more than 4 years ago | (#32077224)

I certainly hope that TFA title is intentional...

I refuse to partake in a short sleep cycle process while lying in small round vegetables otherwise!

Goodnight, Sir!

Re:I hope.. (1)

Zancarius (414244) | more than 4 years ago | (#32077348)

I certainly hope that TFA title is intentional...

Considering the subject matter, I'd hope readers would be able to detect a play on words when they see one.

Nevertheless, it got your attention, didn't it?

Key words (2, Interesting)

flaming error (1041742) | more than 4 years ago | (#32077240)

> meaning often pools in a key word or two
It's true.

My own hearing is not great. I often miss just a word or two in a sentence. But they are often key words, and missing them leaves the sentence meaningless. If I counted the words I understand correctly I'd probably have a 95% success rate. But if I counted the sentences I understand correctly, I'd be around 80%. So I get by, but I tend to annoy people when I ask for repeats over one missed word.
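A rough way to see how a 95% word rate and an 80% sentence rate can sit together: if every word were misheard independently with the same probability (an assumption made only for illustration; real errors tend to cluster), a sentence of n words would come through intact with probability p^n. The function names below are hypothetical, chosen for this sketch.

```python
import math


def sentence_accuracy(word_accuracy: float, words_per_sentence: int) -> float:
    """Chance that every word in a sentence is heard correctly,
    assuming each word is an independent trial (illustrative assumption)."""
    return word_accuracy ** words_per_sentence


def implied_length(word_accuracy: float, sentence_acc: float) -> float:
    """Sentence length n that solves word_accuracy ** n == sentence_acc."""
    return math.log(sentence_acc) / math.log(word_accuracy)


if __name__ == "__main__":
    # How quickly sentence-level accuracy falls as sentences get longer.
    for n in (4, 10, 20):
        print(n, round(sentence_accuracy(0.95, n), 3))
    # Length at which 95% per word would give 80% per sentence.
    print(round(implied_length(0.95, 0.80), 2))
```

Under that independence assumption the two figures line up only for sentences of about four words. For longer sentences, an 80% sentence rate would suggest that misses cluster in a few sentences or that context rescues most of them.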

Re:Key words (4, Funny)

SomeJoel (1061138) | more than 4 years ago | (#32077368)

> It's true.

My own ... is not great. I often miss ... a word or two in a sentence. But they are often ... words, and missing them leaves ... sentence meaningless. If I counted the words I understand ... I'd probably have a 95% success rate. But if I counted the ... I understand correctly, I'd be around ...%. So I get by, but ... tend to annoy people when I ask for ... over one missed word.

I can see how this would be annoying.

Android Speech Recognition Rules (5, Informative)

bit trollent (824666) | more than 4 years ago | (#32077254)

I hardly type anything in to my HTC Incredible. Google's voice recognition, which is enabled on every textbox works just about perfectly.

Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.

Re:Android Speech Recognition Rules (5, Funny)

liquidpele (663430) | more than 4 years ago | (#32077506)

I'm sorry Dave, I can't do that.

Re:Android Speech Recognition Rules (1)

DeadDecoy (877617) | more than 4 years ago | (#32077792)

Now the RIAA can use google's services to determine if you're singing a copyrighted song and hunt you down : D.

Re:Android Speech Recognition Rules (2, Interesting)

bertok (226922) | more than 4 years ago | (#32077886)

I hardly type anything in to my HTC Incredible. Google's voice recognition, which is enabled on every textbox works just about perfectly.

Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.

I've tried Google voice recognition, but I found that it just detected gibberish unless I spoke with a fake American accent.

Re:Android Speech Recognition Rules (1)

justinlindh (1016121) | more than 4 years ago | (#32077914)

You're right, with the caveat that most people tend to try to speak differently when they know they're speaking to digital transcription. The Android voice input also requires that you actually say the punctuation, as well (i.e. Hello comma Mom period Yes comma a visit would be nice exclamation point). So, unfortunately, even with Google's web powered voice transcription, you're still not speaking naturally.

I'm assuming that Google Voice uses the same technology for their automated transcription. In this case, the person will definitely be speaking naturally. The transcriber is spotty at best in that setting. I can usually get the gist of what's being said without needing to actually listen to the message and I appreciate how it applies different style types for things it thinks it could have gotten wrong (guesses are in a lighter shade of gray)... but it's far from perfect.

Re:Android Speech Recognition Rules (1)

vanyel (28049) | more than 4 years ago | (#32077950)

I haven't tried note taking on my Cliq, but voice dialing is a waste of time

Let me guess (4, Funny)

Zerth (26112) | more than 4 years ago | (#32077258)

That summary was written with speech recognition software?

Re:Let me guess (2, Funny)

MollyB (162595) | more than 4 years ago | (#32077622)

Hesitant grate watts peach wreck ignitions oft where kin dew ferrous?

What are you talk'in about ? (1)

burni2 (1643061) | more than 4 years ago | (#32077270)

Years ago I used viavoice on Warp4, and it had a pretty decend recognitation rate ..

it was even better understanding my needs than I can get Windows7 understand mine by mice commands ..

I miss those times .. when grey was a chique color for OSes

Re:What are you talk'in about ? (1)

CohibaVancouver (864662) | more than 4 years ago | (#32077452)

Years ago I used viavoice on Warp4, and it had a pretty decend [sic] recognitation [sic] rate ..

Did you have to 'train' it to your voice using a script and a series of corrections, or did it have 'natural' speech recognition from the get-go, the way you do when you chat with a cashier at the supermarket?

Re:What are you talk'in about ? (3, Insightful)

bmo (77928) | more than 4 years ago | (#32077872)

People want "human quality" speech recognition.

As if we're ever going to get away from training speech recognition programs when we train listeners every day when we speak. It's just that most people don't look at it as being trained, since we're so used to doing it.

I'm sure you have more trouble understanding someone with a thick Cockney or Scottish accent if you're from the Midwest US. You'd ask that person to repeat a few times, wouldn't you?

To expect speech recognition programs to *not* use training is to expect them to exceed human intelligence. Indeed, it's to expect such programs to be psychic.

--
BMO

Re:What are you talk'in about ? (4, Funny)

corbettw (214229) | more than 4 years ago | (#32077482)

Years ago I used viavoice on Warp4, and it had a pretty decend recognitation rate ..

Looks like whatever you're using now ain't quite as good.

Google Voice isn't Horrible (1)

bobstreo (1320787) | more than 4 years ago | (#32077276)

It's close enough to usually understand. But I'm not sure if it's a computer translation or a bunch of pigeons typing to translate.

AI (5, Insightful)

ShadowRangerRIT (1301549) | more than 4 years ago | (#32077280)

Natural language processing *is* AI. And high accuracy speech recognition requires natural language processing if we expect to have accuracy rates approaching that of a human. Humans hear words partially or incorrectly all the time. We fill in the gaps from context, and we correct if the course of the conversation reveals that the original interpretation is wrong. Expecting computers to do better, when half the time the problem is the speaker, not the listener, means you need it to be able to make the same corrections from limited information on the fly, and after the fact that a human brain makes.

Re:AI (3, Insightful)

ShadowRangerRIT (1301549) | more than 4 years ago | (#32077372)

Just as an example, my father is partially deaf. No hearing in one ear, and less than a quarter of human baseline in the other. But with a hearing aid (which still doesn't get him to full functionality), he gets 95% accuracy or better in regular conversation, and it gets better as the conversation progresses. It's not because the hearing aid is fixing the underlying problem (it can't, since the problem is in the inner ear). But if he knows the general topic, and picks up on 50% of the phonemes, he can fill in the blanks and figure out the gist of the sentence, despite hearing it in bits and pieces. As the conversation progresses, his accuracy improves because he is supplying the prompts; if the responses fall into the set of "expected" responses, filling in the gaps becomes even easier. By contrast, if you change topics abruptly or go off on a tangent, you may need to repeat yourself half a dozen times. Now a computer will have better "hearing", but if it doesn't know the topic before you start, it's going to have the same problem anytime you slur a word, elide a syllable, or clear your throat mid-sentence. People expect to speak to a computer and have it understand, forgetting that people aren't usually expected to interpret a sentence in isolation, with no idea of the topic.
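The "fill in the blanks from the topic" idea can be sketched as a toy rescoring step. This is not how any particular recognizer works; the acoustic scores and topic priors below are invented numbers, and `best_word` is a hypothetical helper. It multiplies how well each candidate matches the sound by how likely that word is given the topic.

```python
# Made-up acoustic scores for two near-homophones (illustrative only).
ACOUSTIC = {"peas": 0.55, "peace": 0.45}

# Made-up probabilities of each word under two conversation topics.
TOPIC_PRIORS = {
    "cooking": {"peas": 0.90, "peace": 0.10},
    "funeral": {"peas": 0.05, "peace": 0.95},
}


def best_word(acoustic, context=None):
    """Return the candidate with the highest acoustic score times context prior.

    With no context every candidate gets the same prior, so the choice
    falls back to the raw acoustic "best bet".
    """
    def score(word):
        prior = context.get(word, 1e-6) if context else 1.0
        return acoustic[word] * prior

    return max(acoustic, key=score)


if __name__ == "__main__":
    print(best_word(ACOUSTIC))                           # no topic known
    print(best_word(ACOUSTIC, TOPIC_PRIORS["funeral"]))  # topic known
```

Without a topic the sketch goes with the slightly better acoustic match; once the topic is known, the prior outweighs the small acoustic difference, which is roughly the effect described above when the listener is supplying the prompts.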

That's Because... (5, Funny)

BJ_Covert_Action (1499847) | more than 4 years ago | (#32077282)

It only flatlined because nobody tried to write speech recognition software in perl*.

*Disclaimer: Poster is not responsible for attempts resulting in unintended AI development and/or end of the world scenarios brought on by such an irresponsible endeavor.

Re:That's Because... (0)

Anonymous Coward | more than 4 years ago | (#32077952)

Pff, it's already done in Python:

import speech.recognition

Well duh. (3, Funny)

bmo (77928) | more than 4 years ago | (#32077284)

Even humans mishear speech.

"'Scuse me while I kiss this guy"

That misheard lyric is so common that there's a book about misheard lyrics with that as the title.

--
BMO

Re:Well duh. (1)

Eudial (590661) | more than 4 years ago | (#32077318)

Eggcorns [wikipedia.org] constitute another great example of how humans get this wrong.

Re:Well duh. (2, Funny)

CityZen (464761) | more than 4 years ago | (#32077406)

"Time flies like an arrow; fruit flies like a banana."

Time flies (1)

tepples (727027) | more than 4 years ago | (#32077628)

"Time flies like an arrow; fruit flies like a banana."

Is a time fly an archer or a DDR player?

Re:Well duh. (5, Funny)

Chris Burke (6130) | more than 4 years ago | (#32077462)

That misheard lyric is so common that there's a book about misheard lyrics with that as the title.

I know! A surprising number of people think Hendrix was talking about kissing the sky, rather than embracing the experimental, counter-culture, and free-love nature of the 60's, simply because they don't like to think of their testosterone-filled hero sucking face with another dude. Like, get over it! "Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance, and there's no way Jimmy would have put something like that in his body!

Re:Well duh. (1)

bunratty (545641) | more than 4 years ago | (#32077508)

You mean Hendrix wasn't gay after all? Next you'll be telling me that CCR never said there's a bathroom on the right!

Crap, crap, crap into the toilet bowl (1)

tepples (727027) | more than 4 years ago | (#32077662)

PS1 music game Parappa the Rapper turns "There's a bathroom on the right" into a rap song [youtube.com] .

Re:Well duh. (1)

Tynin (634655) | more than 4 years ago | (#32077750)

Even humans mishear speech.

"'Scuse me while I kiss this guy"

That misheard lyric is so common that there's a book about misheard lyrics with that as the title.

-- BMO

There was a Tool song my friends and I argued over the lyrics of for quite some time. Think it was the prison sex song. I was sure it said, "...my lamb and martyr, this will be over soon...". My friends were sure the song was, "...my loving mother, this will be over soon...". Considering the topic of the song I suppose it could have had yet another level of depravity to it with the whole mom/incest angle, my perverted friends sure thought so, even though I didn't think it made much sense.

Re:Well duh. (1)

blair1q (305137) | more than 4 years ago | (#32077760)

But...but computers are supposed to be perfect!

Re:Well duh. (1)

swilver (617741) | more than 4 years ago | (#32077858)

Well, I checked a few pages worth of content on that site, and I must say it looks like most of these "misheard" lyrics are people trying to make a funny (often sex related) joke (or are simply lacking the correct vocabulary knowledge) instead of actually mishearing the lyric. Some of the songs on that list have lyrics that are so clear it's near impossible to hear them wrong. I certainly didn't find any that I heard wrong.

Disclaimer: I'm not native English, I am a musician though.

Re:Well duh. (1, Informative)

Anonymous Coward | more than 4 years ago | (#32077944)

The article acknowledges this... mentions speech recognition topped out with a 20% word error rate, while humans have an error rate of 2%-4%.

Sorry what? (1)

Daas (620469) | more than 4 years ago | (#32077310)

When talking to someone else, we can politely stop them and ask: "Sorry, what did you say?". As someone whose first language is not English, I tend to use these words a lot, mostly because of differences in pronunciation. Computers, on the other hand, are supposed to get everything right the first time! Why can't they, like us, ask those simple words instead of making stupid guesses??

Re:Sorry what? (1)

cnoocy (452211) | more than 4 years ago | (#32077392)

A lot of vpr systems do just that. Also, dictation systems display what you've typed on the screen, so you can correct by voice if necessary.

Re:Sorry what? (2, Interesting)

Ethanol-fueled (1125189) | more than 4 years ago | (#32077518)

When talking to someone else, we can politely stop them and ask : "Sorry, what did you say?"

That doesn't always work. When accents and the command of a language are so poor, you only get a few chances to ask, "Sorry, what did you say?" After asking three times, you either look like an asshole and/or give up and spend the next few minutes nodding and smiling before trying to parse what they said, hoping you get it right.

Which is why we need good speech-recognition and translation software. It's easy to infer the meaning of "come to me give the diagram" because there are at least intelligible words to work with. And no, I'm not being racist -- the situation applies to all cultures and languages.

Re:Sorry what? (0)

Anonymous Coward | more than 4 years ago | (#32077526)

Computers, on the other hand, are supposed to get everything right the first time! Why can't they, like us, ask those simple words instead of making stupid guesses??

They do and can. I called up a company and got a computer help line. The computer insisted I state my problem so it can look it up. Of course it never got it even close to right and insisted we keep trying again. By the fourth time my question involved mostly swear words and finally I was transferred to a human. It only got worse from there.

Re:Sorry what? (1)

Jeng (926980) | more than 4 years ago | (#32077548)

It would seem that people learn computers better than computers learn people.

Much like talking to someone with a poor grasp on one's language, you try to make things simple and easy to understand.

Number of sentences? (2, Insightful)

Logarhythmic (1082321) | more than 4 years ago | (#32077322)

One estimate puts the number of possible sentences at 10^570

What a completely useless metric. It makes sense to examine the context and meaning of speech in order to accurately transcribe words, but the number of possible sentences doesn't seem to accurately describe the problem here...

Re:Number of sentences? (0)

Anonymous Coward | more than 4 years ago | (#32077658)

Also, that number is kinda bullshit. In the article, it links to one guy making some back of the envelope calculations about the numbers of sentences. The author (a phonetician) doesn't take into account things like new word creation, novel developments in syntactic structures, or even basic things like recursive embedding of sentence in other sentences.

Though there may be a practical bound on the number of sentences in a language, there's no theoretical limit.
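The recursive-embedding point can be made concrete with a toy grammar. The word lists are invented for illustration: every sentence can be wrapped in "X said that ...", so each extra level of embedding multiplies the count and no finite total ever covers the language.

```python
SUBJECTS = ["Alice", "Bob"]                       # illustrative vocabulary
BASE = ["it rained", "buffalo bully buffalo"]     # illustrative base sentences


def sentences(depth):
    """Yield every sentence with exactly `depth` levels of 'X said that' embedding."""
    if depth == 0:
        yield from BASE
        return
    for subject in SUBJECTS:
        for inner in sentences(depth - 1):
            yield f"{subject} said that {inner}"


def count_up_to(max_depth):
    """Number of distinct sentences with at most `max_depth` embeddings."""
    return sum(len(SUBJECTS) ** d * len(BASE) for d in range(max_depth + 1))


if __name__ == "__main__":
    for s in sentences(1):
        print(s)
    # The count keeps growing with depth; there is no last sentence.
    print([count_up_to(d) for d in range(6)])
```

Any fixed figure for the number of possible sentences therefore depends on an assumed cap on sentence length or embedding depth, which is a practical bound rather than a property of the grammar.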

Windows 7 (3, Interesting)

Anonymous Coward | more than 4 years ago | (#32077324)

I've been using VR in Win7 for a few weeks now. I can honestly say that after a few trainings, I'm near 100% accuracy. Which is 15% better than my typing!

Re:Windows 7 (3, Informative)

adonoman (624929) | more than 4 years ago | (#32077484)

People underestimate the value of training - we do it subconsciously when we meet people with different accents or vocal tones. At first people are hard to understand, but given an hour or so talking to someone, you eventually stop noticing their accent. Windows 7 seems to do a really good job at learning from use (it learns even without explicit training when you make corrections). I have a Windows 7 tablet and the voice recognition is impressive. Its handwriting recognition is even better than mine when it comes to my writing (it benefits from knowing the directions and order of strokes) - I just scratch out something vaguely resembling something I want to write and it seems to recognize it almost 100% of the time.

Not Dead Yet (2, Insightful)

Shidash (1420401) | more than 4 years ago | (#32077328)

I doubt it is completely dead. I have yet to hear it from the researchers working on AI. I work in affective computing, so I am thinking that it is possible that the missing component could be emotion or another way to increase the understanding and ability of computers to learn. In addition, even if it is not possible to increase speech recognition capabilities in this model of computing, in another model of computing this and more would be possible. I am not believing it until I hear it from researchers who have tried most possible options for improvement.

Totally Not Dead Yet (4, Interesting)

RingDev (879105) | more than 4 years ago | (#32077834)

A few years back I worked for an awesome company that built IVR (interactive voice response) systems.

We had voice-driven interactive systems that would provide the caller with a variety of different mental health tests (we worked a lot with identifying depression, early onset dementia, Alzheimer's, and other cognitive issues).

The voice recognition wasn't perfect, but we had a review system that dealt with a "gold standard". I wrote a tool that would allow a human being to identify individual words and to label them. Then we would run a number of different voice recognition systems against the same audio chunk and compare their output to the human version. It effectively allowed us to unit test our changes to the voice recognition software.

Dialing in a voice recognition system is an amazing process. The number of properties, dictionaries, scripting, and sentence forming engines is mind-blowing.

Two of the hardest tests for our system were things like: Count from 1 to 20 alternating between numbers and letters as fast as you can, for example 1-A-2-B-3-C. And list every animal you can think of.

The 1-A-2-B was killer because when people speak quickly, their words merge. You literally start creating the sound of the A while the end of the 1 is still coming. It makes it extremely difficult to identify word breaks and actual words. And if you dial in a system specifically to parse that, you'll wind up with issues parsing slower sentences.

The all animals question had a similar issue: people would slur their words together, and the dictionary was huge. It was even more challenging when one of the studies was nationwide. We had to deal with phonetic spellings from the north east coast and southern states accents. What was even worse was that there were no sentences. We couldn't count on predictive dictionary work to identify the most likely word out of those that would match the phonetics.

That said, getting voice recognition to work on pre-scripted commands and sentences was pretty easy.

And I can only imagine the process has been improving in the years since. Although we were looking into SMS based options, not for a dislike of IVR, but because our usage studies with children were showing most of them were skipping the voice system and using the key pad anyway. So why bother with IVR if the study's target demographic was the youth.

-Rick
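The "gold standard" review described above, where several recognizers are scored against a human-labeled transcript of the same audio, can be sketched with a word error rate (WER) measure. This is only an illustration, not the tool from that project: the engine names and transcripts are invented, and WER here is the word-level edit distance divided by the length of the human transcript.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


def rank_recognizers(gold: str, outputs: dict) -> list:
    """Return (wer, name) pairs sorted from best to worst."""
    return sorted((word_error_rate(gold, text), name)
                  for name, text in outputs.items())


if __name__ == "__main__":
    gold = "one a two b three c"          # human-labeled transcript (made up)
    outputs = {                            # hypothetical engine outputs
        "engine_a": "one a two b three c",
        "engine_b": "one eight two be three see",
        "engine_c": "one a to b three",
    }
    for wer, name in rank_recognizers(gold, outputs):
        print(f"{name}: {wer:.2f}")
```

Re-running the same scoring after each configuration change gives the regression-test effect mentioned in the comment: a tweak that helps fast, merged speech but hurts slower sentences shows up as a higher score on the affected recordings.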

World model (2, Informative)

Anonymous Coward | more than 4 years ago | (#32077330)

Speech recognition mechanisms/algorithms are not entirely the problem. What needs to back them up is called a "world model," and, as the name implies, this can be large and open ended. Humans being able to correct spoken/heard errors on the fly is because of having an underlying world model.

Conlangs (1)

izomiac (815208) | more than 4 years ago | (#32077384)

I've wondered why we can't meet computers half-way. Just design a constructed language that avoids the unsolvable problems. If operating computers by speech is truly better then learning the language would be akin to learning to type.

OTOH, if it's an attempt to simplify computing for those who don't wish to learn, well, that's an impossible task. The problem lies in the fact that such people don't give explicit commands, and even humans take quite a bit of intuition to figure out what they're implying.

Time flies like an arrow fruit flies like a banana (2, Insightful)

GuyFawkes (729054) | more than 4 years ago | (#32077386)

Having said that, Dragon works fairly well, provided you modulate your speech.

If you want a laugh with Dragon, turn away from the screen and talk normally, then look at what it has transcribed..

Re:Time flies like an arrow fruit flies like a ban (1)

SomeJoel (1061138) | more than 4 years ago | (#32077470)

The eighties were like half as groovy as the seventies, but twice as cool as the nineties.

Re:Time flies like an arrow fruit flies like a ban (0)

Anonymous Coward | more than 4 years ago | (#32077894)

Once we get out of the eighties, the nineties are gonna make the sixties look like the fifties.

Training (1)

dominious (1077089) | more than 4 years ago | (#32077408)

Speech recognition requires training because it relies on machine learning algorithms. Nobody has time to train their computer. I mean, even us humans need 2-3 years of such "training" in order to start recognizing words.

Speech recognition is higher intelligence (1)

gurps_npc (621217) | more than 4 years ago | (#32077412)

Speech recognition is a form of higher intelligence.

Intelligence is basically composed of pattern recognition, with two general categories. One) Specific pattern recognition is logic, math, etc. It requires incredibly exact matches. Yes or no. 1.0, not 1.00001. Computers are very very good at that.

Two) General pattern recognition is creativity, art appreciation, and our capacity to invent. It requires people to ignore a ton of irrelevant data and instead focus on only one aspect of identity, recognizing it despite the large amounts of irrelevant data. That tree kind of looks like a face, that falling object is like all other falling objects. Computers have always been very very BAD at this. Humans do it much much better than animals, but even a monkey is better at general pattern recognition than a computer is.

I am sure that we can make computers slightly better at speech recognition - enough to recognize all of a limited set of command words like print, attach, email, open, run. Individual programs would have to include codes for their names and specific commands. But I think it will take a true Artificial Intelligence to recognize speech as well as a human. In fact, I would make that my Turing Test. I would also add that I don't think an intelligence built using current theory could become a true Artificial Intelligence. We would need to design a computer that is a non-deterministic device, one that does not rely solely on pure mathematical logic, but is itself based on an entirely new design. No, I can't describe it - because if I could I would build one and be rich.

Since I don't have a flying car today, all is lost (4, Insightful)

liquiddark (719647) | more than 4 years ago | (#32077430)

Futurists should really learn what the word "plateau" means. The death of any given technical progression, particularly one that deals with information processing, tends to be announced early and often, right up to the point where progress becomes meaningful again and then all of a sudden everyone saw it coming, and oh by the way where's my flying car?

Sssssh. (1)

Allnighterking (74212) | more than 4 years ago | (#32077436)

Don't tell the people actually doing it. They don't know that the author of this piece says it won't work. So they keep making it work. We don't want to upset them. Ssssh.
Speech recognition and translation is becoming a highly effective and proficient tool for the US military. You see it fit's in your iPod... and ... well translates. info here [physorg.com] Kinda puts the knosh on this article. Speech recognition as a part of translation is a new application of the tech that is growing by leaps and bounds. 10 years ago we had to do text to text translation, now it's speech to voice. Then you have companies like Voxify,TuVox and others replacing routine call center calls with realistic voice recognition. Far from being a dead animal. It has moved from the realm of fantasy to the realm of direct application.

Did you dictate your post? (0)

Anonymous Coward | more than 4 years ago | (#32077700)

"Ssssh"? "it fit's in your iPod"? "puts the knosh on this article"? "Far from being a dead animal. It has moved"?

Apparently your speech recognition software still needs a bit more R&D. In case you can correct it for the future, it should probably be "Shhhh", "it fits in your iPod", "puts the kibosh on this article", and "Far from being a dead animal, it has moved".

dom

is there any evidence for this analysis? (3, Insightful)

Trepidity (597) | more than 4 years ago | (#32077440)

I see a lot of claims, but not much evidence. If we're going to use perceptions and anecdotes as evidence, my impression is that speech recognition has always been considered vaguely stalled. In 2000, people didn't think much progress had been made since 1991 besides some commercialization of stuff academia already knew how to do. In 2010, this guy doesn't think much progress has been made since 2001 besides some commercialization of stuff academia already knew how to do. Yet I think some progress has been made over the past 20 years. There just haven't been any breakthroughs, which is maybe what he's expecting, given his vague suggestion that "AI", a pretty vague concept, is our hope.

I'm also skeptical that accuracy has flatlined, though it's possible that's true in some areas. My impression is that multi-speaker recognition, use of large corpora to improve accuracy, and use of language modeling to improve accuracy, have all improved [google.com] over the past 10 years. Of course, not all improvements go everywhere: the speech recognition running in real-time on a mobile ARM processor is not using every possible state-of-the-art technique. The advance there is that you can run speech recognition in real-time on a mobile ARM processor at all, and get performance that was once only possible on pretty hefty workstations.

No it doesn't (2, Interesting)

Colin Smith (2679) | more than 4 years ago | (#32077444)

It works great for small vocabularies on your cell phone

No. It doesn't.

It works great for small vocabularies on your cell phone if you happen to live in the same neighbourhood as the developer where "everyone talks this way". For the rest of the world, attempting to talk with a nasal American twang in order to get the phone to understand you, is shit.

 

Re:No it doesn't (0)

Anonymous Coward | more than 4 years ago | (#32077762)

Whut are you tawking abowt boy, I mite just have ta kick yer ass.

Blame Star Trek (4, Insightful)

onyxruby (118189) | more than 4 years ago | (#32077456)

Blame Star Trek for making it look flawless. Speech recognition is just like fusion technology, 20 years away from properly working - just like it has been for the last 20 years.

-RANT- I can't stand voice recognition systems that don't at least give you an option to press a number. Especially when they are out of tune and pick up background noises as voice. Please, please, please - always give the option to press a number instead of having to voice everything!!

medical dictation - no go (1, Interesting)

Anonymous Coward | more than 4 years ago | (#32077468)

The radiology voice dictation transcription system at my former employer was horrible. Having to read the dictated reports was equally appalling considering there was a radiologist signing off on their accuracy, and they were certainly not completely accurate. The irony is that the things the system frequently had trouble with were simple words like "not" and recognizing quantities appropriately, whereas more complicated things such as "gastroschisis" would be dictated correctly.

I never understood it, but since I was not the radiologist, I didn't care either. I mostly was entertained by listening to them repeat the same stupid, simple word over and over trying to get the dictation system to behave, when it would have taken a fraction of the time to manually edit the document with a keyboard.

yale-in-ox-boom-i-crows-off (1)

richdun (672214) | more than 4 years ago | (#32077472)

Yay Linux! Boo Microsoft!

I win! Give me all your speech recognition monies.

Wait, what do you mean you don't believe I'm an AI? ... er, I mean ... Wait, what do you mean you do not believe I am an Artificial Intelligence?

IBM? (2, Funny)

Darth Snowshoe (1434515) | more than 4 years ago | (#32077530)

Didn't IBM a few years ago announce a big five-year program to crack speech recognition? Whatever came of that?

Re:IBM? (1)

PalmKiller (174161) | more than 4 years ago | (#32077862)

They used it to make a better chess playing AI instead.

Re:IBM? (5, Interesting)

N1ck0 (803359) | more than 4 years ago | (#32077904)

IBM closed many of their speech research offices 1-2 years ago and transferred most of the research/data to Nuance's Dragon Naturally Speaking research.

Full Disclosure: I work for Nuance

Tea, Earl Grey, Hot (5, Funny)

tokki (604363) | more than 4 years ago | (#32077546)

How hard is it for a computer to understand the sentence: "Tea, Earl Grey, Hot"? That takes care of 90% of the use case scenarios right there. "Computer, initiate auto-destruct sequence" is the next 8%.

Shout-outs to two idiots (5, Insightful)

Foobar_ (120869) | more than 4 years ago | (#32077550)

This blog post is retarded. The author is correlating a drop in internet news articles about Dragon NaturallySpeaking with a flatlining of speech recognition accuracy rate.

The Slashdot editor Soulskill is retarded for both not realizing this and for not reading the anonymously-submitted blog post (hmm no way it could have been the author) before approving it for the Slashdot front page. The guy is just out for more traffic to his rather pointless tech news commentary blog.

Decline of Slashdot, internet signal-to-noise ratio, get off my lawn, etc.

Try this one... (0, Offtopic)

Aut0mated (885614) | more than 4 years ago | (#32077578)

Alpha Kenny 1

no, it doesn't work on cell phones, either (1)

swschrad (312009) | more than 4 years ago | (#32077586)

this is the reason that millions of Americans are faster with the thumb than Buddy Rich with the drumsticks... you can't see the fingers move as they type 30 zeroes in a row to escape the mumblebots.

Not free (0, Offtopic)

em0te (807074) | more than 4 years ago | (#32077610)

They gave this information to the "public" by handing it over to the LCD? It costs $150 to obtain a non-commercial license from LCD. This is ridiculous but I guess money is the best way to control information.

Data Input (1)

fermion (181285) | more than 4 years ago | (#32077618)

Automated data input is always tricky. Basically the technology is type it on a keyboard or use voice recognition software or dictate and pay someone to type it in a computer. When people talk about voice recognition they think it is competing against typing it in yourself, but mostly it is competing against paying someone else to type it in.

My understanding, from the people that use Dragon, is that it competes well against paying someone else to type. First, it is a couple of orders of magnitude cheaper. Second, if you pay someone to type, you still have to read and edit, and Dragon is accurate enough. Of course you have to train yourself to use the technology, but that is the same with any technology. It is naive to think that we don't make subtle and not so subtle changes in ourselves so that we can benefit from the technology.

I think speech recognition is going to expand in the future. Beyond the dictation process, there are also simple commands. I don't use the voice controls on the iPhone, but it seems something that people like. I have used the voice controls on my Mac. Furthermore, I can certainly imagine a time when my fingers are not so limber that I might depend on something like Dragon.

I don't see the technology so commoditized that MS includes it in the 2015 version of MS Office, but I do believe there is always room for improvement.

Dear Aunt, (1)

IorDMUX (870522) | more than 4 years ago | (#32077624)

Any discussion of the history of speech recognition is incomplete without a reference to Microsoft's famous Windows Vista "double the killer delete select all" botch-up: http://www.youtube.com/watch?v=klU2zt1KdUY [youtube.com]

Forget speech recognition.... (2, Funny)

puppetman (131489) | more than 4 years ago | (#32077664)

I'd settle for a grammar checker. From the fine summary:

"Even where data are lush"

A good one would have saved this summary from sounding stupid.

Wrong problem (1, Interesting)

slasho81 (455509) | more than 4 years ago | (#32077748)

There won't be any meaningful development in speech recognition (or machine translation) until context is taken seriously. Context is an inseparable part of speech.
Right now the problem being solved is audio->text. This is the wrong problem, and why the results are so lame. The real problem is audio+context->text+new context. This takes some pretty intelligent computing and not the same old probabilistic approaches.
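
One way to picture audio+context->text+new context is the toy Python sketch below. Everything in it is an assumption for illustration: the `Context` class, the `rescore` function, the made-up acoustic scores, and the crude word-overlap bonus. Real context handling would need far more than counting shared words.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Words seen so far in the conversation (hypothetical, for illustration)."""
    topic_words: set = field(default_factory=set)

def rescore(candidates, context, bonus=0.3):
    """Choose among (text, acoustic_score) pairs using the running context.

    Each candidate gets its acoustic score plus a bonus per word it shares
    with the context. Returns the chosen text and an updated context.
    """
    def score(item):
        text, acoustic = item
        overlap = len(set(text.lower().split()) & context.topic_words)
        return acoustic + bonus * overlap

    best_text, _ = max(candidates, key=score)
    # The new context is the old one plus the words just accepted.
    new_context = Context(context.topic_words | set(best_text.lower().split()))
    return best_text, new_context

if __name__ == "__main__":
    # Two similar-sounding guesses with invented acoustic scores.
    guesses = [("wreck a nice beach", 0.55), ("recognize speech", 0.50)]
    text, ctx = rescore(guesses, Context({"speech", "software"}))
    print(text)
```

Without any context the slightly higher acoustic score wins; with "speech" already in the conversation, the other candidate is picked, and its words feed into the context for the next utterance.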

The sixth sheik's sixth sheep's sick. (0)

Anonymous Coward | more than 4 years ago | (#32077812)

Somehow Slashdot chose an apt fortune: "The sixth sheik's sixth sheep's sick." Let me know how your speech recognition software does on that sentence!

dom

Maybe we just need to speak binary (1)

mwheeler (152107) | more than 4 years ago | (#32077828)

Maybe we just need to speak binary.

Best example: Google text captions. (1)

Ancient_Hacker (751168) | more than 4 years ago | (#32077900)

When you have a minute, go to YouTube and bring up an old Star Trek episode (not the CBS ones with very loud commercials).

Then turn on Google captions. More fun than a barrel of Rigelian monkeys!

About every third sentence gets a close or exact rendering, but oh, the other two! I should sue them for laugh-muscle strains.

 

Watermelon Box (4, Insightful)

NReitzel (77941) | more than 4 years ago | (#32077934)

Long ago - decades, before Bill Gates was invented, a lot of research went into what would be required for actual voice recognition.

A counterexample was given, about an engineering marvel (of the time) that would recognise when someone said the word "watermelon". For a long time, people in the industry assumed that the path to voice recognition consisted of building more and better watermelon boxes.

Several authors, including Alan Turing himself, argued that actual voice recognition could never be accomplished with a large array of watermelon boxes. Current VR software divides input into a series of hyperplanes, and attempts to build a best match from the classification tree.

This is the 2010 version of the watermelon box.

Real voice recognition won't be practical until the input is parsed, matched against context, and structured much akin to diagramming a sentence in those old English (or other) classes. In short, matching against a vocabulary is trying to solve an exponential problem with a (large) polynomial engine.

It won't be until the computer actually understands what is said that VR is likely to be practical in a global sense.

As a person who has been building computer systems for 35 years, it bothers me to see a huge body of research done into subjects like these ignored, because someone thinks that none of it applies to PCs.

it's getting better but (2, Interesting)

luther349 (645380) | more than 4 years ago | (#32077940)

Speech software has been evolving at a steady pace. But that isn't the issue; the issue is that 90% of the users out there don't use it. If you live in a loud place with kids or other noise, it will not work well. Windows 7 has built-in speech software, and how many people use it? I played with the latest Dragon speech software and I gotta admit it's very good, even without training it. I did emails with it without any issue. But as I said, speech software is more a toy than anything useful. As people said, it probably will have a good use on a cell phone rather than on a PC, since it would be an easy way to chat rather than using the cell phone's keypad.

Why should anyone care? (0)

Anonymous Coward | more than 4 years ago | (#32077956)

Most people won't benefit from speech recognition software in any manner that is critical, or might automate the mundane to the point that their lives might yield great benefit to mankind overall. If there's anyone out there, aside from the physically handicapped, who thinks they need speech recognition software to perform any task that isn't repetitive and is truly important for the greater good, I assert that it would be better for all if they had proteges who could learn from them and not machines that facilitate isolation.

There is also the problem of meaningful work from those who might serve as assistants, and automation for the sake of automation didn't do the Luddites any good, albeit notwithstanding the motivation to rebel against already cruel and inhumane conditions of employment.
