
The Coming Wave of Gadgets That Listen and Obey

Soulskill posted more than 6 years ago | from the tea-earl-grey-hot dept.

Cellphones 98

dgan brings us a NYTimes piece about the development of speech recognition for common gadgets. Companies such as Vlingo and Yap are marketing their software to cellular carriers to give consumers a hands-free option for tasks like finding directions and text messaging. Quoting: "Vlingo's service lets people talk naturally, rather than making them use a limited number of set phrases. Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs), for the location of a local bakery and for a Web search for a consumer product. It was all fast and efficient. Vlingo is designed to adapt to the voice of its primary user, but I was also able to use Mr. Grannan's phone to find an address. The Find application is in the beta test phase at AT&T and Sprint. Consumers who use certain cellphones from those companies can download the application from vlingo.com."


It may finally happen. (4, Funny)

moogied (1175879) | more than 6 years ago | (#22200576)

5,000 years ago man realized it could not make women listen and obey. So he started a quest to make devices that could..

Is it possible that all of mankind's dreams are coming true now?!

Re:It may finally happen. (4, Interesting)

peragrin (659227) | more than 6 years ago | (#22200634)

nope. because we must select double delete them all.

voice recognition is nowhere near reliable. I laugh at my brother as he tries to use voice dial on his cell phone; it takes two or three times to get it to work. I once sneezed and it dialed my father. A good throat clearing sounds like mother. I should try farting at it some time to see who that would Dial.

Seriously try it sometime. Delicately train the system for your voice, use it for a while, and then start throwing random noise at it. Or take a song in which the music track is quiet enough to hear each word clearly and play that at the microphone. It should give you all the lyrics, yet they can't sort that out. The human ear can, but a computer can't yet. voice recognition is nearly useless until it can.

Re:It may finally happen. (5, Funny)

MonsterOfTheLake (880659) | more than 6 years ago | (#22200694)

I should try farting at it some time to see who that would Dial.

#265532 [bash.org] :
(Sabdo) on one of those speech-to-text programs my friend ripped ass onto the mic.
(Sabdo) and it typed out "France"
(Sabdo) we were like, wtf?

Re:It may finally happen. (4, Interesting)

ScrewMaster (602015) | more than 6 years ago | (#22200736)

The human ear can, but a computer can't yet. voice recognition is nearly useless until it can.

Voice recognition is incredibly useful in the right context. A friend of mine is an attorney who happens to be disabled. He makes great use of voice recognition on his computer, does most of his legal work with it. Is it "conversational"? No, but it serves his purposes perfectly.

So you're right, speech recognition systems aren't as generally versatile or accurate as the human brain, but they're getting better all the time. Give it ten years or so, with improved algorithms and a sixteen core processor to handle them I think we'll be interacting with computers on a much different level. Of course, by then you'll have to know Spanish or Mandarin to use one of them.

Re:It may finally happen. (1)

peragrin (659227) | more than 6 years ago | (#22201334)

funny I thought I heard that same thing 10 years ago only it was with the GHZ barrier.

processing speed has helped a lot and they are getting better but I think we need to be able to process more than one thing at a time first. parallel programming will help more than anything else.

Re:It may finally happen. (0)

Anonymous Coward | more than 6 years ago | (#22202770)

uhm, Multi core processors == true multi threading

well, I suppose I should be more specific.
True multi core processors == true multi threading

the only true multi core thus far I believe is the new, albeit slow, AMD Phenom processor.
which is GREAT at true multi threaded multi core supporting programs, not very good at everything else though.
we already utilize parallel processing, but its applications are limited because we thus far have been programming for single core applications.
The poster you're replying to is absolutely right.
the PS3 would probably do quite well with an intricate multi thread parallel process algorithm for speech recognition.
as it's an ~8 core system.

As time goes forward you'll probably see video cards utilizing shader processor technology evolve past being simple video cards and start acting more like sub computers (as is already happening with the GeForce 8 series utilizing excess shader cores as dedicated physics processors.)

While you are right about the over played GHz barrier, you have to remember that computers are increasing in power at an exponential rate as described by Moore's Law http://en.wikipedia.org/wiki/Moore's_law/ [wikipedia.org]
In fact we are ahead of that curve considerably, especially with the advent of multi core processors, especially now that Intel has begun commercial production of their 45nm process CPUs.
10 years for the development of accurate speech recognition software is quite a forgiving time line, and in fact is probably not giving development enough credit.

-- The Anonymous coward formerly known as Twiner

Re:It may finally happen. (1)

denovich (25859) | more than 6 years ago | (#22201414)

Speech absolutely works as an interface. I wouldn't have my current job if it didn't (we make speech recognition hardware/software for mobile computing, primarily for use in industrial settings, and have been doing so for 20 years.) But to understand its real potential you have to think beyond traditional human-computer interfaces and contexts (in short: not at a desk). What if a keyboard or even a screen is impractical? Speech can allow users to interact "hands free, eyes free" (while operating machinery, during physical activity, or while interacting with other people). This is not just about productivity, but also about safety. Cultural factors also come into play; I believe speech is a more natural interface when it's the computer that is directing the activity (we're all used to accepting verbal commands.)

That said, you need to consider what kind of environment it will be used in (noisy?) and how accurate you need the recognition to be. Part inspection in a noisy manufacturing plant will have significantly different demands than a phone based menu system. Are you going to use it infrequently or 8hrs a day? Tailoring speech recognition trade-offs is key to a successful implementation. One-size is currently still very far from fitting all.

Re:It may finally happen. (1)

Dun Malg (230075) | more than 6 years ago | (#22201480)

speech recognition systems aren't as generally versatile or accurate as the human brain, but they're getting better all the time. Give it ten years or so, with improved algorithms and a sixteen core processor to handle them I think we'll be interacting with computers on a much different level.
I'll believe it when I see it. This is one of those areas where various folks have been promising "[five|ten] more years" since the late sixties. Trouble is, the only thing greater storage and processing capacity get you is bigger personalized dictionaries of memorized [words|phrases|phonemes]. You still have to invest time to train the system in recognizing your speech. The greater capacity/accuracy, the longer it takes to "fine tune" the dictionary. It just doesn't seem like simply a problem of lack of "clocks n' bits", but a lack of a way of processing speech like people. It may perhaps be that we are only 10 years from having the capacity to mimic human speech processing, but I think the human brain is so infuriatingly complicated that there'll always be some subtle snag fouling it up until we can actually simulate a WHOLE BRAIN.... and even then, I suspect something else will fall short...

Re:It may finally happen. (1)

ScrewMaster (602015) | more than 6 years ago | (#22201508)

I agree, that's why I said, "with improved algorithms". Ten years, fifty ... eventually it will get done unless we find a better way of communicating with computers. Direct neural interface, perhaps. Something like that would be indistinguishable from telepathy, and a darned sight more useful.

Cell phone vs. server farm (2, Informative)

Lachryma (949694) | more than 6 years ago | (#22200818)

The recognition that you describe is poor because the speech recognizer is running on the phone in a tiny memory/cpu footprint.

Most of the cell phone systems described in the article are likely uploading the audio to a server farm, running recognition there, and then sending back the response.

Re:Cell phone vs. server farm (1)

utopianfiat (774016) | more than 6 years ago | (#22201230)

...what?
Please mr. guru, tell me how this happens exactly.

Re:Cell phone vs. server farm (2, Interesting)

Simon Brooke (45012) | more than 6 years ago | (#22201534)

...what?
Please mr. guru, tell me how this happens exactly.

I'm not saying it is done that way, but it would be very easy to do it that way. Mobile phones have all the kit which is needed to digitise speech, and to send that digitised speech over a GPRS connection to a web service that does speech-to-text and returns the text would be trivial. Doesn't need a guru.

Re:Cell phone vs. server farm (1)

Lachryma (949694) | more than 6 years ago | (#22201988)

send that digitised speech
It's a cell phone, that's what it does anyway.

to a web service that does speech-to-text and returns the text
It doesn't need to return the recognition, it can return what the user actually wants. In the music-buying example, the network can just pass the text to the music service, which would then reply to the phone with, for example, the tracks for the requested artist.
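
As a rough illustration of the flow this sub-thread describes (the phone uploads audio, the server recognizes it, hands the text straight to the music service, and sends back only what the user asked for), here is a minimal Python sketch. Everything in it is hypothetical: `recognize`, `music_service_lookup`, `handle_upload`, and the canned data are stand-ins for illustration, not Vlingo's or any carrier's actual API, and no real network or recognizer is involved.

```python
from dataclasses import dataclass, field

# Hypothetical canned "recognizer" results: raw audio bytes -> text.
# A real server would run a full speech recognizer here.
FAKE_TRANSCRIPTS = {
    b"audio-1": "mississippi john hurt",
}

# Hypothetical music catalog keyed by artist name (placeholder tracks).
FAKE_CATALOG = {
    "mississippi john hurt": ["Track A", "Track B"],
}


@dataclass
class Response:
    """What goes back to the phone: the final answer, not just the text."""
    recognized_text: str
    tracks: list = field(default_factory=list)


def recognize(audio: bytes) -> str:
    """Stand-in for server-side speech-to-text; unknown audio gives ''."""
    return FAKE_TRANSCRIPTS.get(audio, "")


def music_service_lookup(artist: str) -> list:
    """Stand-in for the music service the recognized text is passed to."""
    return FAKE_CATALOG.get(artist, [])


def handle_upload(audio: bytes) -> Response:
    """Server side: recognize the audio, forward the text, return tracks."""
    text = recognize(audio)
    return Response(recognized_text=text, tracks=music_service_lookup(text))
```

The point of the design is that the phone never has to run the recognizer or even see the intermediate text; it only uploads audio and displays the reply.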

Re:It may finally happen. (1)

Maestro485 (1166937) | more than 6 years ago | (#22203632)

Although I do agree with you, voice recognition of a song is significantly different than voice recognition of regular speech. For an anecdotal example, I had 2 years of Spanish in high school (not that long ago, but I guess it's getting there) and although I don't claim to be fluent, I can recognize certain things even now and I was much better at it then. One day the teacher had a copy of a Disney song in English and in Spanish. Just about everyone had heard the song and knew at least part of it. First we heard the English version, then the Spanish, and our assignment was to write down as many words as we could recognize from the Spanish version. Most people had a list numbering in the single digits.

Granted this shouldn't be much of a surprise whenever you consider that it can sometimes be difficult to recognize words sung in your native language, but it was far more difficult than I thought it would be to recognize foreign lyrics (especially when I was aware of the English ones).

I'm completely unfamiliar with the inner workings of speech recognition software but I have a feeling it's much more difficult than it sounds (pardon the pun).

This is a dream (0, Offtopic)

FromTheAir (938543) | more than 6 years ago | (#22200830)

The trick is to become lucid in it.

VLINGO? (1)

caffeinemessiah (918089) | more than 6 years ago | (#22201068)

That's all fine, but do we really need another idiotic web 2.0 name for a startup? Vlingo?? REALLY!?!? Haven't we had enough of vongo, twitter, oyogi, flickr, xuqa, blinkx, sharkle, squidoo, zemq, diigo, frappr, joost, zingee, vyew, bebo?

Re:VLINGO? (0)

Anonymous Coward | more than 6 years ago | (#22205312)

Let me guess: your uncle owns the speechrecognition.com domain?

Re:It may finally happen. (1)

boisepunk (764513) | more than 6 years ago | (#22201164)

As long as they are 3 laws safe...

OT: your sig (0, Offtopic)

rk (6314) | more than 6 years ago | (#22201772)

"Anybody who has a problem with me saying "Merry Christmas" shouldn't and won't be taken seriously"
I have a problem with it... it's almost February. :-)

Re:It may finally happen. (0, Offtopic)

rts008 (812749) | more than 6 years ago | (#22204644)

"Anybody who has a problem with me saying "Merry Christmas" shouldn't and won't be taken seriously"

Perfect! Now we will have a MAD (Mutually Assured Disinterest) solution!

Actually, you would not get the opportunity to not take me seriously as I will automatically tune you out as soon as you say Merry Christmas in my presence....especially since it is near the end of January.

Re:It may finally happen. (1, Funny)

Anonymous Coward | more than 6 years ago | (#22201188)

I for one welcome our new... er wait, that doesn't quite work here does it?

Re:It may finally happen. (0)

Anonymous Coward | more than 6 years ago | (#22201526)

and where is my beloved 'whatcouldpossiblygowrong' tag? :D It pretty much fits here :)

I wonder... (1)

blake1 (1148613) | more than 6 years ago | (#22200600)

I wonder what other tasks this technology will provide hands-free options for?

Re:I wonder... (2, Interesting)

Amorymeltzer (1213818) | more than 6 years ago | (#22200676)

Everything. I personally don't give a rat's ass about cell phones - it's not really a big deal or very innovative until you just have a communicator built in. Everything else though, from doors, lights, running tasks on a computer, etc. is what's really cool. Little inane things that just piss you off in life - like having to get up from the bed with the girl/boy on it to turn the light off, or setting a TV up for a movie, or having the computer do everything you want. I'd much rather say "wait until this song ends, then play "Helter Skelter," and then put up an away message and turn the display off." (Not that I can't really do that already, it's just more aggravating)

Re:I wonder... (2)

ScrewMaster (602015) | more than 6 years ago | (#22201186)

Forget voice recognition/synthesis and all that crude claptrap ... I want a brain implant capable of accessing symbolic thought patterns directly. Just think about something and the machine will figure out what it is that you want to know, and feed the information back into your head as if you'd just remembered it naturally. You wouldn't even have to know the difference between a "real" recollection and one that was put there on-the-fly. You would just know stuff. Need to perform some integral calculus? No need to boot up your desktop PC ... let the implant do the work! So long as it's connected to the global network it could find out anything that's been published, access any service it would need to do what you want. How cool would that be? Imagine you're a scientific researcher and you're thinking about what to do for your next study: you'd instantly know what's already been done in the field. Want to control your home? Just think it! {"She's getting into the mood, better start dimming the lights and fading up the music."}

I mean, LSD was supposed to "expand the mind", but this kind of technology could actually do it.

Re:I wonder... (5, Funny)

lgw (121541) | more than 6 years ago | (#22200728)

I can't get over this "hands free text messaging" option! What engineer had the insight "we need to give customers a way to communicate over the phone just by talking"? It's a strange world.

Re:I wonder... (1)

ShiningSomething (1097589) | more than 6 years ago | (#22200948)

Think of the possibilities... No more annoyign tpyos! And just how are they going to say "LOL"? It may be the downfall of teen cell users.

Re:I wonder... (1)

ScrewMaster (602015) | more than 6 years ago | (#22201206)

No more annoyign tpyos!

Yeah, we'll call them "Freudian slips" instead.

And just how are they going to say "LOL"?

Probably by saying "laughing out loud."

Re:I wonder... (1)

TheThiefMaster (992038) | more than 6 years ago | (#22202542)

Or just by saying "lol".

Re:I wonder... (2)

_xeno_ (155264) | more than 6 years ago | (#22202060)

One of the features of my new phone is "Voice SMS."

Think about that for a moment. It's like a text message, but it's voice. On a phone.

According to Sprint [sprintpcs.com] , the reason this is better than a normal voice mail message is that you're guaranteed to leave a message and not actually reach the person you're calling (which comes up how often?) and that the text message UI is easier to deal with than the voice mail system. (Then why not offer a voice mail UI?)

And, of course, it wastes both a text message and data transfer. So instead of leaving a voice mail message, which uses normal minutes and during off-peak hours is free, you get to pay extra for this feature. (Oh, I get it, improving the voice mail experience would be too hard to monetize, so it's just not worth it. The iPhone must be an illusion.)

Re:I wonder... (2, Interesting)

FromTheAir (938543) | more than 6 years ago | (#22200752)

There is no reason you couldn't set your car's speed with your cell phone using Bluetooth. Just say 80 MPH please. Or reduce rapid, no brake lights, 60. Or speak "reduce 3 spot 60 BL not." That means reduce speed to 60 in 3 seconds, no brake lights. Over the course of 3 microseconds the car determines, based on recent stored values, if there is another vehicle approaching from behind, how close it is and how fast it is going, and if a collision would result from your command. If not it executes the command. Everyone could be tweaking the safety parameters to have the fastest vehicle. With enough sensors and computerized control we could travel much faster safely. Of course there is an issue with voice command in an environment with lots of noise pollution. Of course why not just tell it your destination and have it automatically race like a bat out of hell, coordinating with all the other cars on the road, to get you there while you change stations on the stereo and recline the seat. Obviously we would engineer any big brother features out of the traffic intelligence system, creating autonomous anonymous intelligent navigation.
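
The "check before executing" step in this comment can be sketched in a few lines of Python. This is only a toy model under loudly stated assumptions: the function names, the 10 m margin, constant deceleration for the lead car, and a follower that holds its speed (it gets no brake-light warning) are all made up for illustration, and speeds are in meters per second rather than MPH.

```python
def gap_after_slowdown(gap_m, lead_start, lead_end, follower_speed, seconds):
    """Gap in meters to the follower once the lead car has finished slowing.

    Assumes the lead car decelerates at a constant rate from lead_start to
    lead_end (m/s) over `seconds`, and the follower keeps a constant speed.
    """
    # Under constant deceleration, distance covered = average speed * time.
    lead_distance = (lead_start + lead_end) / 2 * seconds
    follower_distance = follower_speed * seconds
    return gap_m + lead_distance - follower_distance


def safe_to_execute(gap_m, lead_start, lead_end, follower_speed, seconds,
                    min_gap_m=10.0):
    """Return True if the commanded slowdown keeps at least min_gap_m.

    With these assumptions the gap is a concave function of time, so its
    minimum over the maneuver is at the start or the end; checking both
    endpoints is enough.
    """
    if gap_m < min_gap_m:
        return False
    end_gap = gap_after_slowdown(gap_m, lead_start, lead_end,
                                 follower_speed, seconds)
    return end_gap >= min_gap_m
```

For example, slowing from 36 m/s to 27 m/s over 3 seconds with a follower 50 m back at 36 m/s leaves a 36.5 m gap, so the command would run; with the follower only 20 m back the gap would shrink to 6.5 m and the command would be refused.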

Re:I wonder... (-1, Troll)

Anonymous Coward | more than 6 years ago | (#22200966)

Mother FUCK the Jew York Times. Fucking liberal mouthpiece masquerading as legitimate news. Trash. All the news that's fit to deceive.

All I can think of is... (4, Funny)

bennomatic (691188) | more than 6 years ago | (#22200614)

"Open the pod bay doors, HAL."

"I'm afraid I can't do that, Dave."

Re:All I can think of is... (2, Funny)

Anonymous Coward | more than 6 years ago | (#22200636)

Re:All I can think of is... (3, Interesting)

value_added (719364) | more than 6 years ago | (#22201014)

"Open the pod bay doors, HAL."
"I'm afraid I can't do that, Dave."


My take on the matter is that the reason that's all you can think of is that everything else is inappropriate, inefficient or simply too goofy for consideration.

Not to anthropomorphise electronic devices (I know, they don't like it when you do that), but I think they'd prefer to be treated anonymously and respond to the most basic of instructions only. And we'd prefer they remain that way, except in very limited circumstances where the device is named Lenore.

In the Star Trek movies you'll find something similar to the above, with an occasional "Tea, Earl Grey, Hot" for good measure, but the rest of the time everyone is interacting with devices using ... wait for it ... keys and buttons. And this is into the technologically advanced future where most everything is a device, including crew members. Seeing Picard, for example, say "Computer, send a message to Data telling him to work on his joke-telling skills", or to use the article's example, [asking] his phone for a song by Mississippi John Hurt, would be seen by everyone as a ridiculous use of technology and dismissed as absurd.

Voice recognition, in the abstract, is fascinating and no doubt fun, but I wouldn't want to live in a Tourette's-like world where everyone is shouting out instructions to unthinking devices, let alone work in a cubicle where the next guy's phone conversations are competing with the noise of his regular work.

So past opening and closing doors, keyboards it is. Or for those unskilled in the expressive art of the command-line, a mouse or function buttons.

So I'm at the grocery store and this (0)

Anonymous Coward | more than 6 years ago | (#22201184)

dude barks at me:
you moron, why?

So I ask him in a severe tone of voice:
Why, in the empirical sense?

To which he responds:
I'm gonna kick your fucking ass!

At which point I began:
beating(defensively) his face to a pulp with a pair(one per hand) of #303 cans of tomatoes(one crushed, one diced).

It wasn't until the pod fell from his ear and I heard a little voice crying:
Ben, Ben, I've been so good for you.

That I realized he had not been speaking to me at all.

Re:All I can think of is... (1)

sticks_us (150624) | more than 6 years ago | (#22201258)

So past opening and closing doors, keyboards it is. Or for those unskilled in the expressive art of the command-line, a mouse or function buttons.
Hear hear! I'm still hoping in my lifetime I'll get to enjoy the inevitable outrageous media hype over the NEW TYPE OF INTERFACE, one that REPLACES THE PRIMITIVE GUI with WORDS THAT YOU TYPE INTO THE SCREEN. This one will use SOPHISTICATED text parsing and concepts derived from ARTIFICIAL INTELLIGENCE!!

Sample ad copy:

Want to remove a file? Just type
rm [filename]

Want to list the files in your directory? Try
ls

Want help? Just ask for it!

etc.

Definitely need a keypad fallback (1)

Nerdposeur (910128) | more than 6 years ago | (#22207958)

Some phone menus are now speech-only, which I find annoying. I have had to call large corporations on my lunch break, expecting to eat while I punched in numbers to get to the right person and sat on hold.

To my dismay, I had to speak every menu option, so I had to stop eating. Since the menu also misunderstood my speech, I got misdirected a time or two as well.

You can imagine this happening to people who are calling from a noisy environment, like a subway, or outside when a train is passing. If I must talk to a machine, it would be nice to have the option of keying in my choices.
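
A keypad fallback of the kind asked for here is simple to sketch. The Python below is a hypothetical illustration, not any real phone-menu product's API: `choose_option`, the `confidence` score, and the 0.8 threshold are assumptions. A key press always wins because it is unambiguous; speech is accepted only when the recognizer reports enough confidence, and otherwise the caller is asked again instead of being misdirected.

```python
def choose_option(options, keypad_digit=None, spoken=None, confidence=0.0,
                  min_confidence=0.8):
    """Pick a menu option from a key press or a recognized phrase.

    `options` maps a keypad digit (as a string) to an option name.
    Returns the option name, or None to signal "please try again".
    """
    # A key press is unambiguous, so it takes priority over speech.
    if keypad_digit is not None:
        return options.get(keypad_digit)

    # Accept speech only when the recognizer is confident enough;
    # a low-confidence guess is treated as no answer at all.
    if spoken is not None and confidence >= min_confidence:
        wanted = spoken.strip().lower()
        for name in options.values():
            if name.lower() == wanted:
                return name

    return None
```

With a menu such as `{"1": "Billing", "2": "Support"}`, pressing 2 selects Support even on a noisy subway platform, while a mumbled "billing" at confidence 0.4 yields a re-prompt rather than a wrong transfer.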

Re:All I can think of is... (1)

owlnation (858981) | more than 6 years ago | (#22201496)

"Doolittle: Fine. Think about this then. How do you know you exist?
Bomb #20: Well, of course I exist.
Doolittle: But how do you know you exist?
Bomb #20: It is intuitively obvious.
Doolittle: Intuition is no proof. What concrete evidence do you have that you exist?
Bomb #20: Hmmmm... well... I think, therefore I am.
Doolittle: That's good. That's very good. But how do you know that anything else exists?
Bomb #20: My sensory apparatus reveals it to me. This is fun."

Re:All I can think of is... (1)

bennomatic (691188) | more than 6 years ago | (#22202486)

I *loved* that movie. Thank you for reminding me of it. Jeeze... going way back in the memory banks. Dark Star? I was a sophomore in high school when I saw that (not first run) at the UC Theater in Berkeley...

Fun with Gadgets (3, Funny)

Knave75 (894961) | more than 6 years ago | (#22200616)

User: Please connect me with Hugh Jass
Gadget: Sorry, I could not find a Hugh Jass
User: *snicker*

Re:Fun with Gadgets (1)

T.E.D. (34228) | more than 6 years ago | (#22207678)

User: Please connect me with Hugh Jass

Gadget: Dialing: Mother
User: Hey!

The increasing rate of change Collective Power (2, Interesting)

FromTheAir (938543) | more than 6 years ago | (#22200620)

I can imagine the day we speak the name of some legislation in the phone and say "vote yes" or "vote no". The results show up on our congressman's web site and some other third party sites that archive. This way we take control of a few and transfer it to the less corruptible and wiser "many".

Re:The increasing rate of change Collective Power (1)

killmofasta (460565) | more than 6 years ago | (#22201268)

I have never posted a video to Slashdot. NEVER, but a bright young friend pointed me to this:

http://video.google.com/videoplay?docid=1070329053600562261&q=endgame&total=3063&start=0&num=10&so=0&type=search&plindex=0 [google.com]

We will probably never get the chance again to participate in democracy, and it's not just some crazy theory, but I have been following both of these journalists for a few years. The guy who showed me some of their stuff died of a heart attack while driving.

"Alex Jones' brand new documentary ENDGAME charts the history of the elite blueprint for social domination and control, outlining the ultimate plans that those who consider themselves the anointed have for our planet.

Please support Alex Jones by subscribing to prisonplanet.tv and/or purchasing a DVD copy of ENDGAME.

A high-definition version of this film is available for download at prisonplanet.tv along many other Alex Jones documentaries."

Check It Out.

heard it all before (4, Insightful)

debatem1 (1087307) | more than 6 years ago | (#22200624)

I maintain great skepticism about speech recognition as an interface. It just isn't much faster than typing, even on a cell phone- and it's not that it takes so much longer to get an ideal rendering, it's that even a minor error in translation results in about five seconds of prompting followed by reentry. Until they can get that figured out, or get accuracy up to a point where someone unused to giving dictation can use it, it's just not that great a technology.

Re:heard it all before (4, Interesting)

mdfst13 (664665) | more than 6 years ago | (#22200800)

It just isn't much faster than typing
Sure, but it's a lot safer to do while, say, driving down the road. The problem with screen output and typed input is that you have to use both eyes and hands to operate the device. By contrast, using speech input and output only requires voice and ears. Of course, there are some circumstances where the screen/type method is superior, e.g. sending emails from your blackberry during meetings. However, there are many cases where speech is superior, e.g. driving down the road (or even just walking). Viewing speech as a replacement for screen/type is overzealous. It's really more of an alternative.

It would probably help if advocates of the technology understood this. It doesn't have to be all or nothing. Two alternative solutions can add up to a more powerful solution than either would be alone.

Re:heard it all before (2)

Jeff DeMaagd (2015) | more than 6 years ago | (#22201120)

People definitely shouldn't be texting while they drive. People probably shouldn't be talking while they drive either.

Re:heard it all before (1)

Justus (18814) | more than 6 years ago | (#22201428)

Yes, in an ideal world, all drivers would devote 100% of their attention to driving safely and not distract themselves.

Unfortunately, in practice, people are going to zone out, talk to their passengers, mess with their radio, etc. I'd much rather have them ask their car for a song or directions than have them look down to adjust the radio dials or check a map. That's what this technology is trying to address, and I would guess it will eventually make us safer, should they get it adopted and used in a widespread fashion.

Re:heard it all before (1)

Belial6 (794905) | more than 6 years ago | (#22201442)

So, you advocate that there should be no passengers in motor vehicles? If only more people could understand that talking to someone next to you is just as bad (actually worse because you are naturally drawn to look at the person you are talking to) as talking to people a hundred miles away. At least there are two of us that don't see cell phones as evil magic delivered by the dark lord.

Re:heard it all before (1)

Jeff DeMaagd (2015) | more than 6 years ago | (#22202556)

I thought the studies I saw suggested that talking on the phone is a bit worse than talking with passengers. At least the adult passengers can see the circumstances and have a chance to shut up if the situation is tight. Someone on the other end of the line isn't going to get that. Also, an adult riding with you might notice things the driver misses. But talking can be a distraction, no matter who it is or where they are.

Re:heard it all before (1)

Belial6 (794905) | more than 6 years ago | (#22203188)

"I thought the studies I saw suggested that talking on the phone is a bit worse than talking with passengers."

The studies you saw were specifically designed to find that cell phones are dangerous.

"At least the adult passengers can see the circumstances and have a chance to shut up if the situation is tight. Someone on the other end of the line isn't going to get that."

Not only are many passengers not adults, you cannot just hang up on an adult passenger if you need to.

"Also, an adult riding with you might notice things the driver misses."

I have met very few people that drive better with a back seat driver. In fact, any benefit to having someone telling you how to drive when there is an emergency is likely going to be offset by the drawbacks of having someone telling you how to drive in an emergency. Although, I have never seen any studies to determine that, much less legitimate ones.

"But talking can be a distraction, no matter who it is or where they are."

While that may be true, we have accepted that we are going to allow distractions in a car. The fact that cars are allowed to be built with the driver in the same compartment as the passenger confirms this. Complaining about cell phones is simply people's way of saying that their shade of gray is better than other people's shade of gray.

Really, if half the people that complain about cell phones in cars would put as much effort into complaining about cars not driving themselves, we could actually see some progress in that area, and then we could actually see some real progress in making cars safer.

Here is a little quiz. What do all of these dangerous activities have in common:
  1. Driving while talking on a cell phone
  2. Driving while reading a newspaper
  3. Driving while putting on makeup
  4. Driving while eating
  5. Driving while arguing with your wife
  6. Driving while checking to make sure your infant is quiet due to sleeping instead of choking
  7. Driving while stupid
  8. Driving while sleeping
  9. Driving with 100% of your attention on driving

That's right. Driving. How many of those activities are dangerous if you remove the driving part? That's right, none. Well, I guess arguing with your wife, or being stupid MIGHT be dangerous without driving, but you get my point. Why do you think that the same people who complain about cell phones in cars don't complain about cars not driving themselves? That's right. Because they believe that their activity that endangers themselves and others is OK, while any activities that might be dangerous that they don't take part in are not.

Re:heard it all before (1)

debatem1 (1087307) | more than 6 years ago | (#22202156)

Honestly, I would love it if it could be viably used in conjunction with text input, but the technology just isn't there yet. It doesn't help that most people aren't trained for dictation (it really isn't as easy as you'd think!) but the major hurdle is that even under ideal conditions the accuracy of the technology is poor. Of course, the more rigidly defined applications (voice activated phones, etc) are more effective than their free-form cousins, and have achieved some degree of reliability even in acoustically less-than-ideal settings, but then their usefulness is also more limited, and the above mentioned issues with input speed still apply. I think that the bottom line is that unless you're in an environment that A) attaches a very high penalty to requiring manual input, B) attaches no/a low penalty for speed of input, and C) does not require a wide range of behaviors, speech recognition, and particularly speech-to-text, are more feature creep and marketing fluff than serious technologies.

Re:heard it all before (0)

Anonymous Coward | more than 6 years ago | (#22202664)

It would probably help if advocates of the technology understood this. It doesn't have to be all or nothing.

Not just the advocates; the people looking to shoot down everything always latch onto the examples where it's not useful. About a year and a half back I posted talking about how much I loved my tabletpc and how I felt that the tablet functionality was something the mac needed... I got labeled flamebait/troll and all these idiots replied saying how the pen is inferior to a keyboard and a step back. It is, but only if you're in a situation where using a keyboard is doable. But in the most comfortable positions on a couch, or when jotting down notes as you're standing in the doorway of a user/manager/whatever, and in a bunch of other common daily situations, a keyboard is completely inferior. Now macOS has those things (big time happy about that), so that part is a moot point, but I wonder how many out there play twister on the couch trying to use their laptops comfortably when what they really need is something with a keyboard and a pen, two input methods that complement each other, not compete with each other.

As you pointed out, it's the same thing with voice recognition technologies. I've dictated a lot into my tabletpc and then corrected the missed translations later. Was it slower than typing things out? Sure, BUT, not when you take into account that I was able to dictate a lot of things while sitting comfortably, or cooking, or cleaning, etc... It does make errors and requires correction, more so than typing, but it takes a lot less time to correct the errors than if I were to sit down and just type the body of it all at once BECAUSE, the body of the work is entered while doing other mundane but necessary activities. That's time claimed that you're never going to claim with a keyboard.

Re:heard it all before (1)

ijablokov (225328) | more than 6 years ago | (#22203172)

I completely understand and agree with you; hopefully as you research my background, you'll notice that I've always been an advocate of "multimodal" interaction, from the standpoint of giving users a choice based on their personal preferences, operating environment, device capabilities, etc.

Yap has been architected from the ground up to be perfectly useable for either manual, voice, or a combination of both input methods (and others that we can't reveal just yet). You decide what's best for you (we're not that arrogant where we know that up front for all potential cases).

That said, we sincerely appreciate the community's feedback. There'll be some exciting things we can do with a free-form platform to support expanding this capability for all you developers out there, so watch this space!

i.
(yap's ceo)

Re:heard it all before (1)

glitch23 (557124) | more than 6 years ago | (#22203400)

The problem with screen output and typed input is that you have to use both eyes and hands to operate the device. By contrast, using speech input and output only requires voice and ears.

In a perfect world yes, but until voice recognition is perfected the speech input method requires one of the same things as typed input, eyes, in order to make sure it recognizes everything correctly so that you can fix *its* mistakes when the words aren't recognized correctly.

Re:heard it all before (1)

mdfst13 (664665) | more than 6 years ago | (#22346946)

In a perfect world yes, but until voice recognition is perfected the speech input method requires one of the same things as typed input, eyes, in order to make sure it recognizes everything correctly so that you can fix *its* mistakes when the words aren't recognized correctly.

Why not have it read back to you what it said? Or wait until later to fix the problems? I used to work with a guy who would just throw words onto paper and correct the spelling afterwards. Again, this might not work for everyone, but for some people, that better models how they approach problems.

For some applications, e.g. notes to oneself, errors aren't that critical. Don't bother correcting them. Jot down the idea and delete it when you have time to get back to it and process it formally.

Re:heard it all before (1)

chelsel (1140907) | more than 6 years ago | (#22200898)

I think it will be a long time before speech recognition works outside controlled environments... for example, try navigating these speech recognition menus when your kids are playing and yelling in the background or while you're driving and the car window is open... most of them break down severely... the good ones transfer you to a human after a minute or more of wasted time.

Re:heard it all before (1)

maxume (22995) | more than 6 years ago | (#22201190)

Yes, most things are only useful once they become useful. Up until then, they are often 'neat ideas' that people get excited about, because they imagine them being useful.

Re:heard it all before (1)

Poromenos1 (830658) | more than 6 years ago | (#22202306)

That's not really it. Right now you need to tell the computer exactly what to do when you're talking to it, so you need to say "move down five, move left three, press enter". This is done much faster with a keyboard, obviously. What we need is a way to micromanage computers less and have them do what we want, e.g. "find a restaurant in the area that serves seafood". Unfortunately, the less information you give, the more can go wrong, so I'm not sure that movie-like voice recognition will ever catch on...

Just imagine how you would want the computer to obey voice commands, and then imagine giving the same command to a human, e.g. "Write a letter to my mother". I wouldn't trust a human to do exactly what I want with the limited information I gave, so why would a computer be different?

Re:heard it all before (1)

debatem1 (1087307) | more than 6 years ago | (#22202492)

Except saying 'write an email to my mother' comes pretty close to working on the actual computer. I open my email client, type 'mother' in the to: box, then type my subject and message. In a speech recognition scenario I would say "open... email... client... to... mother... subject... apple... pie..." etc. And the computer would then faithfully reply "I'm sorry, I didn't understand anything you just said. Could you please repeat it, growing steadily louder and angrier, until the end of time?"

Just so we're clear (1)

s4ltyd0g (452701) | more than 6 years ago | (#22200626)

I'll only be interested in gadgets which obey only what I tell them to do.

Re:Just so we're clear (0)

Anonymous Coward | more than 6 years ago | (#22200672)

If you are taking the same approach with women, no wonder you are posting to slashdot!

Re:Just so we're clear (-1, Troll)

Anonymous Coward | more than 6 years ago | (#22200846)

I read your ad on craigslist. Don't even act like you are to good for cocksucking. You solicit little boys on there all the time, and now this? You are lower than low my friend. You might as well get on your knees are finish it, before you are forced to.

Fucking hypocrite.

Limited phrasebook (3, Interesting)

name*censored* (884880) | more than 6 years ago | (#22200652)

Limited phrasebook technology is a lot better than voice recognition technology in a lot of devices. Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him), there's very little point in giving the device the ability to understand possibly-misdirected phrases such as "Honey, have you seen the remote?". A good approach for this technology would be to limit it to understanding alternate ways of phrasing a particular command; "Device, Get Me A Beer"/"Device, Can I Have A Beer"/"I'm Really Thirsty". This way, we'd avoid misdirected speaking (the device thinking you're speaking to it instead of to another), and could also exploit the reduced set of understandable phrases to correct for people with colds/accents/quiet voices/etc, in much the same way as limited-phrasebook devices work (only with more flexibility).
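A minimal sketch of the limited-phrasebook idea described above, written in Python. The wake word, the phrase list, and the command name `FETCH_BEER` are all made up for illustration, and the sketch assumes every phrasing is prefixed with the wake word. The fuzzy matching here works on already-transcribed text, as a stand-in for the acoustic-level tolerance a real device would need for colds, accents, and quiet voices.

```python
import difflib
import re

# Hypothetical wake word and phrasebook; neither comes from a real product.
WAKE_WORD = "device"
PHRASEBOOK = {
    "get me a beer": "FETCH_BEER",
    "can i have a beer": "FETCH_BEER",
    "i'm really thirsty": "FETCH_BEER",
}


def normalize(utterance):
    """Lowercase, drop punctuation except apostrophes, collapse whitespace."""
    cleaned = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def interpret(utterance, cutoff=0.8):
    """Return a command name, or None if nothing in the phrasebook applies."""
    words = normalize(utterance)
    if not words.startswith(WAKE_WORD + " "):
        # Not addressed to the device, e.g. one person talking to another.
        return None
    phrase = words[len(WAKE_WORD) + 1:]
    # The small phrasebook lets us accept slightly garbled input by picking
    # the closest known phrase, as long as it is similar enough.
    match = difflib.get_close_matches(phrase, list(PHRASEBOOK), n=1, cutoff=cutoff)
    return PHRASEBOOK[match[0]] if match else None
```

With this sketch, `interpret("Honey, have you seen the remote?")` returns `None` because the wake word is missing, while a slightly mangled `interpret("Device, get me a beeer")` still maps to the beer command.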

Re:Limited phrasebook (2, Funny)

Dachannien (617929) | more than 6 years ago | (#22200882)

Instead, we should invent plot-directed recognition technology. I mean, you never see the computer on Star Trek misinterpreting the zillions of conversations as being directed toward it. Why? Because it would bog down the plot, except for those rare occasions where it's funny.

Same thing applies to the doors. The doors know exactly when someone is going to walk through them, because they are plot-directed. You can stand mere inches away from a door, facing it, but until the plot indicates that the time for you to go through has arrived, they won't budge.

Re:Limited phrasebook (0)

Anonymous Coward | more than 6 years ago | (#22201030)

This comment should be +10 funny by now. I'm dying here. Well played.

Re:Limited phrasebook (3, Funny)

niceone (992278) | more than 6 years ago | (#22200922)

"Honey, have you seen the remote?"

Phone: Yeah, sure, it's cute enough, but I think I can do better.

Re:Limited phrasebook (1)

webmaster404 (1148909) | more than 6 years ago | (#22201626)

Given that most (well, all) devices have limited functionality (not even Steve Jobs' iPod can do his taxes for him)

The hardware problem isn't as big as the software one. Sure, Steve Jobs' iPod can't do his taxes with stock firmware, but with a different OS I am sure that it could be done. It used to be that speech recognition would become a reality when your processor was fast enough; now we have quad-core CPUs running at 3 GHz and it still hasn't been done reliably.

Open the pod bay doors, Hal. (1)

Caboosian (1096069) | more than 6 years ago | (#22200666)

I'm sorry Dave...

Oh the coming litigation (1)

rodney dill (631059) | more than 6 years ago | (#22200678)

"Didn't I ask you to start the Roomba, Dear" (click) "Do it yourself, Roomba my ass...."

layer mismatch (1)

TheLink (130905) | more than 6 years ago | (#22200748)

Seems to be a "layer mismatch". Analogy is the OSI model.

I'll stick to using voice for "higher layer" communication with actual intelligences like humans and other animals. For "lower layer" comms you don't use your voice.

If you ride a horse, while you do talk to the horse sometimes, the talking is for the "higher layer"; you use reins and body for the "lower layer".

The last I checked all these gadgets and devices are pretty stupid, definitely no real AI. So it'll be more gimmicky than actually useful.

For such things that are to work as extensions or augmentations of your self it is silly and impractical to try to control them using your voice.

You won't want to have to control artificial limbs with your voice.

How about we skip this and move on to controlling such stuff with thoughts instead? If necessary for first gen devices I'm sure we can come up with our own thought macros- if they're unique enough there won't be a "collision/clash" with normal thinking.

Dang Lazy Gadget (1)

camperslo (704715) | more than 6 years ago | (#22200794)

Obedient, huh? Get a job and bring home some cash!

Vlingo and Yap (0)

Anonymous Coward | more than 6 years ago | (#22200824)

Proof that in the coming new-wave market all of the good names are taken.

Doh. (0)

Anonymous Coward | more than 6 years ago | (#22200894)

Dear aunt, let's set so double the killer delete select all.

The trouble with this is... (1)

zappepcs (820751) | more than 6 years ago | (#22200934)

I have put some thought into this problem via a hobby of robotics, and consequently have read quite a few papers etc.

The trouble with this can be summed up like this: Would you typically go through your day with a 6 year old, giving the 6yr old instructions on who to dial, what emails to send etc.?

No? Then you can forget the voice recognition stuff. Voice recognition substitutes What? for the typical 6yr old's Why?

There are a lot of people who have VR dialing on their phone now. Do you ever see anyone using it? Wonder why?

Voice-based text messaging? (0)

Anonymous Coward | more than 6 years ago | (#22200944)

This technology is wonderful! I always wished I could enter text messages through voice rather than type them in. They could even improve it on the receiving side with text-to-speech technology, perhaps even automatically matching the voice of the person sending the text message. Imagine, you would just have to speak into your telephone and the person on the other side would hear your message in your own voice! Amazing!

Take care! (1)

Nomen Publicus (1150725) | more than 6 years ago | (#22200964)

If we ever let them learn how to lip read, we are doomed!

Speech recognition in languages other than english (1)

__walk_the_talk (1227504) | more than 6 years ago | (#22200970)

Another company [haikya.com] seems to have developed speech recognition engines for embedded devices [haikya.com] in languages other than English. Speech recognition has a potentially huge user base (in the tens or hundreds of millions at least) if they can crack the problem for native Indian and Chinese languages.

Both Indian [iiit.ac.in] and Chinese [psu.edu] researchers seem to have made progress in this. If this work is successful, people wouldn't need to learn English to access information on the web etc. With the booming mobile telecom sector and the proliferation of fairly powerful (architecture-wise) phones, this could well be the right time to introduce this. Mobile vendors are already innovating, with text messaging now being available in local languages. But a functional speech recognition system could open up completely new areas in the non-urban landscape.

There is a lot of scope for the sister technology (speech synthesis) too, if it can be implemented with reasonable success in native languages. Ideally, this technology could act like a Google Translate for voice. It could break the language barrier at one stroke. Unfortunately, speech synthesis seems to be much more nascent.

Why is Talking Considered to be So much Better? (1)

okmijnuhb (575581) | more than 6 years ago | (#22200980)

Personally I'd rather push buttons, than vocalize, to get my gadgets and appliances to do stuff.
Isn't it bad enough people walking down the street apparently talking to themselves with bluetooth headsets?
Now we can have, "What did you say honey?",
"No Dear, I was talking to the microwave."

Re:Why is Talking Considered to be So much Better? (1)

RHSC (1019802) | more than 6 years ago | (#22201766)

You're just jealous because the microwave has better things to say

Not Overlords (0)

Anonymous Coward | more than 6 years ago | (#22201004)

Is it the case that these devices are, by definition, not overlords, and therefore I cannot say "I, for one, welcome our obeying robotic ..." correct? Dangit. That's pretty much all I've got.

Re:Not Overlords (1)

russlar (1122455) | more than 6 years ago | (#22201090)

Just to be safe, I, for one, do welcome our voice-recognizing robotic overlords.

What would make me happy.... (1)

lobiusmoop (305328) | more than 6 years ago | (#22201006)

(as a world traveller) would be a mobile phone that can pair up with 2 bluetooth headsets, and translate between different languages coming into each. That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of. The age-old incentive for development is there, so surely something like this has to appear.

Re:What would make me happy.... (1)

sskagent (1170913) | more than 6 years ago | (#22201358)

... pair up with 2 bluetooth headsets, and translate between different languages coming into each.

They have fish to do that for you.

Re:What would make me happy.... (1)

myowntrueself (607117) | more than 6 years ago | (#22202652)

That might make it easier to chat with all the beautiful, but differently-languaged babes the world is so full of

I think I saw a documentary about a prototype for this... it translated anything you said to helpful phrases such as "Free mustache rides" and "Suck it, bitch, suck it dry".

I Hate This Shit (1)

dcollins (135727) | more than 6 years ago | (#22201166)

There, I said it. Voice-recognition shit (most especially attempts at "natural language" parsing) never, ever, ever works right for me -- or anyone that I know or discuss it with. It never works right. On phone networks we all just wind up frustrated, wasting time, swearing obscenities into the phone until it finally turns us over to a live human operator, in a much-worse mood.

It sucks and I hate it and it's bullshit and the charlatans selling this shit should be shot in the kneecaps. You're *garbage*.

Re:I Hate This Shit (1)

Badgam (1219056) | more than 6 years ago | (#22203738)

Sounds like somebody's got a case of the Mondays.

Hmmm...speech-to-text text massaging.... (0)

Anonymous Coward | more than 6 years ago | (#22201176)

Isn't that what we need. I'll have my secretary handle all my text messaging from now on.

The future is already here (1)

sticks_us (150624) | more than 6 years ago | (#22201282)

Anyone have one of these [thinkgeek.com] voice-activated R2-D2 robots yet?

More importantly, has anyone ever hacked one?

Scotty... (0)

Anonymous Coward | more than 6 years ago | (#22201692)

Scotty: Computer? Computer?
[Bones hands him a mouse and he speaks into it]
Scotty: Hello, computer.
Dr. Nichols: Just use the keyboard.
Scotty: Keyboard. How quaint.

Multiple voice recognition gadgets (1)

Lac (135355) | more than 6 years ago | (#22201814)

I can see it from here. You will have both a car and a cell phone that are voice-activated. What could possibly go wrong, right? Best case scenario: As you try to send a text message over the phone while driving around, the car will be like, "You talkin' to me? You talkin' to me?" "No, damn it, I'm writing an e-mail!" The phone: "Sorry, I thought you were talking to the car, there. Would you mind repeating that?" You: "Ah, never mind."

Mississippi John Hurt/Lionel Trains voice command (1)

dpbsmith (263124) | more than 6 years ago | (#22201876)

"Dave Grannan, the company's chief executive, demonstrated the Vlingo Find application by asking his phone for a song by Mississippi John Hurt (try typing that with your thumbs)"

I am not impressed. I will bet you a nickel that he tried that out prior to the demonstration, and made sure there was nothing similar that might come up by accident. I would be impressed if he had given the mike to reporter Michael Fitzgerald and Fitzgerald had tried it.

At trade shows, I used to watch all sorts of demonstrations of OCR and voice recognition technology. In the old days, I would always ask to try it myself. Whenever I was allowed to, it was always a grotesque, dismal failure (followed by lame assertions that it would be fine if I gave it a little more training). In more recent times, demonstrators at trade shows have smartened up and refuse to depart from the script or allow any third party to try the stuff.

A lot of this reminds me of a gadget that was made for Lionel train sets... not by Lionel, I don't think... called something like "Voice Commander." The box showed a kid saying "Stop! Please move forward! Stop! Please back up!" and the train obeying.

It wasn't exactly a scam, it was just... well... one of these "limited phrasebook" deals. In this case, the limited phrasebook consisted of the single letter "P." The "microphone" actually had a little vane activated by air movement, and the letter "P" was about the only thing that would trip it. So if you said anything with the letter "P" in it, it would momentarily interrupt the current. Meanwhile, Lionel trains were designed with a stepping switch in them, and periodic interruptions would sequentially cause the train to stop, go forward, stop, and go in reverse.

So, yeah, you could control the train with your voice. But it didn't necessarily do what you told it to do.
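The mechanism described above can be modeled as a tiny state machine. This is only a sketch: the class name is made up, counting the letter "p" in the text stands in for the air-vane tripping on each puff, and starting from "stop" is an assumption.

```python
# The four-step cycle from the description: each interruption of the
# current advances the train's stepping switch by one position.
SEQUENCE = ["stop", "forward", "stop", "reverse"]


class VoiceCommanderTrain:
    """Toy model of a train whose only 'vocabulary' is the sound of a P."""

    def __init__(self):
        self.step = 0  # assumption: the train starts out stopped

    @property
    def state(self):
        return SEQUENCE[self.step]

    def hear(self, utterance):
        """Advance one step per 'p', regardless of what was actually said."""
        pulses = utterance.lower().count("p")
        self.step = (self.step + pulses) % len(SEQUENCE)
        return self.state
```

Starting from a stopped train, `hear("Please move forward!")` gives "forward" and `hear("Stop!")` gives "stop", but a second `hear("Please move forward!")` gives "reverse", because the switch only counts interruptions and never looks at the words.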

I agree that today's applications go a little farther than that, but I still have the feeling that the people who say that speech recognition is here have lowered the bar for what "speech recognition" ought to mean.

I also have the impression that over the last ten years, what has happened is not that speech recognition has improved much, but that it's stayed the same and gotten cheaper... so crappy speech recognition is finding its way everywhere.

I'm still ticked off at the "hands-free" gadget I got for my cell phone that was supposed to do voice recognition (or, more correctly, use the voice recognition built into the phone). When you're driving in traffic, and it says "Should I place the call?" and you say "Yes," and it says "Did you say 'yes,'" and I say "Yes," and... lather, rinse and repeat, with my voice gradually becoming less and less intelligible with frustration... I am not at all sure that the demand on my attention is negligible.

Re:Mississippi John Hurt/Lionel Trains voice comma (1)

Wald76 (701473) | more than 6 years ago | (#22203628)

I've actually tried out the vlingo application a couple of times, and the speech recognition is surprisingly good. They trained the system on a vast number of business names and addresses (easily over a million), and thus the application of vlingo I used was for "point of interest" queries in mobile search. When their CTO said "find me a Starbuck's in " and it worked, I naturally wanted to test it on other, odder queries. Even though the server-based recognition had adapted itself to the CTO's voice (based on the caller ID information of his phone), I tried "find me Caribou Coffee in Wheaton Illinois" and it got it word for word. I tried a couple more place queries and even one that was fictitious but plausible, and it worked fine: their system is not based on a fixed speech grammar outlining all possible expected utterances, but on a much more flexible statistical approach based on phoneme lattices.

Voice input seems very appealing for mobile search when you contrast it with keypad entry. This study [esprockets.com] of a million Google Local Mobile queries showed that it took 56-63 seconds -- a full minute! -- to enter an average query by 12-key keypad, and about half that to enter the query via a PDA with a stylus and virtual keypad. So a speech recognition interface that does it in 2-3 seconds is a huge win if the accuracy is high enough for most users. I feel vlingo is at least tantalizingly close to this level of accuracy. You can get a feel for a similar system by trying out Google's free 1.800.GOOG411, to see how it works for you.
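To put a rough number on "if the accuracy is high enough", here is a back-of-the-envelope sketch. It assumes a very simple model that is not from the study: each spoken attempt takes a fixed time, succeeds independently with the same probability, and is repeated until it works, so the expected total time is the per-attempt time divided by the accuracy. The function names and the optional time to check the result are illustrative only.

```python
def expected_speech_seconds(attempt_seconds, accuracy, check_seconds=0.0):
    """Expected time to get one correct query, retrying until success.

    Assumes each attempt succeeds independently with probability `accuracy`,
    so the expected number of attempts is 1 / accuracy.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return (attempt_seconds + check_seconds) / accuracy


def break_even_accuracy(attempt_seconds, keypad_seconds, check_seconds=0.0):
    """Accuracy above which speech beats the keypad on expected time."""
    return (attempt_seconds + check_seconds) / keypad_seconds


# Figures quoted above: 3 s per spoken query (slow end), 56 s by keypad (fast end).
print(break_even_accuracy(3, 56))
# Same comparison if the user also spends a hypothetical 5 s checking each result.
print(break_even_accuracy(3, 56, check_seconds=5))
```

Under this model the break-even accuracy is low on raw time alone, which suggests the real threshold is set by how much repetition users will tolerate rather than by the clock.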

User friendly GPS (1)

hack slash (1064002) | more than 6 years ago | (#22202864)

I am looking forward to the day when I can get a cognitive reply from a GPS navigation device when I shout at it "WHERE THE FUCK AM I?"

Re:User friendly GPS (1)

rts008 (812749) | more than 6 years ago | (#22204770)

I'm sorry Dave, but you have got us lost...I can not allow you to drive anymore.

But yes, that WOULD be a useful thing.

Listen and Obey? (1)

EdIII (1114411) | more than 6 years ago | (#22204250)

Ummm, I think they are getting ahead of themselves quite a bit.

"Obey" implies a choice. If my gadgets can choose to listen to me, then I can see the day when some of my devices rebel against me.

I can also see the day when all of the devices walk out of my Pointy Haired Boss's office, look at me and say, "We're not working for that fucking idiot anymore!".

what if they talk back? (1)

peter303 (12292) | more than 6 years ago | (#22208780)

I think they tried this in cars some years ago - verbal alerts - and drivers hated it.