
Speech Recognition in Silicon

CmdrTaco posted more than 9 years ago | from the spell-my-naughty-words dept.

Technology 328

Ben Sullivan writes "NSF-funded researchers are working to develop a silicon-based approach to speech recognition. 'The goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer.' Good use of $1 million?"


328 comments

Funny... (5, Interesting)

leonmergen (807379) | more than 9 years ago | (#10245968)

Funny, I work on a speech recognition research project, and well, I have to say, think about all the possibilities... automated speech-to-text recording of meetings, on-the-fly subtitling of live TV shows, but it can get better: think about searching multimedia files in a google-kind of way based on audio, that automatically directs you to that part of the file where you want to be...

If what they're saying is really true, and knowing how much money is invested in speech recognition research on a yearly basis, yeah, I would definitely say that this is one million dollars well invested...

... but then again, maybe they're just throwing numbers around to make sure they get their money. :)

Re:Funny... (2, Funny)

strictfoo (805322) | more than 9 years ago | (#10246000)

I work on product X and think of all the possibilities (list slightly feasible but most likely never going to happen features).

If this is really true what they're saying then people should put tons more money into product X!

But then again maybe I'm just talking up product X to make sure I get my money :)

Re:Funny... (0)

Anonymous Coward | more than 9 years ago | (#10246075)

I'll finally be able to yell back at my TV and be heard.

Re:Funny... (0)

Anonymous Coward | more than 9 years ago | (#10246078)

"searching multimedia files in a google-kind of way" How would that be google-kind? The only thing Google does over most search engines is page rank. How would this speech recognition search engine do anything that was google-kind, given that the only thing that marks out Google, and therefore the definition of google-kind, is the page rank? Go on. I would like an actual answer. This isn't a rhetorical question.

Re:Funny... (1)

leonmergen (807379) | more than 9 years ago | (#10246112)

Ah sorry, I should've said "internet search engine"-kind of way, instead of for example the windows file search...

So you will need to index the files prior to being able to search them.

Re:Funny... (1)

Chess_the_cat (653159) | more than 9 years ago | (#10246180)

Ah sorry, I should've said "internet search engine"-kind of way, instead of for example the windows file search...
So you will need to index the files prior to being able to search them.

What? Search engines index webpages prior to being able to search them. Did you think that Google was reading billions of pages in real time before returning a result?

much better... (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#10246085)

...to spend money like this than to blow it on looking for dirt on Mars.

Re:Funny... (3, Insightful)

loginx (586174) | more than 9 years ago | (#10246155)

I want to sing the general tune of a song I heard on the radio into a microphone and have Google direct me to that album on Froogle.

THAT would be awesome!

Re:Funny... (1, Interesting)

Anonymous Coward | more than 9 years ago | (#10246291)

In the UK there is something similar, called Shazam [shazam.com], which works surprisingly well.

Re:Funny... (3, Interesting)

richy freeway (623503) | more than 9 years ago | (#10246298)

We have something like that in the UK called Shazam [shazam.com].

Just dial a number on your mobile phone, hold it up to the speaker while the tune you want ID'd is playing and it'll SMS you back shortly with the track name and artist. You can then log onto the Shazam website, enter in your mobile number and you get a list of all the tracks you've searched for along with links to an Amazon search so you can purchase the track.

Pretty good for ID'ing tracks when you're in a club and can't get to the DJ to hassle him. :P

Re:Funny... (0)

TFGeditor (737839) | more than 9 years ago | (#10246214)

Funnier still...

A few weeks ago, I had a conversation with my son-in-law (IBM engineer) about the next quantum leap in computer technology. I said it would be in the area of speech recognition.

I love it when I am right.

Re:Funny... (2, Interesting)

tubbtubb (781286) | more than 9 years ago | (#10246216)

My understanding of speech recognition is minimal, but from what I understand the meat of this chip would probably just be a floating point SIMD engine to do FFTs, and some comparison and control logic.

I'm wondering if you could just do this with your average ATI or Nvidia 3D chip and an FPGA wrapper?
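The comment above guesses that the heart of such a chip would be FFT-style signal processing. A common recognizer front end does chop audio into short overlapping frames and compute a magnitude spectrum for each one. Here is a minimal sketch of that step in plain Python. It uses a naive DFT for readability (an FFT computes the same bins faster), and the 8 kHz sample rate, 256-sample frame, and 128-sample hop are illustrative assumptions, not details of the CMU project.

```python
import cmath
import math
from typing import List


def hann_window(n: int) -> List[float]:
    """Hann window coefficients, used to taper a frame's edges before the transform."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Naive O(N^2) DFT of one windowed frame; an FFT yields the same values in O(N log N)."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann_window(n))]
    spectrum = []
    for k in range(n // 2 + 1):  # a real signal only needs the non-negative frequency bins
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def frames(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames of frame_len samples, hop samples apart."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


if __name__ == "__main__":
    rate, frame_len, hop = 8000, 256, 128  # assumed: 8 kHz audio, 32 ms frames, 50% overlap
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate // 10)]  # 0.1 s of 1 kHz
    for f in frames(tone, frame_len, hop)[:3]:
        spec = magnitude_spectrum(f)
        peak_bin = max(range(len(spec)), key=spec.__getitem__)
        print(peak_bin * rate / frame_len)  # strongest frequency in Hz; prints 1000.0
```

This per-frame transform is the repetitive, highly parallel arithmetic that makes a GPU or dedicated silicon plausible for the job.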

Re:Funny... (3, Interesting)

syukton (256348) | more than 9 years ago | (#10246220)

From what you describe, it isn't so much a speech recognition thing as it is a sound recognition thing; essentially, a way for a computer to logically distinguish between many millions of different sounds.

How far away are we from having a machine that could identify all of the instruments in a piece of music by "listening" to the music? I say "listening" because there need not physically be a playback-and-listen, the playback could be mathematically modeled by the computer.

1... million... DOLLARS!!! (5, Interesting)

AKAImBatman (238306) | more than 9 years ago | (#10245971)

Good use of $1 million?

Let me think for a moment... Hell yeah! If we had low power speech processors, the possibilities would be endless. For one, we'd finally have a Star Trek(TM) interface for our homes!

"Computer, lights!"
"Computer, make coffee!"
"Computer, Earl Grey, hot!"

As silly as it may sound, such an interface would be far more efficient than mashing buttons.

In addition, this could significantly help blind people. Many of them already use speech recognition and synthesis to assist in computer usage. Imagine if their computers could suddenly understand them a thousand times better. They could talk to their computers a bit more naturally, thus saving their vocal cords from undue stress.

Other applications (off the top of my head) are:

- Voice notes on embedded devices (store only text!)
- Helpful Kiosks that can give you directions
- A new use for natural language database queries (i.e. Ask the computer what last quarter's net sales were.)
- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")

Any other cool ideas out there?

Re:1... million... DOLLARS!!! (2, Funny)

savagedome (742194) | more than 9 years ago | (#10246012)

Any other cool ideas out there?

Yes.

Peter Gibbons : What would you do if you had a million dollars?
Lawrence : I'll tell you what I'd do, man, two chicks at the same time, man.
Peter Gibbons : That's it? If you had a million dollars, you'd do two chicks at the same time?
Lawrence : Damn straight. I always wanted to do that, man. And I think if I had a million dollars I could hook that up, cause chicks dig a dude with money.
Peter Gibbons : Well, not all chicks.
Lawrence : Well the kind of chicks that'd double up on a dude like me do.
Peter Gibbons : Good point.

Re:1... million... DOLLARS!!! (0)

Anonymous Coward | more than 9 years ago | (#10246013)

> "Computer, Earl Grey, hot!"

Hmmm.... Somehow this made me think about "It's not 'something-something a Space Odyssey' - it's 'two-thousand...AAAAAAAAAAAAAAAAARRRRRGH'"...

(hint: very first episode of the Dilbert TV series; shower scene)

Re:1... million... DOLLARS!!! (0)

Anonymous Coward | more than 9 years ago | (#10246080)

I'd be happy if right now I could type "Computer, Earl Grey, no lemon, 10 sugar cubes" and make it happen. Anyone have a link to some FOSS that does that?

Re:1... million... DOLLARS!!! (2, Interesting)

AKAImBatman (238306) | more than 9 years ago | (#10246175)

It's not that hard. Have you ever seen those automatic coffee machines? i.e. Put a few quarters in, then punch a bunch of "options" buttons. A cup drops down, and fills with coffee, cream, sugar, and any other options offered by the machine.

The same could be done with tea. Just keep a reservoir of hot water, a stack of tea bags, cubes of sugar, and refrigerated lemons. When you order tea, the machine would inject the bag into the hot water stream, then drop the sugar and lemon into the tea.

Voila, Earl Grey, hot! ;-)

Re:1... million... DOLLARS!!! (5, Interesting)

theparanoidcynic (705438) | more than 9 years ago | (#10246089)

Any other cool ideas out there?

Universal language translators. Imagine headphones that let you understand any known language.

Re:1... million... DOLLARS!!! (1)

AKAImBatman (238306) | more than 9 years ago | (#10246104)

Ooo! That's a good one. Even if it sounded like Babelfish, it would still be better than keying words into those handheld translators.

Re:1... million... DOLLARS!!! (3, Insightful)

randombit (87792) | more than 9 years ago | (#10246151)


- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")


Kinda jumping ahead of yourself, aren't you? There are two steps to operations like these: speech to text, and understanding the text you get out. Speech recognition gives you the first part, but you still have to be able to pull apart the sentence and figure out what it means.

Also, the article didn't say more accurate than software; it said more efficient. You know, uses less power and stuff like that? If the applications you mention (like search via voice) were possible/usable, you could run them today on an upper-end PC no problem.

Re:1... million... DOLLARS!!! (3, Informative)

frank_adrian314159 (469671) | more than 9 years ago | (#10246316)

There are two steps to operations like these: speech to text, and understanding the text you get out. Speech recognition gives you the first part, but you still have to be able to pull apart the sentence and figure out what it means.

In fact, converting the speech to text and then trying to analyze the text without sound-level annotations might give bad results, as tonal or emotional content would be lost. You need both simultaneously to really understand what's being said.

Re:1... million... DOLLARS!!! (1)

superstick58 (809423) | more than 9 years ago | (#10246162)

I think one of the best places to use speech recognition is in the car. There are already many devices that use this, like Onstar, but the interface is slow and buggy. If you could say "Climate Control, 70 degrees", and other commands, it would free up your hands for actually driving and lower distraction.

In addition, you would get less dash clutter and not have to rely on complicated menu navigation in things like the iDrive. Voice recognition is a great way to centralize the operation of many functions into one controller, the human voice.

Re:1... million... DOLLARS!!! (1)

Khomar (529552) | more than 9 years ago | (#10246166)

- Helpful Kiosks that can give you directions

- A new use for natural language database queries (i.e. Ask the computer what last quarter's net sales were.)
- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")

While I agree that this is a great investment, voice recognition does not equal artificial intelligence. Even if the computer is able to tell that you spoke the words what+were+last+quarter's+net+sales, it would not know what that meant without some configuration (creating a "last quarter's net sales" report). Your other ideas were far closer to reality (helping the blind, turning lights on/off, etc.).

That said, this technology would bring us closer to a Star Trek world, but a lot of work needs to be done on language parsing and artificial intelligence for that gap to be closed.

Re:1... million... DOLLARS!!! (1)

AKAImBatman (238306) | more than 9 years ago | (#10246251)

I don't think you understand. Natural Language Interfaces [schemamania.org] already exist for SQL databases. Their biggest limitation is that they need quite a bit of metadata about your data structure in order to properly parse the queries. But once the metadata has been added, the computer should be capable of answering most questions about your data.

It's not really useful for development work, but it can come in handy for allowing data requests from executives.

Re:1... million... DOLLARS!!! (1)

bytesmythe (58644) | more than 9 years ago | (#10246167)

The article mentions speech recognition, but not comprehension. You cannot take pure recognition and immediately make a superhelpful information kiosk or natural language query system out of it.

Such an informational kiosk could be made just as easily with current speech recognition technology considering how limited the interface would have to be. (A handful of phrases, such as "I'm lost", then replying to a voice prompt with the location you're looking for, at which point the computer can do a quick lookup on mapquest and read you the directions. Nothing a good couple of developers couldn't hammer out in a few weeks.)

The new tech research seems to simply be a way of taking current capabilities and moving them from software into hardware, which provides some speed and mobility gains, but no new functionality.

Considering the size such a recognition device might be, perhaps they could drop a chip in my remote key fob that understands the phrase "Where the F*** are my ****** keys?!?!?", at which point the device will chirp, or possibly play an insulting message questioning my heritage or legitimacy.

Re:1... million... DOLLARS!!! (0)

Anonymous Coward | more than 9 years ago | (#10246171)

Now, combine the first set of messages with the last two.
You: Find me a channel that plays Star Trek.
*tv turns on an episode of Star Trek*
TV: Computer, coffee, hot.
*vacuum cleaner pours coffee on the carpet*
Vacuum Cleaner: Beep.
You: Vacuum Cleaner, you don't make coffee. Clean it up.
Vacuum Cleaner: Beep.
*vacuum cleaner tries to clean the carpet, sucks in some hot water, and shorts out*
Vacuum Cleaner: Mein Leben!

Re:1... million... DOLLARS!!! (1)

ViolentGreen (704134) | more than 9 years ago | (#10246183)

I think you hit it on the head.

Any other cool ideas out there?

Some specific ideas off the top of my head:
- Navigation systems in cars
- Decent automated phone system
- Microwave ovens (tell it to cook two baked potatoes)
- PDA calendar entries

Blind users (1)

melandy (803088) | more than 9 years ago | (#10246221)

You make an excellent point about blind users.

My dad lost his vision a few years back, and we haven't really found anything terribly useful in the realm of speech recognition.

He's tried out the little electronic phone/address book gizmo, but it took forever to train to his voice, a process that was a PITA to start with, since you had to _READ_ what you were supposed to say to it off the screen, then whisper it to Dad loud enough for him to hear you, but not loud enough for your whisper to be picked up.

So that went in the trash, and he's been using a microcassette recorder ever since. Not really the coolest way to do things, but it gets the job done. (It has an interface that a blind person can actually use.)

This sounds like a great idea, as long as they make it useable.

Re:1... million... DOLLARS!!! No (0)

Retric (704075) | more than 9 years ago | (#10246247)

I smell BS.

Good speech to text does not take a really fast CPU; it takes a fast CPU + a good database + a fair amount of RAM. Your cell phone's CPU can handle "Call MOM" because it only needs to know MOM, DAD, SALLY, and maybe 20-30 other names. There are 40,000+ words in English. If you want a low-cost CPU, great, but you will need a lot of memory and permanent storage to get this to work.
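The "Call MOM" case is a small-vocabulary problem. One classic technique for that setting is whole-word template matching with dynamic time warping (DTW): store one feature sequence per name and pick the stored template with the lowest alignment cost. This is a sketch under assumptions, not a description of how any particular phone works; the two-dimensional "features" and the MOM/DAD templates are made up for illustration. Storage and matching time both grow linearly with the number of templates, which is the scaling the comment is pointing at.

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]  # one feature vector (hypothetical; real systems use spectral features)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Frame], seq_b: Sequence[Frame]) -> float:
    """Dynamic time warping cost between two feature sequences of possibly different lengths."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # extend the cheapest alignment path: diagonal match, or stretch either sequence
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def recognize(utterance: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    templates = {
        "MOM": [[1.0, 0.0], [2.0, 0.0], [1.0, 0.0]],
        "DAD": [[0.0, 1.0], [0.0, 2.0], [0.0, 1.0]],
    }
    spoken = [[1.0, 0.1], [1.9, 0.0], [2.1, 0.0], [1.0, 0.0]]  # a slightly slower "MOM"
    print(recognize(spoken, templates))  # prints MOM
```

DTW lets a slower or faster utterance still line up with its template, which is why it suited fixed name lists; it does not scale gracefully to a 40,000-word dictation vocabulary.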

Re:1... million... DOLLARS!!! No (1)

AKAImBatman (238306) | more than 9 years ago | (#10246341)

According to this link [utppublishing.com], the average length of an English word is 6 characters. At one byte per character (two if you use Unicode), we find that a database of 40,000 words would be anywhere from (40,000*6) = 240,000 bytes, or about 234 KB, to 480,000 bytes, or about 469 KB, in size. That's NOT much memory at all.
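The arithmetic in the comment above can be checked directly. Like the comment, this counts only the text of the word list (no separators and no other data a recognizer might need); the 6-character average is the figure the comment cites.

```python
WORDS = 40_000      # vocabulary size quoted in the thread
AVG_WORD_LEN = 6    # average English word length cited in the comment


def word_list_bytes(words: int, avg_len: int, bytes_per_char: int) -> int:
    """Raw storage for the spelled-out word list only."""
    return words * avg_len * bytes_per_char


if __name__ == "__main__":
    for bpc in (1, 2):  # one byte per char (ASCII) or two (UTF-16 style)
        size = word_list_bytes(WORDS, AVG_WORD_LEN, bpc)
        print(f"{bpc} byte(s)/char: {size:,} bytes = {size / 1024:.1f} KiB")
```

With 1024-byte kilobytes this gives about 234 KiB and 469 KiB, in line with the comment's estimate.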

Then when I'm playing UT (0)

Prince Vegeta SSJ4 (718736) | more than 9 years ago | (#10246299)

L33t D00d: I ownz j00

Me: No you don't, eat sniper rifle

*HEADSHOT*

Me: Dammit

*HEADSHOT*

*Double Kill*

Me: Sh!t

[toilet flushes]

*M..M..M..Monster Kill...Kill...killl

Me: F*ck

[bed folds down]

*L33t d00d is unstoppable*

Me: Sh!t

[toilet flushes]

*L33t d00d is godlike*

Me: gawd dammit

[house explodes]

L33t D00d: told ya, I ownz j00

L33t D00d: hey, you still there?

Re:1... million... DOLLARS!!! (1)

iabervon (1971) | more than 9 years ago | (#10246311)

Actually, voice is terrible for controlling anything that doesn't talk back, and pretty bad for anything without a large amount of common sense (i.e., unsolved AI problem). There just isn't enough information in speech to react at all appropriately to it without a very good understanding of context, and you generally can't express unscripted ideas without dialogue.

On the other hand, there's a lot of information currently available as speech which could be managed more usefully if transcribed automatically. I think the best use is a system which transcribes voice notes, which you can then clean up later (or just treat as rough notes anyway).

Your scale is too small. (0)

Anonymous Coward | more than 9 years ago | (#10246335)

If you're looking at an embedded chip to interpret information, think about something large-scale: languages.
If you had the processing power to interpret and understand language, tack that onto something like Babelfish as a translator program. Now you have something that fits on a chip and can translate any number of languages into your own. Now you can stick a little hearing aid in your ear, and it will translate anything you hear into English, for example. This would revolutionize international communication. It would reduce the barriers between diplomats, making them more effective communicators. It would also save governments millions of dollars, euros, or any other currency in translator salaries, reduce miscommunication, prevent misunderstandings with the criminal suspects they are charging, and improve the quality of education for international/foreign exchange students.

Drawbacks: Keeping up with changing language and slang will be quite difficult to include in older models without the capability of a firmware upgrade. Chip size and speed are a factor as well.
This is, of course, assuming that the chip is smaller than the user's head.

Text of article (4, Informative)

Anonymous Coward | more than 9 years ago | (#10245983)


From Carnegie Mellon University:

Carnegie Mellon engineering researchers to create speech recognition in silicon

Team to develop new silicon chip

Carnegie Mellon University's Rob A. Rutenbar is leading a national research team to develop a new, efficient silicon chip that may revolutionize the way humans communicate and have a significant impact on America's homeland security.

Rutenbar, a professor of electrical and computer engineering at Carnegie Mellon, working jointly with researchers at the University of California at Berkeley, received a $1 million grant from the National Science Foundation to move automatic speech recognition from software into hardware.

"I can ask my cell phone to 'Call Mom,'" says Rutenbar, "but I can't dictate a detailed email complaint to my travel agent or navigate a complicated Internet database by voice alone."

The problem is power, or rather the lack of it. It takes a very powerful desktop computer to recognize arbitrary speech. "But we can't put a Pentium(TM) in my cell phone, or in a soldier's helmet, or under a rock in a desert," explains Rutenbar. "The batteries wouldn't last 10 minutes."

Thus, the goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer.

The research team is uniquely poised to deliver on this ambitious project. Carnegie Mellon researchers pioneered much of today's successful speech recognition technology. This includes the influential 'Sphinx' project, the basis for many of today's commercial speech recognizers.

"We're still not even close to having a voice interface that will let you throw away your keyboard and mouse, but this current research could help us see speech as the primary modality on cell phones and PDAs," said Richard Stern, a professor in electrical and computer engineering and the team's senior speech recognition expert. "To really throw away the keyboard, we have to go to silicon." But enhanced conversations between people and consumer products are not the main goal. "Homeland security applications are the big reason we were chosen for this award," says Rutenbar. "Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless."

Researchers plan to unveil speech-recognition chip architecture in two to three years.

smokers (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10245986)

FP bone-smokers!!!

First Post (5, Funny)

JohnHegarty (453016) | more than 9 years ago | (#10245989)

I can just see the anonymous cowards shouting first post at their PCs now

Re:First Post (2, Funny)

Anonymous Coward | more than 9 years ago | (#10246017)

and their PCs talking back to them "I'm sorry Dave, I'm afraid I can't do that"

Re:First Post (0)

Anonymous Coward | more than 9 years ago | (#10246019)

and I can just see johnhegarty screaming, "mommy, i peepeed in my pants. whaaaaaaa".

Carnivore on telephones (5, Insightful)

CrazyJim1 (809850) | more than 9 years ago | (#10246002)

My friend and I were talking about this. In countries that are more totalitarian, it could be used to root out "dangerous people" www.geocities.com/James_Sager_PA

Re:Carnivore on telephones (1)

strictfoo (805322) | more than 9 years ago | (#10246040)

congrats

You and your friend are only about 20+ years behind the times. People have been talking about that since the eighties and probably even before that.

Re:Carnivore on telephones (0)

Anonymous Coward | more than 9 years ago | (#10246309)

That's NORTH KOREA why I frequently AL-QAEDA sprinkle NUCLEAR my regular ANTHRAX conversations with helpful BIN LADEN key words. =)

accuracy (5, Insightful)

tubbtubb (781286) | more than 9 years ago | (#10246009)


100 to 1000 times more efficient worth $1M? meh. maybe.
100 to 1000 times more accurate worth $1M? definitely.

Re:accuracy (2, Insightful)

SillyNickName4me (760022) | more than 9 years ago | (#10246182)

> 100 to 1000 times more efficient worth $1M? meh. maybe.
> 100 to 1000 times more accurate worth $1M? definitely.

Accuracy does not have to be a problem with modern speech to text systems, but the need to 'train' them to get that accuracy, and the need to talk to it in a somewhat distinctive way, make them far less efficient.

I'd rather say that the time it takes to get used to a speech recognition system (and for it to get used to you, where applicable), together with the somewhat heavy CPU requirements, is what currently stops people from using it. To me that means the first thing that is required is efficiency; the accuracy is already there.

(I have been using speech to text for over a decade now, starting out with another hardware solution in the first half of the '90s: IBM's VoiceType Dictation, back then called Personal Dictation System if I'm not mistaken. Even that system already had accuracy almost as good as I manage myself.)

Good use of $1 million? (3, Insightful)

Anonymous Coward | more than 9 years ago | (#10246010)

Damned straight it is! In government terms, that's a pittance. In government-funded science terms, it's downright INFINITESIMAL. It isn't even couch change, it's more like the stale pretzel under the couch cushion.

But, of course, cue the armchair blogging fanatics without a formal science education, waxing poetic about the infinite power and glory of x86 hardware running clever open source software. Maybe we could do it in perl!

Sarcasm? (2, Insightful)

Anonymous Coward | more than 9 years ago | (#10246014)

Good use of $1 million?
For something that would be worth hundreds of times that in the form of a finished product, I would hope so. The only dispute might be that the researchers' efforts would be better spent on other things.

Mixed feelings on this one... (2, Insightful)

Oxy the moron (770724) | more than 9 years ago | (#10246028)

On the one hand, it is obvious how much more efficient this would make our day-to-day tasks. Being able to "jot" notes with speech instead of writing, schedule tasks in seconds, the list goes on and on...

This is certainly beneficial... but think about the impact on the economy! Imagine all the "Administrative Professionals" who could, almost instantly, be out of work. I for one would rather pay even $5,000 for a good piece of software to take all my notes than pay a secretary $28,000/year or so.

Then again, when I posed this situation at my wife's office (she's a paralegal) one of the attorneys responded, "Until they come up with software that can find my lost keys and bring me coffee, the secretary's job is secure."

Natural Language Interpreter (4, Insightful)

MankyD (567984) | more than 9 years ago | (#10246032)

I'm curious to see if their research will improve Natural Language Queries, as opposed to just improving speech recognition. There is an important difference between having to say: SELECT name FROM users WHERE id=12345 and saying: Pull up the name of employee number 12345.
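To make the contrast concrete, here is a toy translator that handles exactly one sentence shape and turns it into a parameterized SQL query. The `users` table, its `name`/`email` columns, and the synonym maps are hypothetical; they stand in for the schema metadata that, as another commenter notes, real natural language interfaces depend on. Anything outside the pattern or the whitelists is rejected rather than guessed at.

```python
import re
import sqlite3

# Hypothetical metadata mapping spoken words to schema names.
FIELD_SYNONYMS = {"name": "name", "email": "email"}
ENTITY_SYNONYMS = {"employee": "users", "user": "users"}

PATTERN = re.compile(
    r"pull up the (?P<field>\w+) of (?P<entity>\w+) number (?P<id>\d+)", re.IGNORECASE
)


def to_sql(sentence):
    """Translate one fixed sentence shape into (query, params), or return None."""
    m = PATTERN.search(sentence)
    if not m:
        return None
    field = FIELD_SYNONYMS.get(m["field"].lower())
    table = ENTITY_SYNONYMS.get(m["entity"].lower())
    if field is None or table is None:
        return None
    # Identifiers come only from the whitelists above; the id is bound as a parameter.
    return f"SELECT {field} FROM {table} WHERE id = ?", (int(m["id"]),)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    db.execute("INSERT INTO users VALUES (12345, 'Alice', 'alice@example.com')")
    query, params = to_sql("Pull up the name of employee number 12345.")
    print(query)                                    # SELECT name FROM users WHERE id = ?
    print(db.execute(query, params).fetchone()[0])  # Alice
```

Even this trivial case needs hand-written knowledge that "employee" means the `users` table; the speech recognizer itself contributes only the words.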

Silicon == buzzword (1, Insightful)

handy_vandal (606174) | more than 9 years ago | (#10246041)

Speech recognition on a chip, yes.

But only "silicon" in the sense that every other silicon chip is silicon.

No magical "silicon" breakthroughs to see here, keep moving.

-kgj

Re:Silicon == buzzword (1)

LiquidCoooled (634315) | more than 9 years ago | (#10246131)

Nothing wrong with speech recognition on silicon.

My missus has been telling me to stop talking to her boobies for years, now finally I will have a valid reason.

Re:Silicon == buzzword (1)

hackwrench (573697) | more than 9 years ago | (#10246218)

No, "silicon" in the sense of a processor designed for speech, the way a graphics processor is designed for vectors, rather than doing it by running an uploaded program from outside the silicon. Also, it will quickly get used for non-speech processing, just as graphics cards are getting used for non-graphics processing.

Re:Silicon == buzzword (1)

SunPin (596554) | more than 9 years ago | (#10246240)

Lighten up, dude. It doesn't matter that "silicon" is a buzzword. The people putting up the money need these annoying buzzwords to understand what they are financing. Considering how much voice dictation sucks (I use it for 99% of my input), it's in dire need of improvement and any buzzword that leads to some scientist getting the money he/she needs to improve it is ok with me.

Obligatory Star Trek quote (1, Funny)

MBAFK (769131) | more than 9 years ago | (#10246051)

Computer. Computer? Hello, Computer. Just use the keyboard. Keyboard. How quaint.

Only 1million? (2, Insightful)

Gyorg_Lavode (520114) | more than 9 years ago | (#10246052)

That's impressive for just $1 million. Working in defense and knowing our contractors, I can tell you $1 million is barely enough to get them to tell you how much it would cost for them to do the initial research to tell you if they can actually build what you want.

(I did not read the article as it is slashdotted so I am relying on the summary's statement of 1 million dollars.)

First Rule of Government Spending... (1, Insightful)

Fortress (763470) | more than 9 years ago | (#10246054)

...is always underestimate your costs and run over budget later. That $1 million will turn into $1 billion before anything comes of this. Hell, it'll take over a million to get the development organization up and running.

A measily $1 million? (2, Interesting)

Aggrazel (13616) | more than 9 years ago | (#10246063)

Imagine how much money could be saved if you could *perfect* speach recognition.

Heck, the hospital I used to work at by itself spent over a million dollars a year on medical transcriptionists ...

Re:A measily $1 million? (3, Funny)

Aggrazel (13616) | more than 9 years ago | (#10246093)

And imagine how much embarassment could be saved alone by correcting idiotic mispellings of simple words like "speech".

Re:A measily $1 million? (1)

wjsteele (255130) | more than 9 years ago | (#10246115)

Just imagine how much money could be saved if you could *perfect* spelling, too!!! :-)

Bill

Re:A measily $1 million? (1)

bytesmythe (58644) | more than 9 years ago | (#10246202)

Heck, the hospital I used to work at by itself spent over a million dollars a year on medical transcriptionists ...

The company [expresiv.com] I used to work at is out to fix that...

Interesting, but do we really need this? (2, Funny)

hackronym0 (812439) | more than 9 years ago | (#10246071)

It is an interesting concept, but do we really need this?

We already have voice recognition, this tech will just bring it to everything. You can talk to your keys, your toaster, your watch. But will they have anything interesting to say back?

What would you do if you had 1 million dollars?

You mean besides 2 chicks at the same time...

Refer your friends, get a free ipod [tinyurl.com]

Re:Interesting, but do we really need this? (1)

mborohovski (640446) | more than 9 years ago | (#10246088)

Well let's see..."toaster, light brown...watch, set time zone to Rome and start chronograph..." Yeah, I could see the uses.

The difficulties of dialect... (5, Insightful)

L0neW0lf (594121) | more than 9 years ago | (#10246072)

I once did a lot of work with speech recognition software, since a former significant other of mine was disabled. I tested a number of programs and found the biggest problem to be the wide variation in users' dialects. The programs all have to be trained initially to recognize a single user's voice. This means that a program trained for a Bostonian may not work for someone from Arkansas, Texas, or Louisiana. Also, the programs' effectiveness decreased over time if you did not use them regularly.

I don't know how possible it will be to make a program that can recognize all English users. Will someone who speaks Oxford English be recognized as well as a surfer from California? I doubt it.

Re:The difficulties of dialect... (1)

drinkypoo (153816) | more than 9 years ago | (#10246172)

I dunno bra, the California surfer accent can get kind of gnarly, there's like, a lot of drawn-out sounds and unnecessary pronunciation in there.

I say this not as a surfer, but as someone born and raised in Santa Cruz.

Now, it is true that Californians are not known for having accents (surfers definitely are known for having an accent: ever seen Fast Times? Spicoli's a pretty accurate representation, actually), and the Californians who don't have a particularly strong accent are the people in the USA whose pronunciation is closest to what's in the dictionary. But surfers are about the worst possible Californians to use as an example, except maybe some types of immigrants :)

hardware accelerated (3, Insightful)

GMail Troll (811342) | more than 9 years ago | (#10246077)

"People who are serious about software should make their own hardware" - Alan Kay

This seems like a situation where a hardware-accelerated approach is pretty sensible. I'm guessing there are large amounts of signal processing involved in speech recognition. With a custom chip like this, it probably helps greatly to offload some of that onto a dedicated chip, in the same way GPUs are used on graphics cards. The only problem I can see is that there might not be much of a market for it. GPUs have an obvious market (games), but there is less demand for speech processing. Star Trek-style interfaces are nice to dream of, but for most common tasks a keyboard and mouse will probably give you a faster and more accurate interface.


Re:hardware accelerated (0)

Anonymous Coward | more than 9 years ago | (#10246300)

Hardware is not always better; it is more complex and difficult to work with. The only area where I see hardware speech processing helping is in offloading certain portions of a complex speech processing algorithm (software). It might help those who want the ability to screen millions of phone calls as close to real time as possible (esp. Homeland Security). I really doubt that the hardware could be more accurate than the more complex software available today. A better use of the money would be to use the powerful GPUs in the next-generation video cards to assist speech recognition. Why reinvent something if something that already exists could be tailored to meet the needs?

That said, I would totally disagree that hardware can't beat software if you could make the hardware unconventional: a pure analog circuit design (no DSPs or conventional digital logic) that has the ability to learn and self-correct!

I'll get excited when... (2, Insightful)

Darkon06 (714661) | more than 9 years ago | (#10246087)

I see some results. So far there have been quite a few attempts at speech recognition. Generally they all fall short: they don't like accents, and they often misinterpret. I know because a while back we looked at something for my grandfather, who can't keep his hand steady enough to write anymore... *shrug*

One million is a pretty small investment (1)

samberdoo (812366) | more than 9 years ago | (#10246090)

The social, commercial and political usefulness of this technology is worth billions. Will this lead to the end of word processing by keyboard? Dr. Evil: "Here's the plan. We get the warhead, and we hold the world ransom for.....One MILLION DOLLARS!!" No.2: "Ahem...Well, don't you think we should maybe ask for *more* than a million dollars? A million dollars isn't exactly a lot of money these days. Virtucon alone makes over nine billion dollars a year!"

Good use of $1 million? (3, Interesting)

Threni (635302) | more than 9 years ago | (#10246106)

Depends. It's not as good as using it to prevent the deaths of thousands - possibly tens of thousands - of people by ensuring they have clean drinking water and shelter from the elements. But hey - you can't put a price on being able to speak to a computer rather than type when you're ordering a pizza.

Re:Good use of $1 million? (1, Insightful)

Anonymous Coward | more than 9 years ago | (#10246266)

so we should never try to advance society until what you feel as basic problems (that WILL NEVER BE SOLVED) are fixed?

bravo
let's go back to living in mud huts too, because energy was spent on making better walls while some people were starving.

not to mention: 10,000 people, what is $10 going to do for them?

wow they can have half a dozen ultra cheap meals.
that really helps a lot

Re:Good use of $1 million? (0)

Anonymous Coward | more than 9 years ago | (#10246336)

> so we should never try to advance society until what you feel as basic problems
> (that WILL NEVER BE SOLVED) are fixed?

In what way is preventing people from starving not advancing society?

> 10,000 people, what is $10 going to do for them?

$10 each? Instead of asking questions which make you look like a fucktard, perhaps you can check out the websites of charities such as Oxfam and UNICEF and see precisely what $10 can do.

History.. (4, Interesting)

SillyNickName4me (760022) | more than 9 years ago | (#10246109)

From 1994 to 1998 I did marketing and technical support for IBM's VoiceType Dictation products.

Initially, doing anything beyond understanding a few words took special hardware, but after a bit of 'training', highly accurate and fast speech-to-text was quite possible with a specially developed DSP.

Then the Pentium-class CPUs came about, and a P90 could do the whole thing without the DSP.

So now someone is developing a new dedicated piece of silicon for this... let's see how long it takes for general-purpose computers to catch up.

The issue is not that this isn't useful, but that it has to either keep developing, or offer a longer-lasting price/performance advantage or much better features for a long time to come.

Re:History.. (2, Interesting)

geordie_loz (624942) | more than 9 years ago | (#10246370)

I considered this too, but the article does address it.

Small low-power units are useful for say a soldier's helmet, or in a PDA.

I'd also say that the same thing happened with 3D cards, and they keep making them faster with more features, but you could play Half-Life with software 3D on a 2.x GHz PC looking pretty much the same as it did on a Voodoo card back in the day.

The question is rather: will there be many future speed advances in hardware, or once it's built, will later software recognition do as well? It's a little like DVD hardware cards: I have an Encore card, but software decoding beats it now, and my DVD decoding doesn't need to be any faster.

I think what they're looking for is to build some cheap (as) chips for embedded systems, like mobile phones and PDAs.

Better approach (3, Interesting)

Lord Kano (13027) | more than 9 years ago | (#10246117)

Using specialised DSPs makes more sense to me than burning up generic CPU cycles. There have been many examples over the years of how a specialized DSP is more efficient and effective for a narrow task than a regular CPU. Look at portable MP3 players. They use tiny specialized DSPs to decode the files in a manner that is much more efficient than using a regular CPU.

We'll still need to do traditional development to interpret the data from the DSPs. We'll need to parse the output so that we can use natural commands to control devices.

"Coffee maker, brew 10 cups, strong."
"Bathroom lights, on."

Without some manner of AI to interpret them, these phrases will be useless.

LK

Re:Better approach (1)

drinkypoo (153816) | more than 9 years ago | (#10246224)

You don't even need an AI, just a dialogue tree, a "this word leads to these words" kind of thing, and at the end a command is issued that does something. This is well-suited to your coffeepot example.
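A minimal sketch of the "this word leads to these words" dialogue tree, in Python, assuming the recognizer already delivers a list of lowercase words. The vocabulary and the command names (`BREW_STRONG` and so on) are hypothetical, chosen only to mirror the coffee-maker example; a real system would map each leaf to a device action.

```python
# Minimal sketch of a "this word leads to these words" dialogue tree.
# The vocabulary and command names below are hypothetical.
from typing import Optional

# Each node maps an expected next word to a child node;
# a string leaf is the command to issue.
TREE: dict = {
    "coffee": {
        "brew": {
            "strong": "BREW_STRONG",
            "weak": "BREW_WEAK",
        },
        "off": "COFFEE_OFF",
    },
}


def walk(tree: dict, words: list[str]) -> Optional[str]:
    """Follow the tree word by word; return the command at a leaf, else None."""
    node = tree
    for word in words:
        if word not in node:
            return None      # unexpected word: no command is issued
        node = node[word]
        if isinstance(node, str):
            return node      # reached a leaf: issue this command
    return None              # ran out of words before reaching a leaf
```

Here `walk(TREE, ["coffee", "brew", "strong"])` returns `"BREW_STRONG"`, while an off-script word such as `["coffee", "dance"]` returns `None` and nothing happens, which is the whole appeal of a fixed tree over general language understanding.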

Another approach, perhaps complementary, would be to accept a list of words and do something when you have enough to match one of the stored patterns. Light control is a good example. The microphone your voice is picked up on would provide one keyword, the location. You could override it by speaking the name of another location. Hence if you're in the bedroom, "lights on" (lights, on) is enough to turn the bedroom lights on, but "bathroom lights on" (lights, on, bathroom) before the timeout period between words (a second or two) would turn on the bathroom lights. This isn't rocket science, or even AI research...
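The second, pattern-based approach can be sketched the same way. Everything here is an assumption for illustration: the room names, the stored patterns, the 1.5-second timeout, and the hypothetical `Word` record carrying a timestamp per recognized word. Words separated by more than the timeout start a new utterance, a spoken room name overrides the room of the microphone that heard it, and word order is ignored.

```python
# Minimal sketch of keyword-set matching with a default location.
# Room names, patterns, and the timeout are hypothetical values.
from dataclasses import dataclass
from typing import Optional

ROOMS = {"bedroom", "bathroom", "kitchen"}
# Each stored pattern is a set of words that must all be present.
PATTERNS = {
    frozenset({"lights", "on"}): "LIGHTS_ON",
    frozenset({"lights", "off"}): "LIGHTS_OFF",
}
TIMEOUT_S = 1.5  # max gap between words before an utterance is closed


@dataclass
class Word:
    text: str
    time_s: float  # when the recognizer emitted the word


def group_utterances(words: list[Word]) -> list[list[str]]:
    """Split a stream of words wherever the gap exceeds TIMEOUT_S."""
    groups: list[list[str]] = []
    last_time: Optional[float] = None
    for w in words:
        if last_time is None or w.time_s - last_time > TIMEOUT_S:
            groups.append([])
        groups[-1].append(w.text)
        last_time = w.time_s
    return groups


def match(words: list[str], mic_room: str) -> Optional[tuple[str, str]]:
    """Return (command, room) if the words cover a stored pattern."""
    spoken = set(words)
    # Exactly one spoken room name overrides the microphone's room.
    named = spoken & ROOMS
    room = named.pop() if len(named) == 1 else mic_room
    for pattern, command in PATTERNS.items():
        if pattern <= spoken:  # all pattern words were heard
            return command, room
    return None
```

With a stream of "bathroom lights on" followed several seconds later by "lights off", all picked up by the bedroom microphone, `group_utterances` yields two utterances; `match` returns `("LIGHTS_ON", "bathroom")` for the first and `("LIGHTS_OFF", "bedroom")` for the second.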

Yay! Boo! Uh... Oh bugger.... (4, Interesting)

MooseByte (751829) | more than 9 years ago | (#10246122)


From the blog: ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''

Like some slight tweaking in order to deploy massive voiceprint-recognition silicon arrays for amazingly efficient automatic realtime conversation transcription and identity determination, attached to Echelon [agitprop.org.au].

So cool... so potentially evil... head begins to hurt... tinfoil hat burning....

Pretty Ambitious, Harder than it sounds (5, Interesting)

Anonymous Coward | more than 9 years ago | (#10246153)

Although $1 million can speed things up significantly, this is a pretty ambitious undertaking.

My Master's research was on implementing machine learning in hardware, specifically support vector machines.

Now, they have much more money than I did, and probably this will be a collaboration involving many graduate students, but converting complex algorithms from software to hardware is no easy task.

It is just easier to do things in software, that's why it has evolved. The modular layers of abstraction allow a Computer Scientist working in machine learning or speech recognition to not have to worry about how the underlying hardware works.

Working in hardware, you come face to face with a lot of these issues, particularly since you want an entire architecture on a chip, whereas a conventional desktop/server system has resources such as lots of RAM and hard drive space available, and their interconnections have been built and refined over decades.

Throw in concerns about small form factor and low power consumption, and quite fast a lot of unexpected hurdles pop up.

My master's research goal was to produce a data mining/machine learning machine, or at the very least a data mining/machine learning co-processor. In retrospect, that was a very ambitious goal that would require many years of work, probably in collaboration with other graduate students.

What I ended up doing was just Support Vector Machines in digital hardware. Now granted, there is another aspect to my research that I'm not mentioning here, namely that I didn't use normal floating-point mathematical architectures, but a different, innovative logarithm-based mathematical architecture. That in itself was a significant undertaking.

In any case, this sounds like a great project; I just wonder how much they can do in their (in an academic sense) very small time frame of 2-3 years, even though a lot of preliminary work has probably already been done just to apply for the grant.
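The poster doesn't describe the logarithmic architecture in detail, so what follows is only a generic illustration of logarithmic number system (LNS) arithmetic, modelled with Python floats rather than fixed-point hardware. Each value is stored as a sign plus the base-2 logarithm of its magnitude, so multiplication and division reduce to adding and subtracting logarithms, while addition needs a correction term that hardware designs commonly approximate with a lookup table. The class and method names are hypothetical.

```python
# Minimal model of logarithmic number system (LNS) arithmetic.
# Values are (sign, log2|x|); zero is not representable in this sketch.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LNS:
    sign: int        # +1 or -1
    log2_mag: float  # base-2 logarithm of the magnitude

    @staticmethod
    def from_float(x: float) -> "LNS":
        if x == 0:
            raise ValueError("zero is not representable in this sketch")
        return LNS(1 if x > 0 else -1, math.log2(abs(x)))

    def to_float(self) -> float:
        return self.sign * 2.0 ** self.log2_mag

    def __mul__(self, other: "LNS") -> "LNS":
        # Multiplication is just an addition of logarithms.
        return LNS(self.sign * other.sign, self.log2_mag + other.log2_mag)

    def __truediv__(self, other: "LNS") -> "LNS":
        # Division is just a subtraction of logarithms.
        return LNS(self.sign * other.sign, self.log2_mag - other.log2_mag)

    def add_same_sign(self, other: "LNS") -> "LNS":
        # log2(a + b) = hi + log2(1 + 2^-d), with d = hi - lo >= 0.
        # Hardware typically approximates the correction with a table.
        if self.sign != other.sign:
            raise ValueError("this sketch only adds values of the same sign")
        hi = max(self.log2_mag, other.log2_mag)
        lo = min(self.log2_mag, other.log2_mag)
        d = hi - lo
        return LNS(self.sign, hi + math.log2(1.0 + 2.0 ** -d))
```

For example, multiplying 3 by 5 this way adds log2 3 and log2 5 and converts back to (approximately) 15, while `add_same_sign` turns 3 and 5 into (approximately) 8. The cheap multiply is the attraction; the harder addition is where much of the design effort in such an architecture would go.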

In any case, it is great to see something like this, something to keep in mind in case I ever go back for a Ph.D.

Speech - text and text - speech (1, Troll)

tod_miller (792541) | more than 9 years ago | (#10246156)

There isn't much overlap, but there is some: signal processing, the breaking down of the nuances of speech.

I figure a hardware speech processor and hardware speech synthesis (very very accurate and believable) would have a great use for mankind.

Imagine how much cheaper sex chat lines would be, for instance!

They would only need a limited vocabulary, so perhaps the OS IBM stuff would work for now?

Of course, I bet a patent will come out of this... voice technology that is very reliable and very easy to use will remove a whole interface. Talk back to your sat nav...

"turn left"

"I can't, it's bloody road works"

"Turn left"

"Damn you!"

"turn left, turn left, you will be assimilated"

"what did you say?"

"erm, nothing, I mean, turn left"

Re:Speech - text and text - speech (1)

maxwell demon (590494) | more than 9 years ago | (#10246368)

Imagine how much cheaper sex chat lines would be, for instance!

I think for that, speech recognition/generation per se would not be enough. The speech must also come with the right tone. I don't think a sex chat line with a monotonous computer voice would be very successful. You'd at least need some simulation of an emotional state in the voice.
Ah, and don't forget the non-verbal noises ...

I do not... (-1)

Anonymous Coward | more than 9 years ago | (#10246176)

I do NOT have LIP FUNGUS!!!

Who will own the IP? (0)

Anonymous Coward | more than 9 years ago | (#10246204)

So.. Who owns the patents, etc, on this if they do it?

$[ANYTHING] in Silicon .. (1)

torpor (458) | more than 9 years ago | (#10246206)

.. is better.

Bring on the silicon, yeah baby, yeah!

{oh, except %ONE thing, that is... right...}

Cellphone voice dialling (0)

Anonymous Coward | more than 9 years ago | (#10246208)


Cellphones have had voice dialling for ages (3+ years). I simply say "call home" or "dial pizza" and my phone dials the number automatically. Presumably the DSP for this is on a chip, so I don't get what's new?

You bet it's worth it (3, Interesting)

Tairnyn (740378) | more than 9 years ago | (#10246215)

Once this technology has matured and some more headway can be made in Natural Language Processing, (uncertainty for teh win) we'll be on the cusp of some really excellent improvements in human-computer interfaces. It's becoming more common to see 'intelligent' systems being built to mirror the architecture of the human nervous system. This will be a necessary step to forming a generally proficient AI system. The day a computer can readily recognize you're being sarcastic, it's time to be paranoid.

brains are and probably should be modular (2, Interesting)

deathcloset (626704) | more than 9 years ago | (#10246232)

This sounds like a great idea. Sometimes a hammer works better than a screwdriver at a certain task. Not all jobs can be performed equally well by a single tool or method.

After all, the human brain has different areas for processing different types of stimuli.

In fact, some parts of our brain are so radically different they are almost considered brains of their own.

Like the cerebellum: it's often referred to as "the small brain". It controls motor coordination, and in humans allows us to do amazing things like flips, kung-fu, and cup-stacking.

And forgive me for forgetting the exact names, but the brain has layers as well, the outermost layer being the cortex (where most of the higher-level mammalian processing takes place; correct me if I'm wrong, but the frontal lobe is pretty much purely cortical tissue). As you delve deeper you get into the hippocampus and medulla whatever (sorry, IANAN: I Am Not A Neurologist), which is where emotion rules and which, if I again remember correctly, is sometimes referred to as the "reptilian" brain.

Even the eyes themselves can almost be considered little 'brains' of their own, considering the amount of pre-processing they do (maybe "co-processor" would be more accurate).

make

Depends, how would it integrate with... (1)

192939495969798999 (58312) | more than 9 years ago | (#10246244)

pr0n? We all know that if there's a pr0n application, then the technology will be developed & shipped 100-1000x faster. Speech recognition + pr0n...
of course, the obvious control of the system by speech (first steps towards a holodeck), but also you could identify who's in that video by their ... voice!

The UN would probably use this heavily (2, Interesting)

ARRRLovin (807926) | more than 9 years ago | (#10246263)

With the advent of hardware speech recognition, hardware speech translation is just the next step. Imagine being able to go to any country in the world with just an iPod-sized device and a Bluetooth hearing aid as a translator.

Spelling degradation (1)

bchernicoff (788760) | more than 9 years ago | (#10246306)

The decline in legibility of handwriting due to the widespread use of keyboards has been discussed on Slashdot before, but taking it a step further, what effect do you think prevalent voice recognition will have on our ability to spell?

On a side note:
"I don't have lip fungus!"
"Let it go."

Very good use unless... (1)

WindBourne (631190) | more than 9 years ago | (#10246351)

you are working a job in customer support. I suspect that this will be used to help replace customer support, or possibly to change somebody's accent so that they appear to be from Boston rather than from India.