Encrypted VoIP Meets Traffic Analysis

CmdrTaco posted more than 3 years ago | from the i-only-speak-in-binary dept.

Communications 98

Der_Yak writes "Researchers from MIT, Google, UNC Chapel Hill, and Johns Hopkins published a recent paper that presents a method for detecting spoken phrases in encrypted VoIP traffic that has been encoded using variable bitrate codecs. They claim an average accuracy of 50% and as high as 90% for specific phrases."

Video (0)

Anonymous Coward | more than 3 years ago | (#35492006)

Once they discover a method to wiretap encrypted video calls, that will open a new era in the porn scene.

TFA != Wiretap (1)

Barryke (772876) | more than 3 years ago | (#35492740)

No, it does not work like that (wiretapping encrypted video calls).
It does not tap the signal, but increases your odds when guessing whether something was communicated in a specific manner.

Re:Video (1)

kmoser (1469707) | more than 3 years ago | (#35512712)

I'd tap that.

Bleh (-1, Redundant)

Desler (1608317) | more than 3 years ago | (#35492038)

They claim an average accuracy of 50% and as high as 90% for specific phrases."

So on average they can't do any better than chance. Wow, such great results!

Re:Bleh (5, Informative)

Anthony Mouse (1927662) | more than 3 years ago | (#35492090)

I'm pretty sure that identifying a specific word with 50% accuracy is better than random chance. There are more than two words in the English language.

Re:Bleh (5, Funny)

Chrisq (894406) | more than 3 years ago | (#35492350)

Once they discover a method to wiretap encrypted video calls, that will open a new era in the porn scene.

...

I'm pretty sure that identifying a specific word with 50% accuracy is better than random chance. There are more than two words in the English language.

Maybe he's talking about the porn film. 90% seem to be "oh" or "yes" (or so I am told).

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35492602)

But both exclamations are pronounced in an infinite number of variations... or so I've heard.

Re:Bleh (4, Funny)

ciderbrew (1860166) | more than 3 years ago | (#35493002)

The pitch is the main thing in the art form.
A low German voice - "ooohhh yaaaaa", over and over. Then you have the high-pitched Japanese squeak sound - "ii, ii, ii, kimochi". Which really gets annoying these days. It took a few years, but it IS annoying.

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35508504)

then you have the high pitched Japanese squeak sound - "ii, ii, ii, kimochi". Which really gets annoying these days. It took a few years; but it IS annoying.

No, it was pretty much annoying from the beginning.

Still annoying. Rather a shame. Full on JAV, is
terrible for many reasons. At least when it's a JAV
starlet in the US, a third of the reasons go away.

-@|

Re:Bleh (1)

Virtual_Raider (52165) | more than 3 years ago | (#35507886)

You mean it doesn't amount to "fuck" and "shit"? The media and the internet have fooled me again!

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35492092)

Yes, because everyone knows there are only two phrases that can ever be spoken.

Re:Bleh (4, Funny)

zill (1690130) | more than 3 years ago | (#35492260)

A'LA'IH [xkcd.com]

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35492834)

best xkcd ever!

Re:Bleh (1)

ByOhTek (1181381) | more than 3 years ago | (#35492110)

People only use two phrases when they talk?

Re:Bleh (1)

Dalzhim (1588707) | more than 3 years ago | (#35492152)

Especially when being wiretapped.

Re:Bleh (1)

Anonymous Coward | more than 3 years ago | (#35492256)

People only use two phrases when they talk?

The phrases that it detects are "Badda-bing" and "Badda-boom."

Re:Bleh (1)

ciderbrew (1860166) | more than 3 years ago | (#35502926)

This should have got at least one +funny.

Re:Bleh (2)

NotQuiteReal (608241) | more than 3 years ago | (#35493236)

The two phrases are "can you hear me?" and "I have a bad connection, let me call you back."

Re:Bleh (2)

gstoddart (321705) | more than 3 years ago | (#35492124)

So on average they can't do any better than chance. Wow, such great results!

I think if half the time you can identify a phrase in a supposedly encrypted stream ... that's better than 'chance'.

Re:Bleh (0)

Lumpy (12016) | more than 3 years ago | (#35492274)

They are looking for specific words and phrases...

Bomb, president, freedom, take back control, uprising, constitutional....

You know, only words that the evil terrorists would use.

Re:Bleh (1)

fnj (64210) | more than 3 years ago | (#35492908)

Oops ... wait a minute ...

Re:Bleh (4, Funny)

batquux (323697) | more than 3 years ago | (#35492146)

Come on, 50% is better than most unencrypted voice recognition!

Re:Bleh (1)

Lumpio- (986581) | more than 3 years ago | (#35492166)

I think there's a big difference in the probabilities of a coin toss and the probability of guessing the correct phrase of who-knows-how-many alternatives.

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35492174)

So on average they can't do any better than chance. Wow, such great results!

It depends on how their accuracy is defined. (Unfortunately, it's paywall-ed, so I can't tell.)

If their accuracy is "percentage of the time they guess the correct word" (which seems the obvious choice) then guessing the right word 50% of the time is significantly better than random guessing. How many words are there in the English language - many tens of thousands at least.

50% recognition accuracy in _unencrypted_ speech wouldn't have been too shabby a decade or so ago.
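
To put a number on the "better than random guessing" point: a rough sketch, assuming accuracy means picking the right item out of N equally likely candidates. That definition is an assumption (the paper may define accuracy differently), and the candidate counts are made up for illustration.

```python
# Chance baseline for a closed-set guessing task.
# Assumption: "accuracy" = fraction of correct picks among N equally
# likely candidates. The candidate counts below are illustrative only.

def chance_accuracy(num_candidates: int) -> float:
    """Probability that a uniform random guess is correct."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    return 1.0 / num_candidates


def lift_over_chance(observed: float, num_candidates: int) -> float:
    """How many times better than random guessing the observed accuracy is."""
    return observed / chance_accuracy(num_candidates)


for n in (2, 100, 10_000):
    print(f"{n:>6} candidates: chance = {chance_accuracy(n):.4%}, "
          f"50% accuracy is {lift_over_chance(0.5, n):.0f}x chance")
```

Only in the two-candidate case does 50% equal chance; with any larger candidate set it is a multiple of the random baseline.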

Re:Bleh (1)

AlienIntelligence (1184493) | more than 3 years ago | (#35508648)

How many words are there in the English language - many tens of thousands at least.

Many tens of thousands???

I hope English is your second language.

There are over 1 MILLION English words in common and uncommon use.
[ http://www.languagemonitor.com/no-of-words/ [languagemonitor.com] ]

Yes.... many, many, many tens of thousands.

-AI

FWIW, in response to TFA... I realize their research is on phrases. Which
very quickly reduces the set. Since many of those words would only exist
in very few spoken phrases.

Re:Bleh (4, Interesting)

bennomatic (691188) | more than 3 years ago | (#35492246)

This reminds me of the guy Colbert interviewed regarding the Large Hadron Collider who thought there was a 50% chance that it would destroy the universe. When questioned as to how he got those odds, he said, "Well, there's two options... either it will happen or it won't happen. 50%."

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35492284)

Lol.

I think I will use this awesome logic in all project risk analysis in future.

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35492534)

I remember that. I couldn't believe it, and the way the guy said it was as if it were soooo obvious.

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35510602)

I remember that. I couldn't believe it, and the way the guy said it was as if it were soooo obvious.

Probably means they were being sarcastic, and knew it was an inaccurate number.

Re:Bleh (2)

lwsimon (724555) | more than 3 years ago | (#35492588)

I remember following this logic... when I was three. No shit, I have a vivid memory of trying to figure out how proportions worked - I knew that a penny tossed would give a 50/50 split, but that other problems with two states - e.g., when I threw a rock, I'd either hit the Matchbox car or I wouldn't - didn't. I gave up, and figured it out later, when I was five or so.

Re:Bleh (1)

Magnus Pym (237274) | more than 3 years ago | (#35493874)

Well, assuming that he has no knowledge about how the thing works and has no other information, his computation of probabilities is technically correct :)

Re:Bleh (0)

Anonymous Coward | more than 3 years ago | (#35504342)

Thank you. Also, probably most of those proclaiming him a moron are equally unqualified to refine that probability any further themselves, thus taking it only on trust when someone in a lab coat tells them the odds are different.

Re:Bleh (-1)

Anonymous Coward | more than 3 years ago | (#35492352)

The sample space isn't binary here. If they were predicting a coin toss, you'd be right to deride their achievement--but they're recognizing individual words, from a set of many thousands of potential words, half the time or better.

That's really quite impressive. And you're an idiot.

Re:Bleh (1)

AlienIntelligence (1184493) | more than 3 years ago | (#35508670)

but they're recognizing individual words, from a set of many thousands of potential words, half the time or better.

That's really quite impressive. And you're an idiot.

From a set of many thousands of words...

and he's the idiot?

-AI

Re:Bleh (1)

Dishevel (1105119) | more than 3 years ago | (#35493012)

I like the way you jump right in and state "things" with disdain and vigor!

Too bad that facts and numbers seem to be such a weak spot in your attack.

That's not good (1)

Anonymous Coward | more than 3 years ago | (#35492104)

Better stick to a constant bitrate then :)

Re:That's not good (1)

WorBlux (1751716) | more than 3 years ago | (#35499426)

Exactly, or just add enough random data to the stream alongside the voice channel to make it look like a constant stream of random data.

So...obvious solution then? (4, Interesting)

Anthony Mouse (1927662) | more than 3 years ago | (#35492122)

Use fixed-bitrate encoding for VoIP.

Re:So...obvious solution then? (1)

ackthpt (218170) | more than 3 years ago | (#35492168)

Use fixed-bitrate encoding for VoIP.

Better still, two cans and a length of string.

Re:So...obvious solution then? (2)

Bengie (1121981) | more than 3 years ago | (#35492542)

until someone gets a warrant to string tap you. You'd think the string connecting the two cans is protected by quantum randomness from the string theory, but it is not.

Re:So...obvious solution then? (0)

Anonymous Coward | more than 3 years ago | (#35493612)

What if I quantum entangle the 2 cans and get rid of the string?????

Re:So...obvious solution then? (3, Interesting)

bsquizzato (413710) | more than 3 years ago | (#35492550)

Not so obvious --- now you have a much less efficient use of bandwidth to deal with.

The article describes the method used to detect phrases ...

At a high level, the success of our technique stems from exploiting the correlation between the most basic building blocks of speech—namely, phonemes—and the length of the packets that a VoIP codec outputs when presented with these phonemes. Intuitively, to search for a word or phrase, we first build a model by decomposing the target phrase into its most likely constituent phonemes, and then further decomposing those phonemes into the most likely packet lengths. Next, given a series of packet lengths that correspond to an encrypted VoIP conversation, we simply examine the output stream for a sub-sequence of packet lengths that match our model.

Essentially, you gather enough information about how a VBR codec could encode a speech phrase you are looking for, then predict where it was spoken by looking at the "data bursts" being sent in the media stream. We'll need to research a way to "scramble" this predictability that's more efficient than using fixed bitrates, which eats up un-needed bandwidth.
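
A toy sketch of the last step in that quote, scanning a stream of packet lengths for a sub-sequence that matches a model. This is a deliberate simplification: the paper talks about "most likely" packet lengths, so its real model is presumably probabilistic, whereas this just does sliding-window matching with a tolerance. The function name and all packet lengths are made up.

```python
from typing import List, Sequence


def find_matches(stream: Sequence[int], model: Sequence[int],
                 tolerance: int = 0) -> List[int]:
    """Return start indices where a window of packet lengths matches
    the model, allowing each length to differ by at most `tolerance` bytes."""
    hits = []
    m = len(model)
    for start in range(len(stream) - m + 1):
        window = stream[start:start + m]
        if all(abs(seen - want) <= tolerance
               for seen, want in zip(window, model)):
            hits.append(start)
    return hits


# Hypothetical packet lengths (bytes) observed on the wire. The payload
# is encrypted, but the lengths are visible to anyone on the path.
stream = [20, 38, 46, 28, 20, 38, 47, 28, 62]
# Hypothetical model for one target phrase.
model = [38, 46, 28]

print(find_matches(stream, model))               # exact match only
print(find_matches(stream, model, tolerance=1))  # allow 1 byte of slack
```

The attacker never touches the ciphertext contents here, only the sequence of sizes.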

Re:So...obvious solution then? (5, Informative)

Anonymous Coward | more than 3 years ago | (#35492684)

OpenSSH had a similar problem, it would leak information about your login password by the timing/size of the packets:

http://www.ece.cmu.edu/~dawnsong/papers/ssh-timing.pdf

I believe their solution was to introduce random NOP packets into the stream. This approach could work here too.

Re:So...obvious solution then? (1)

modemboy (233342) | more than 3 years ago | (#35493122)

I immediately thought of this exploit as well. Seems to me you would need a lot of NOP packets by comparison, since the login info is just a few keystrokes. Plus, login info is not time sensitive on the receiving end, whereas delays in a voice stream might not be acceptable.

Re:So...obvious solution then? (1)

buback (144189) | more than 3 years ago | (#35492896)

So I guess it's like how dentists understand their patients when they have their hands and tools in their mouths.

Re:So...obvious solution then? (0)

Anonymous Coward | more than 3 years ago | (#35493596)

I'm not sure I want to visit your dentist if he keeps putting his tool into people's mouths!

Re:So...obvious solution then? (1)

tixxit (1107127) | more than 3 years ago | (#35493196)

Some encrypted systems actually specify how much data can be "leaked" out per some amount of time. The idea is that, practically, you'll always lose something, so you need to determine a limit that is acceptable. I guess that while voice/sound "data" is very complex, speech is much less so and it doesn't take much data being leaked to get the gist of what was said. Since their method is essentially looking at a sequence of numbers, the more obvious solution may be to add some padding to the packets to foil this attack (perhaps to align on certain boundaries of X-number of bytes); this would reduce the number of bits of information leaked per packet. The hard part would be to figure out how much is needed to degrade the signal:noise ratio to a good security vs. efficiency trade-off.
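
A quick sketch of the padding idea, assuming packets are rounded up to a multiple of some block size. The packet lengths and block sizes are made up; the point is only to show the trade-off between how many distinct lengths an observer can see and how much bandwidth the padding costs.

```python
def pad_to_boundary(length: int, block: int) -> int:
    """Round a packet length up to the next multiple of `block` bytes."""
    return -(-length // block) * block  # ceiling division


# Hypothetical VBR packet lengths in bytes.
lengths = [20, 28, 38, 46, 62]

for block in (1, 16, 64):
    padded = [pad_to_boundary(n, block) for n in lengths]
    distinct = len(set(padded))                 # what an observer can tell apart
    overhead = sum(padded) / sum(lengths) - 1   # extra bandwidth spent
    print(f"block={block:>2}: padded={padded}, "
          f"distinct lengths={distinct}, overhead={overhead:.0%}")
```

With a block of 64 every packet in this sample looks identical, which is just fixed-size packets by another name; smaller blocks leak more but cost less.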

Re:So...obvious solution then? (2)

Jah-Wren Ryel (80510) | more than 3 years ago | (#35493384)

We'll need to research a way to "scramble" this predictability that's more efficient than using fixed bitrates, which eats up un-needed bandwidth.

Any fix is going to "waste" some amount of bandwidth.

One solution to this attack may be to semi-randomly inject "nops" to bridge phoneme breaks. So instead of being able to identify individual phonemes by bandwidth spikes, attackers will be limited to identifying entire word clusters - like filling the "space" between the phonemes in the first three words of a sentence to make it look like one really long phoneme.

But perhaps something more exotic might work, like randomly re-ordering chunks of audio so that they are transmitted somewhat out of order and then re-ordered on the receiving end. That probably won't use up much extra bandwidth but would increase latency.

Re:So...obvious solution then? (1)

Anonymous Coward | more than 3 years ago | (#35493482)

But perhaps something more exotic might work, like randomly re-ordering chunks of audio so that they are transmitted somewhat out of order and then re-ordered on the receiving end. That probably won't use up much extra bandwidth but would increase latency.

Might not even need to re-order the audio, just burst it so that multiple phonemes are all "packed" together for transmission so there are much fewer phoneme breaks visible via traffic analysis. You burn latency that way too, but it would be much simpler to implement than a randomizing algorithm.

Re:So...obvious solution then? (1)

dgatwood (11270) | more than 3 years ago | (#35493880)

Agreed that the problem is the packing, not the data. However, grouping multiple short packets together is still leaking information. The only difference is that instead of looking at the length of packets, you have to look at the timing between packets.

I would suggest that the right solution is to modify your code so that instead of sending out packets of varying length isochronously, you instead send out packets of the same length isochronously, and adjust the average length every... say ten seconds, adjusting immediately only when you realize that your encoder is getting dangerously ahead or behind. Pad the packet with null blocks as needed to maintain the average.

With such a scheme, the only information you are leaking is a weighted average length of the packets in the last few seconds of the conversation. That should be much less useful.
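
A rough simulation of that scheme, under my own assumptions: one packet per codec frame, the on-wire size re-chosen once per window as the ceiling of the previous window's average, and leftover bytes carried as backlog. The "adjust immediately when dangerously behind" rule is left out, and the initial size peeks at the first window, which a live encoder could not do. All numbers are invented.

```python
def shape(frame_sizes, window):
    """Send one fixed-size packet per frame; re-pick the size once per window.

    Returns (on_wire_sizes, final_backlog_bytes). Each packet carries as much
    queued audio as fits and is padded with null bytes up to the current size.
    """
    on_wire = []
    backlog = 0
    # Assumption: start with the largest frame in the first window.
    size = max(frame_sizes[:window])
    for i, n in enumerate(frame_sizes):
        if i and i % window == 0:
            recent = frame_sizes[i - window:i]
            size = -(-sum(recent) // len(recent))  # ceiling of the average
        backlog += n
        sent = min(backlog, size)
        backlog -= sent
        on_wire.append(size)  # the observer only ever sees `size`
    return on_wire, backlog


frames = [20, 38, 46, 28, 20, 38, 47, 28]  # hypothetical VBR frame sizes
sizes, backlog = shape(frames, window=4)
print(sizes)    # only two distinct lengths are visible on the wire
print(backlog)  # bytes still queued: this is where the added latency shows up
```

The observer sees one size per window instead of one size per phoneme, and the leftover backlog is the price paid in delay.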

QoS (1)

sourcerror (1718066) | more than 3 years ago | (#35494294)

Thus you increase latency, which is the single most important thing in a phonecall.

Re:So...obvious solution then? (1)

NateTech (50881) | more than 3 years ago | (#35500594)

Using a VBR and then inserting NOPs sounds like... using a non-variable streaming CODEC.

Re:So...obvious solution then? (0)

Anonymous Coward | more than 3 years ago | (#35493400)

So it's back to Wind Talkers? Making up some new language every few years (months?), or using some dead one, to keep communication secret? Seems like a lot of work.

Re:So...obvious solution then? (1)

Anthony Mouse (1927662) | more than 3 years ago | (#35493658)

We'll need to research a way to "scramble" this predictability that's more efficient than using fixed bitrates, which eats up un-needed bandwidth.

It seems like there might be some promise in improving the compression method itself using the same techniques, so that the things that currently take more bandwidth would take less and therefore become less distinguishable, but if the compression is already near-optimal then this won't work without an efficiency loss because the change would correspondingly make the things that currently take less bandwidth take more, and those things might be more common.

The only general solution is some kind of padding scheme, and the only way to completely defeat the attack is to use a compression method that compresses all inputs to output of the same size, i.e. fixed bitrate, because the degree of deviation from that is the degree to which the attack functions. The existence of efficiency-improving variation is what leaks information, because it tells the attacker a characteristic of the underlying data, namely the number of bits required to encode it. That isn't to say there is no compromise solution (like the OpenSSH method discussed below) where you sacrifice some degree of efficiency in order to make the attack sufficiently infeasible, but there is a direct inverse relationship between real-time encoding efficiency and information leakage.

Re:So...obvious solution then? (1)

Peter Simpson (112887) | more than 3 years ago | (#35493790)

It's very clever. Seems like using a CBR encoder would defeat this method, because every packet would have the same number of samples. Being *too* efficient might save you bandwidth, but it reveals something about your speech patterns.

Re:So...obvious solution then? (1)

psydeshow (154300) | more than 3 years ago | (#35494170)

At a high level, the success of our technique stems from exploiting the correlation between the most basic building blocks of speech—namely, phonemes—and the length of the packets that a VoIP codec outputs when presented with these phonemes. Intuitively, to search for a word or phrase, we first build a model by decomposing the target phrase into its most likely constituent phonemes, and then further decomposing those phonemes into the most likely packet lengths. Next, given a series of packet lengths that correspond to an encrypted VoIP conversation, we simply examine the output stream for a sub-sequence of packet lengths that match our model.

Awesome.

It's like listening to the "Mwa mwaa mwaa mwa mwa" voice that adults use in the old Peanuts television specials, and figuring out what they are saying based on the length of the "mwas" and their order in the conversation.

Re:So...obvious solution then? (1)

PReDiToR (687141) | more than 3 years ago | (#35498964)

You mean like trying to decipher Kenny from South Park's words?

I wonder what my kids would compare it to ...

Re:So...obvious solution then? (2)

Kjella (173770) | more than 3 years ago | (#35494320)

Not so obvious --- now you have a much less efficient use of bandwidth to deal with.

Enough to matter? According to my cell phone bill, I had over 100MB of data traffic last month. That's about 10 hours of 24 kbps CBR encoded voice, which is the highest possible CBR setting speex has. If it's on my DSL/cable/whatever line, who cares? Even if I did that 24x7 for a month it'd be 7-8 GB and I'm pretty sure even a teenage girl with mouth diarrhea has to sleep sometimes. If that's what it takes, I don't see CBR as being a dealbreaker.
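
The arithmetic behind those figures, as a small check. It assumes 1 MB = 10^6 bytes, a 30-day month, and counts only the codec bitrate (no IP/UDP/RTP headers), so real traffic would be somewhat higher.

```python
BITRATE_BPS = 24_000  # 24 kbps CBR, the figure quoted above


def hours_for_bytes(num_bytes: float, bitrate_bps: int = BITRATE_BPS) -> float:
    """Hours of audio that fit in num_bytes at a constant bitrate."""
    return num_bytes * 8 / bitrate_bps / 3600


def bytes_for_seconds(seconds: float, bitrate_bps: int = BITRATE_BPS) -> float:
    """Bytes needed for `seconds` of audio at a constant bitrate."""
    return seconds * bitrate_bps / 8


hours = hours_for_bytes(100e6)                      # 100 MB of data
month_gb = bytes_for_seconds(30 * 24 * 3600) / 1e9  # talking 24x7 for 30 days

print(f"100 MB at 24 kbps is about {hours:.2f} hours")
print(f"30 days nonstop at 24 kbps is about {month_gb:.2f} GB")
```

So "about 10 hours" and "7-8 GB" both hold up as rough figures.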

Re:So...obvious solution then? (1)

bsquizzato (413710) | more than 3 years ago | (#35494956)

Now take hundreds of thousands of calls like yours running through your service provider's network, being transferred to other providers' networks, etc. Or, hundreds/thousands of calls running w/in a large enterprise such as from branch offices to HQ. Bandwidth costs money. In situations like these, you try to conserve bandwidth any way you can.

Re:So...obvious solution then? (1)

Eivind (15695) | more than 3 years ago | (#35501064)

Not enough to matter.

VBR *does* save bandwidth for equivalent quality, but not a lot of it.

Your 100MB gives you 10 hours of 24kbps of CBR encoded voice, and at a guess, VBR would maybe give you 13-15 hours of voice in the same bandwidth.

Certainly trivial, and certainly the answer to this problem is that encrypted voice should be encoded CBR to make traffic-analysis impossible.

Re:So...obvious solution then? (4, Interesting)

Cthefuture (665326) | more than 3 years ago | (#35492852)

Actually most people are using G.711 these days which is in fact a fixed bitrate (it's the same protocol used on your normal "hard" voice line).

But most VoIP providers do not offer SRTP or any encryption whatsoever so this whole thing is not even a question. More than likely anyone can listen in on your VoIP calls. We need to put more pressure on VoIP providers to offer encryption.

Re:So...obvious solution then? (0)

Anonymous Coward | more than 3 years ago | (#35495962)

Yea, but then how would officials listen in on your conversations when they need to?

Re:So...obvious solution then? (1)

TuringCheck (1989202) | more than 3 years ago | (#35498050)

Working in telephony and VoIP for the last 8 years, I don't remember seeing a VBR codec in actual use - ever. At most silence detection is used, but that has unpleasant side effects too. I also find it useless to save 2-3 bytes when the UDP+RTP overhead is 40 (plus at least 4 if SRTP is used).
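
To make the overhead point concrete: the 40-byte figure matches IPv4 (20) + UDP (8) + RTP (12) headers. The sketch below assumes a hypothetical 20-byte low-bitrate payload next to a 160-byte G.711 payload (20 ms at 64 kbps), and shows how little 3 saved payload bytes matter per packet.

```python
# Fixed per-packet header cost: IPv4 (20) + UDP (8) + RTP (12) bytes.
HEADER_BYTES = 20 + 8 + 12


def wire_bytes(payload: int, srtp_tag: int = 0) -> int:
    """Total bytes per packet for a given codec payload and optional SRTP tag."""
    return HEADER_BYTES + payload + srtp_tag


def saving_fraction(payload: int, saved: int, srtp_tag: int = 0) -> float:
    """Fraction of the on-wire packet saved by shaving `saved` payload bytes."""
    return saved / wire_bytes(payload, srtp_tag)


print(wire_bytes(160))         # G.711, 20 ms frame
print(wire_bytes(20))          # hypothetical 20-byte low-bitrate frame
print(saving_fraction(20, 3))  # shaving 3 bytes off that small frame
```

Even on the small frame, 3 bytes is about 5% of the packet; on a G.711 packet it is closer to 1.5%.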

Stalin's Dream II (2)

ackthpt (218170) | more than 3 years ago | (#35492136)

Teh Recognisining.

"I'd like to order pizza, with pepperoni, pineapple, mushroom and an Iludium Pu-36 space modulator delivered to Hall of Justice."

Re:Stalin's Dream II (0)

Anonymous Coward | more than 3 years ago | (#35492266)

Teh Recognisining.

"I'd like to order pizza, with pepperoni, pineapple, mushroom and an Iludium Pu-36 space modulator delivered to Hall of Justice."

Pu36? Negative mass neutrons?

Re:Stalin's Dream II (2)

bmo (77928) | more than 3 years ago | (#35492354)

http://www.youtube.com/watch?v=7A4HeawmE6A [youtube.com]

Not knowing what an Illudium Pu-36 Explosive Space Modulator means you had a deprived childhood.

--
BMO

Re:Stalin's Dream II (1)

AlienIntelligence (1184493) | more than 3 years ago | (#35508750)

http://www.youtube.com/watch?v=7A4HeawmE6A [youtube.com]

Not knowing what an Illudium Pu-36 Explosive Space Modulator means you had a deprived childhood.

--
BMO

Hear, hear!

Marvin is the man! I mean, he's the silly thought and pseudo I use
for this nickname.

-AI

Re:Stalin's Dream II (0)

Anonymous Coward | more than 3 years ago | (#35492492)

Teh Recognisining.

"I'd like to order pizza, with pepperoni, pineapple, mushroom and an Iludium Pu-36 space modulator delivered to Hall of Justice."

The Earth looks lovely tonight.

Re:Stalin's Dream II (0)

Anonymous Coward | more than 3 years ago | (#35493004)

or "FSCK!"

Duh! (2, Insightful)

Anonymous Coward | more than 3 years ago | (#35492298)

When you want to secure something, you must think carefully about how you might be leaking information. You can't just slap some encryption on and call it a day.

fuck you moderators (1)

cstanley8899 (1998614) | more than 3 years ago | (#35492326)

fuck you. die. all moderators.

Re:fuck you moderators (0)

Anonymous Coward | more than 3 years ago | (#35492616)

You should take a look back through the results of your time here [slashdot.org].

Perhaps it's not the moderators; given your past record it seems more likely that you simply have nothing to say that anyone else finds interesting.

This is not news (0)

Anonymous Coward | more than 3 years ago | (#35492418)

http://portal.acm.org/citation.cfm?id=1397759.1398055&coll=DL&dl=GUIDE&CFID=13816718&CFTOKEN=31717594

Written by the same authors, 2 years ago. Article at the same publication.

Pahaha. (0)

Anonymous Coward | more than 3 years ago | (#35492544)

I keep reading the white papers, and upgrading my crypto.

First it was telnet.
Then SSH.
Then iptables, soon port-knocking...
Encrypted VPN tunnels...
The list is endless.

Just stacking more layers of security on top of each other. *sigh* Gotta keep my conversations safe SOMEHOW.

SOCKS Proxy over SSH (0)

Anonymous Coward | more than 3 years ago | (#35492688)

Skype lets you set a SOCKS proxy. If that links to a dynamic ssh proxy, then at least your local traffic is encrypted. That protects from casual eavesdropping on the LAN segment (prevent firesheep and similar attacks). That gets the packets "further along" to their destination, but as the warrantless-wiretapping [wikipedia.org] cases show, you don't have to be a conspiracy nut to question whether Skype freely routes all of their traffic to the NSA. Okay, maybe you have to be a bit paranoid to think that.

No shit? (1)

Anonymous Coward | more than 3 years ago | (#35492710)

You mean when you vary a quality of your signal (in this case bitrate) based on content, people can read information about the content from those variations??? OMFG!

then it's shitty encryption (2)

cellocgw (617879) | more than 3 years ago | (#35492890)

The definition (somewhere in the 'net archives) of encryption quality is how distinguishable the encrypted message is from random noise. Clearly setting bitrates, or any other parameter, based on the input, is not random.

Pick a better algorithm and/or suck it up and waste a little bandwidth.

Re:then it's shitty encryption (1)

dachshund (300733) | more than 3 years ago | (#35495034)

The definition (somewhere in the 'net archives) of encryption quality is how distinguishable the encrypted message is from random noise. Clearly setting bitrates, or any other parameter, based on the input, is not random.

(A common) definition of symmetric encryption is that a message should be indistinguishable from an equal-length string of random bits. In that sense, there's nothing wrong with this encryption scheme.

What is wrong here is that encryption does not hide message length, and in many cases message lengths can leak information about the message content. The research is nice because they show a very practical way to get useful information from message length.

The research is, however, three years old --- it was originally published in ACM CCS 2008. This is just a journal submission. I love to see crypto in the news, but this really shouldn't be.
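
To illustrate the length point: a toy stream cipher built from SHA-256 in a counter-style construction. This is NOT a real or secure cipher and is not what any VoIP stack uses (it even reuses the keystream across messages); it only demonstrates that the ciphertext is exactly as long as the plaintext, so frame sizes survive encryption. The key and "frames" are made up.

```python
import hashlib


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks, truncated to n bytes."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def toy_encrypt(key: bytes, message: bytes) -> bytes:
    """XOR the message with the keystream. Applying it twice decrypts."""
    return bytes(m ^ k for m, k in zip(message, keystream(key, len(message))))


key = b"demo key, illustration only"
short_frame = b"\x00" * 20  # hypothetical quiet frame
long_frame = b"\x01" * 46   # hypothetical loud frame

for frame in (short_frame, long_frame):
    ciphertext = toy_encrypt(key, frame)
    print(len(frame), "->", len(ciphertext))  # lengths are preserved
```

The bytes look random, but an observer still sees 20 versus 46, which is all the attack needs.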

Google Voice (1)

Arykor (966623) | more than 3 years ago | (#35492938)

Google is involved in this? Perhaps encryption could help them improve the accuracy of transcription in Google Voice... [twitter.com]

Re:Google Voice (0)

Anonymous Coward | more than 3 years ago | (#35493640)

no kidding! add to that how accurate those fucking voice controlled robo-answering devices are (I long for touch tone!) and I call bullshit. especially on the 90% figure.

What phrases? (1)

stillnotelf (1476907) | more than 3 years ago | (#35492984)

I'm hoping it's best at picking up obvious spy phrases, like "the eagle has landed", "the moon fish squicks wickedly at midnight", "long is the gap between cacti"... Somehow I think it's probably best at "hello".

Re:What phrases? (1)

DriedClexler (814907) | more than 3 years ago | (#35495300)

Somehow I think it's probably best at "hello".

I'm one step ahead of these known-plaintext attacks -- no longer do I use the same, small set of voice greetings. No no -- I prepend a nonce.

"Hello?"
"Shgr'gl'hm-v'va Hi Mom, it's Clyde ... and you're not supposed to answer the phone like that!!!"

Re:What phrases? (1)

NateTech (50881) | more than 3 years ago | (#35500600)

Who answers with "Hello" still? Waste of time. Look at Caller ID, "Hi XXX."

Or... "This is XXX." That one always throws the telemarketers... "Is X there?" "Didn't I just say that?"

Or my favorite, old military and any kind of "Operations" job folks... we just answer with our last name. One word, contact established, identity verified... go with your traffic.

"Goodbye" is silly too. Just hang up.

Read speech vs Conversational Speech (0)

Anonymous Coward | more than 3 years ago | (#35493296)

It would be interesting to see how well their algorithm performs on conversational speech, something like Switchboard or CallHome. There is a lot more pronunciation variation in conversational speech than in read speech.

Variable bit rate? (1)

s_p_oneil (795792) | more than 3 years ago | (#35493312)

Did you note that they specified variable bit rate? In this case, I'll bet it had more to do with the timing and flow of the packets and bytes than with the actual content of the bytes. When there's a pause in a person's speech, there is a pause in the network traffic. Imagine someone trying to send Morse code through an encrypted voice channel. Someone watching a bandwidth graph that had a high enough frequency would know exactly what coded message you sent regardless of the compression or encryption algorithm used (as long as the compression is variable bit rate). Due to the way voice data is compressed, increases or decreases in traffic could imply certain changes in tone, pitch, volume, inflection, etc. Tracked at a very high frequency, changes in the flow of bytes could give plenty of clues as to what is being said whether the traffic is encrypted or not. In general, encryption algorithms don't change the number or flow of bytes, just the content of the bytes.

RTP blinding (2)

WaffleMonster (969671) | more than 3 years ago | (#35493476)

A few solutions...

Add some number of pad bytes to each packet to fill in blanks.

Tweak existing high complexity codecs (ilbc, speex..etc) to maintain a persistent bitrate by dynamically scaling quality to even out the per packet bits.

Use a fixed bitrate codec (most of these really suck from bw efficiency vs quality perspective)

Switch variability to the time domain adding jitter to mask the signal and control latency/security tradeoff.

SRTP scares me because it was invented for a single narrow purpose. Would much prefer the use of DTLS to secure RTP streams which being very similar to TLS has received much more scrutiny than SRTP likely ever will.

An advert ? (0)

McTickles (1812316) | more than 3 years ago | (#35493584)

Is this another advert for yet another pay-to-read-and-see-its-bullshit "science" paper ?

This is why science is going all funny as of late and people don't trust "research" much. "Researchers" write papers to make money apparently, so they will write any old crap and call it an amazing discovery !

Researchers, show us something interesting for once, and don't make us pay for it. Information is free you know ?

useless, and easy countermeasures (2)

t2t10 (1909766) | more than 3 years ago | (#35493600)

First of all, statements like "50% accuracy" are nearly useless; you need to know both precision and recall. And to the degree that "50% accuracy" tells you anything, it tells you that the system is pretty bad.

Finally, the countermeasure for this is the same as the countermeasure for other automated speech analysis techniques: play some singing or theater in the background.

Re:useless, and easy countermeasures (0)

Anonymous Coward | more than 3 years ago | (#35494372)

Recall and precision are given in the paper. As is an evaluation with various levels of noise.

Re:useless, and easy countermeasures (1)

uid7306m (830787) | more than 3 years ago | (#35495130)

Exactly. The phrases used are fairly long, for instance: "Laugh, dance, and sing if fortune smiles upon you." In the TIMIT corpus, there are 122 of them. In the English language, there are hmm, lots of sentences of that length. There are about 1000 different syllables in English, and I count 11 syllables in that sentence. Thus, there are some fraction of 10^33 sentences of that length.

So, if you tried this on English, one of two things would happen. If you used that recognizer without any modification, then it would sit there silently until you said one of the sentences in TIMIT, like "She had your dark suit in greasy wash water all year." And, it would be a *long* wait.

Or, you could change the recognizer so it could recognize more than 122 possible sentences. In that case, the error rate would go way up.
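
The back-of-the-envelope count above, spelled out. The 1000 syllables and 11-syllable length are the rough figures from this comment, and treating every syllable sequence as a possible sentence gives only an upper bound (hence "some fraction of").

```python
SYLLABLE_INVENTORY = 1000  # rough count of distinct English syllables (from the comment)
SENTENCE_SYLLABLES = 11    # syllables in the example sentence
TIMIT_SENTENCES = 122      # sentences the recognizer knows, per the comment

# Upper bound: every possible sequence of 11 syllables.
upper_bound = SYLLABLE_INVENTORY ** SENTENCE_SYLLABLES
coverage = TIMIT_SENTENCES / upper_bound

print(f"upper bound on 11-syllable sequences: {upper_bound:.0e}")
print(f"fraction covered by 122 sentences:   {coverage:.2e}")
```

Even if only a minuscule share of those sequences are real sentences, 122 of them is a vanishing slice of the space.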

Nexidia (1)

randyjparker (543614) | more than 3 years ago | (#35494460)

Nexidia has been selling proprietary tech to do this for years

Average accuracy of 50%? (1)

fishbowl (7759) | more than 3 years ago | (#35494684)

On any digital signal, comparing a random source of bits should get you 50% accuracy.

Better than guessing? (1)

KnownIssues (1612961) | more than 3 years ago | (#35494850)

I'm sure there's a mathematical/statistical reason why 50% accuracy is better than guessing in this case, but that would be very counter-intuitive. Same with as high as 90% under certain conditions. I could get to 90% accuracy if I could select out everything that reduced my accuracy as well. I don't doubt the full article explains better though. I'm not suggesting MIT, Google, etc scientists are stupid.

Re:Better than guessing? (0)

Anonymous Coward | more than 3 years ago | (#35497210)

Your point only applies to tests with a two-sided outcome, like "does this packet stream match this one sample?" "match/no match"

Searching against a corpus is something else, the set of possible answers is a bit larger than 2, and there's still one correct answer. Random choice from that set is not going to be 50% accurate.

The opposite thing applies to DNA matching. You can do a single comparison and have a small chance of a false positive, but if you search against a large corpus (DNA database) then you are very likely to get at least one false positive by chance.
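
The corpus effect in numbers, as a sketch. It assumes every comparison has the same false-positive rate and that comparisons are independent, which real databases will not satisfy exactly; the rate and database size are invented.

```python
def p_any_false_positive(p_single: float, num_comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - p_single) ** num_comparisons


p = 1e-6  # hypothetical false-positive rate for one comparison

print(p_any_false_positive(p, 1))          # single comparison
print(p_any_false_positive(p, 5_000_000))  # search against 5 million entries
```

A one-in-a-million error rate is negligible for a single comparison but makes at least one spurious hit nearly certain across millions of entries.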

I only need to filter OUT certain conversations. (0)

Anonymous Coward | more than 3 years ago | (#35495638)

Like any that include the word, 'WINNING!'.

An exercise of pattern detection (1)

c0lo (1497653) | more than 3 years ago | (#35498224)

Seems that I started to detect a pattern between the current TFA and this [slashdot.org] one.
Now, DHS, I know I'm not at MIT, but other [wikipedia.org] cases showed I don't need to... So, just where is my grant for advanced research of the subject?
