Two Spam Filters 10 Times As Accurate As Humans

timothy posted more than 10 years ago | from the dev/null-is-getting-fatter dept.

Spam 487

Nuclear Elephant writes "The authors of two spam filters, CRM114 and DSPAM, announced recently that their filters have achieved accuracy rates ten times better than a human is capable of. Based on a study by Bill Yerazunis of CRM114, the average human is only 99.84% accurate. Both filters are reporting to have reached accuracy levels between 99.983% and 99.984% (1 misclassification in 6250 messages) using completely different approaches (CRM114 touts Markovian, while DSPAM implements a Dolby-type noise reduction algorithm called Dobly). If you're looking for a way to rid your inbox of spam, roll on over to one of these authors' websites."

487 comments

ob (-1, Offtopic)

Billobob (532161) | more than 10 years ago | (#8368801)

In soviet russia, the spam filters you!

Outclassed... (5, Funny)

Klatoo55 (726789) | more than 10 years ago | (#8368802)

I'm sorry, Dave... That Nigerian guy looks suspicious and I can't let you send him money.

Huh? Aren't humans 100%? (5, Insightful)

MrP- (45616) | more than 10 years ago | (#8368805)

How can a spam filter be more accurate than humans? Humans are always the last step in spam filtering. I use POPFile and it catches 99%, but it still needs me, because I'm the only one capable of identifying spam 100% of the time.

Re:Huh? Aren't humans 100%? (5, Informative)

MarkJensen (708621) | more than 10 years ago | (#8368834)

I haven't been 100% accurate.

I received an email from my sister-in-law from her work, and the address looked suspicious (one of those weird-looking "letter and number" jumbles).

I deleted it. It happens.

*slams head against wall* (5, Funny)

Faust7 (314817) | more than 10 years ago | (#8368932)

I received an email from my sister-in-law from her work

Yeah, so did I. The subject line was "I want you so bad."

I deleted it. Turned out the message was genuine. I'll never forgive myself...

Re:Huh? Aren't humans 100%? (-1, Redundant)

Anonymous Coward | more than 10 years ago | (#8369008)

Most humans are more accurate* at filtering spam than software filters.

* There are some rare exceptions, for example, MarkJensen professes to be less accurate.

Re:Huh? Aren't humans 100%? (2, Informative)

msgmonkey (599753) | more than 10 years ago | (#8368836)

Humans sometimes make mistakes, that's where the inaccuracy comes from.

Re:Huh? Aren't humans 100%? (2, Interesting)

hatrisc (555862) | more than 10 years ago | (#8368841)

but can you identify spam before opening it 100% of the time? Now, I realize that the mail program is looking at the actual data as well, which gives it an advantage, but on the other hand, how else can IT detect spam?

Re:Huh? Aren't humans 100%? (0)

Anonymous Coward | more than 10 years ago | (#8368950)

Sure I can. No one knows My Email. I regard it all as spam. therefore, I only have email so that I can complain about spam.

Re:Huh? Aren't humans 100%? (1, Insightful)

Phillup (317168) | more than 10 years ago | (#8368848)

I agree 100%.

If I say it is spam, I'm not reading it... and I am deleting it.

Any software that tries to stop me is removed via
rm -Rf
because it is faulty.

Re:Huh? Aren't humans 100%? (4, Insightful)

Behrooz (302401) | more than 10 years ago | (#8368850)

I suppose it depends how you're defining spam. Perhaps the ultimate spam messages that don't get past them are capable of passing a turing test... hence fooling those gullible human recipients into thinking that it isn't even spam!

Fortunately, soon we will all be able to use the superhuman spam-detection capabilities of these filters to save us from ourselves. Imagine all of those pesky e-mails from your 'friends' getting caught by your spam filter before they even impinge upon your consciousness.

It'd be a wonderful world.

Re:Huh? Aren't humans 100%? (5, Insightful)

gid13 (620803) | more than 10 years ago | (#8368878)

If you read the post, it quotes a study and says humans are only accurate 99.84% of the time.

Kinda makes you wonder how they can know the filters are right though. :)

(please don't reply telling me how)

Re:Huh? Aren't humans 100%? (5, Insightful)

mattkime (8466) | more than 10 years ago | (#8368927)

Obviously you've never seen someone new to the internet sit in front of their computer. Lots of people don't know what popups are. Lots of people read some spam not knowing what it is. To these people, a computer is merely an interesting string of sensations.

Re:Huh? Aren't humans 100%? (-1, Flamebait)

Anonymous Coward | more than 10 years ago | (#8368933)

Maybe their results were based on spam classification that was outsourced to India.

Re:Huh? Aren't humans 100%? (4, Insightful)

Celandro (595953) | more than 10 years ago | (#8368943)

Perhaps they mean that Human A is reading email intended for Human B and attempting to classify the email as spam or not spam. I wouldn't be surprised if a computer could do a better job at that sort of task. Besides, I'm sure Human B wouldn't want Human A reading that cyber sex chat log.

Re:Huh? Aren't humans 100%? (5, Insightful)

evilmrhenry (542138) | more than 10 years ago | (#8368989)

Quite simple:
With 10 messages (after automatic spam detection) humans are 100% accurate.

With 1,000 messages, (before automatic spam detection)
humans are less than 100% accurate.

The experiment was done on 5849 messages.

Remember; one thing computers are good at is doing boring things repeatedly.

Re:Huh? Aren't humans 100%? (5, Interesting)

Elwood P Dowd (16933) | more than 10 years ago | (#8369003)

No, humans are not 100%.

If you see a strange name in your inbox with an odd title, that might be a Nigerian businessman, or it might be your long lost Nigerian brother.

I recently tried to order a t-shirt from this guy for a band he used to be in. I found his band because we have the same (semi-uncommon) name. So, he got an email From: himself. I had to send him two emails because he deleted the first one assuming it was spam.

I ordered some RAM for my dad a while back. He gets 200 spam emails a day (email addy in resume & web page), and he deleted the confirmation email from the RAM vendor. The RAM never shipped, and it took us a week to figure out that there was a problem.

People make mistakes all the time. Why is this an unexpected result? People are jackasses. This should be obvious.

Re:Huh? Aren't humans 100%? (2, Interesting)

Dulimano (686806) | more than 10 years ago | (#8369018)

No, imaginary humans with infinite time and dedication are 100%. But real humans are not. The percent goes down with time and dedication continuously, so I really don't understand what this 99.84% means.

IM Spam (5, Interesting)

jeffskyrunner (701044) | more than 10 years ago | (#8368806)

Once Email Spam is eliminated, then IM spam will begin...

Re:IM Spam (0)

Anonymous Coward | more than 10 years ago | (#8368830)

I already get IM spam, over ICQ. lots of cute girls asking me to check out their webcams...

Re:IM Spam (0)

Anonymous Coward | more than 10 years ago | (#8368832)

mandatory sterilization for the people buying from spam, and the actual spammers.

we dont want those idiots running around with their subpar genes

Re:IM Spam (2, Informative)

Vancouverite (227795) | more than 10 years ago | (#8368983)

Far too late for that. ICQ has had IM Spam for some time, as has Yahoo, MSChat, and AOL.

What *will* happen is that trawling robots will now also trawl for IM addresses, rather than just email addresses. As it is, only deliberate IM spammers (who are usually in an IM chat group with an intellectually stimulating name such as "Yung Hunnies 4 Married Men") are harvesting the IM addresses that show up in these chat groups. In the future, don't have your ICQ # or Jabber ID on your website, or you are setting yourself up for more spam.

Hmmm... a use for reverse 3133t spelling? "Contact me at ICQ #lEloAAT" (1310447)

Re:IM Spam (0)

Anonymous Coward | more than 10 years ago | (#8369033)

I don't participate in chat groups, but I still get spam through ICQ.

I'd be tempted to say that it's a case of someone sending out blanket ICQ messages to addresses, starting with account #n and ending with account #n+x

Re:IM Spam (1)

rokzy (687636) | more than 10 years ago | (#8369000)

I use (a)MSN and it has the option that no-one can contact me unless I've already got them on my list.

worked perfectly so far.

fp (-1, Offtopic)

Angry Black Man (533969) | more than 10 years ago | (#8368807)

but can it filter this post FIREST pots assholes! oh yeah baby!!

Spamassassin (1)

Czmyt (689032) | more than 10 years ago | (#8368809)

It's hard to believe that a single approach like this is better than SpamAssassin. I wonder hot is compares?

Re:Spamassassin (2, Interesting)

pclminion (145572) | more than 10 years ago | (#8368931)

It's hard to believe that a single approach like this is better than SpamAssassin.

SpamAssassin is a single approach. It looks at a bunch of features, then combines them linearly and compares the result against a threshold function. It's a relatively simplistic method, compared to these two. Not hard to see how more sophisticated methods could do better.
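
To make the "combine features linearly, compare against a threshold" description concrete, here is a minimal sketch. The feature names, weights, and cutoff are hypothetical values chosen for illustration; they are not SpamAssassin's actual rules or scores.

```python
# Hypothetical feature weights for illustration only;
# these are NOT SpamAssassin's real rule names or scores.
WEIGHTS = {
    "subject_all_caps": 1.5,
    "contains_viagra": 3.0,
    "sender_on_blacklist": 2.5,
    "html_only_body": 0.8,
}
THRESHOLD = 5.0  # assumed cutoff for this sketch


def linear_score(features):
    """Sum the weights of every known feature that fired for a message."""
    return sum(WEIGHTS[name] for name in features if name in WEIGHTS)


def is_spam(features, threshold=THRESHOLD):
    """Classify as spam when the linear score reaches the threshold."""
    return linear_score(features) >= threshold


if __name__ == "__main__":
    print(is_spam(["contains_viagra", "sender_on_blacklist"]))
    print(is_spam(["html_only_body"]))
```

Each feature contributes independently of the others, which is why a method that looks at word sequences or token combinations can plausibly do better.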

Re:Spamassassin (0)

Anonymous Coward | more than 10 years ago | (#8369009)

> SpamAssassin is a single approach. It looks at a bunch of features

Didn't you just contradict yourself? SA uses RBLs, pattern matching, and spam-reporting clearinghouses to identify spam. How is that a single approach?

Re:Spamassassin (1, Informative)

Anonymous Coward | more than 10 years ago | (#8369025)

It's not a single approach: Mr. Yerazunis's setup for CRM114 sits behind several DNS blacklists, which pre-filter a huge amount of it. (I know his sys-admin.)

But it is far superior to SpamAssassin because it now examines groups of words. The short phrases and words identified by SpamAssassin are avoided by spammers, who are now adding huge amounts of un-displayed random text and terrible HTML tricks to avoid SpamAssassin and similar filters and to avoid the various hash functions that detect familiar phrases.

Re:Spamassassin (1)

gregfortune (313889) | more than 10 years ago | (#8369038)

I wonder hot is compares?

Now that's sneaky!! ;o)

Depends... (-1, Offtopic)

mstefanus (705346) | more than 10 years ago | (#8368810)

"Stoned" humans have much less accuracy of course

Re:Depends... (1, Funny)

DoctorCool (700514) | more than 10 years ago | (#8368890)

None of my mail is spam! I take the penis enlargment and brest enhancement very seriously.

Re:Depends... (0)

smharr4 (709389) | more than 10 years ago | (#8368978)

Combine the two together to get enlarged breast-shaped penises, or penis-shaped breasts.

wait, WTF? (5, Insightful)

PedanticSpellingTrol (746300) | more than 10 years ago | (#8368812)

I presume they mean more accurate than a human that was only looking at the subject line? I fail to see how someone could misclassify an email after they'd already opened it unless it was some kind of marathon testing, which would be totally unrepresentative of any real life situation. Once you're getting 6,000 messages, it's time to reach for "Delete All" and change your address, methinks

Re:wait, WTF? (2, Interesting)

LBArrettAnderson (655246) | more than 10 years ago | (#8368990)

Look at it this way: you've just tuned in to your favorite radio station and you hear your favorite DJ talking about something. Sometimes you can't tell whether what he's saying is an advertisement or something he's discussing for the sake of discussing.

i'm sure there's spam out there that makes it seem like it's one of your friends talking to you (sending with "nick" or "john" as the sender name) and talks to you in a friendly manner about how great this product is.

i've got a few of those, but luckily all my friends have weird names.

2+2=3 (2)

Chess_the_cat (653159) | more than 10 years ago | (#8368813)

the average human is only 99.84% accurate. Both filters are reporting to have reached accuracy levels between 99.983% and 99.984%

Am I crazy or is that nowhere near "10 times better"?

Number of significant digits... (4, Informative)

jsimon12 (207119) | more than 10 years ago | (#8368845)

Human=99.84
New proggie=99.984

So the human misses .16% and the machine only misses .016%, hence the machine is 10 times better.

Re:2+2=3 (1)

LightningBolt! (664763) | more than 10 years ago | (#8368857)

If you look at how inaccurate they are, it makes sense... Humans: 0.16% Computers: 0.016%

Re:2+2=3 (0)

Anonymous Coward | more than 10 years ago | (#8368861)

Quote from the Dobly site:

According to a study by Bill Yerazunis (CRM114), humans are approximately 99.84% accurate at filtering spam. As of today, DSPAM has classified 2835 spams and 3050 nonspams in my mailbox with only 1 false accept and 1 false reject. The false accept was caused by a bug in the Bayesian Dobly code which was fixed, so depending on how you count it, I am getting either 99.964% or 99.983% accuracy - nearly ten times more accurate than a human!
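
For anyone who wants to check the quoted figures, here is a small sketch that turns the message counts in that quote into accuracy percentages. The function name is made up for this example; the counts are the ones quoted above.

```python
def accuracy_percent(total_messages, errors):
    """Percentage of messages classified correctly."""
    return 100.0 * (total_messages - errors) / total_messages


total = 2835 + 3050  # spams + nonspams from the quote

# Counting both the false accept and the false reject.
both_errors = accuracy_percent(total, 2)

# Discounting the false accept that was attributed to a bug.
one_error = accuracy_percent(total, 1)

if __name__ == "__main__":
    print(f"{both_errors:.3f}% with 2 errors, {one_error:.3f}% with 1 error")
```

With one error this gives about 99.983%, matching the quote. With two errors it gives about 99.966%, which is close to, but not exactly, the quoted 99.964%.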

Re:2+2=3 (3, Informative)

Celandro (595953) | more than 10 years ago | (#8368863)

No, you are just bad at math
1 - .9984 = .0016
1 - .99984 = .00016

A factor of 10 in reduced error rates

16 errors per 10 thousand vs 1.6.

...yet another soviet russia joke... (-1, Offtopic)

Behrooz (302401) | more than 10 years ago | (#8368907)

In Soviet Russia, math is bad at you!

Re:2+2=3 (0)

Anonymous Coward | more than 10 years ago | (#8368967)

Man, the human that sorted those 10,000 messages has gotta had the biggest wang and helped the most Nigerian princess save their inheritance. Not to mention the excellent interest rate on his home mortgage.

Re:2+2=3 (1)

nzkoz (139612) | more than 10 years ago | (#8368879)

I think the point they're making is that they pass 1/10th as many spams as a human does. Not what most people would consider 10 times better, but still an improvement.

Re:2+2=3 (1)

Cocodude (693069) | more than 10 years ago | (#8368891)

Think of it as a human making an error of 0.16% (100% - 99.84%), and the filters 0.016% (100% - 99.984%). Thus, the human makes ten times as many mistakes, which can be seen as the filters being "ten times better".

It is 10 times better (2, Informative)

flicken (182650) | more than 10 years ago | (#8368892)

Think of it in terms of an error rate:
100%-99.84% = 0.16%
100%-99.984% = 0.016%

0.16% = 10 * 0.016%
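
The same arithmetic as a short sketch, using only the two accuracy figures from the story (the helper names are made up for this example):

```python
def error_rate(accuracy_percent):
    """Error rate in percent, given an accuracy in percent."""
    return 100.0 - accuracy_percent


def one_in_n(accuracy_percent):
    """Express the error rate as 'one mistake in N messages'."""
    return 100.0 / error_rate(accuracy_percent)


human = 99.84
filters = 99.984

# How many times more mistakes the human makes than the filters.
ratio = error_rate(human) / error_rate(filters)

if __name__ == "__main__":
    print(round(ratio, 6))
    print(round(one_in_n(human)), round(one_in_n(filters)))
```

The "ten times" claim is about the ratio of error rates, not the ratio of accuracies.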

Re:2+2=3 (1, Redundant)

canajin56 (660655) | more than 10 years ago | (#8368906)

99.84% chance of success is a one in 625 chance of failure. 99.983% chance of success is roughly a one in 5900 chance of failure. 99.984% = 1 in 6250. So yes, it is around 10 times better :D

Re:2+2=3 (2, Funny)

Deraj DeZine (726641) | more than 10 years ago | (#8368909)

Yeah, "10 times better" should be 998.4%, right?

And that's impossible. No one can give more than one hundred percent. By definition that is the most anyone can give

Re:2+2=3 (0)

Anonymous Coward | more than 10 years ago | (#8369027)

Well, at least one mod doesn't watch the Simpsons. This is from the episode with the baseball team. If you thought that I actually believed the math up there, you've got a sad outlook on humanity.

Seriously... ten times better meaning multiply the percentage?

Re:2+2=3 (1)

Ralp (541345) | more than 10 years ago | (#8368918)

Human: 99.84% accurate = 0.16% inaccurate
Filters: 99.984% accurate = 0.016% inaccurate

0.16% inaccuracy means ten times as much spam will get through as 0.016% inaccuracy, thus, ten times better.
(At least by that standard of "better", I must qualify for anyone who wants to twist the statistics another way!)

Re:2+2=3 (1)

The Dark (159909) | more than 10 years ago | (#8368940)

You're not crazy, they are claiming 10 times fewer incorrect classifications.
Although "10 times less inaccurate than humans" doesn't sound as catchy.

Re:2+2=3 (1)

Bishop (4500) | more than 10 years ago | (#8368965)

99.84% == 1 error in 625 tests
99.984% == 1 error in 6250 tests

99.984% is 10 times better than 99.84%. This is not obvious until you do the math.

Re:2+2=3 (0)

Anonymous Coward | more than 10 years ago | (#8369035)

Am I crazy or is that nowhere near "10 times better"?

Who says the two have to be mutually exclusive?

can it be used with SA? (4, Interesting)

Chuck Bucket (142633) | more than 10 years ago | (#8368818)

can this be used with Spamassasin, or is a stand alone program? Does it need something like Amasis to run?

CB

Re:can it be used with SA? (2, Funny)

Neil Blender (555885) | more than 10 years ago | (#8368976)

can this be used with Spamassasin, or is a stand alone program? Does it need something like Amasis to run?

I'd tell you, but I'm not 100% sure.

Who is sending that one? (5, Funny)

ObviousGuy (578567) | more than 10 years ago | (#8368820)

If your email is indistinguishable from spam by a human, perhaps the problem isn't the receiver. It's the sender.

Forgive me if I don't feel any pity that some moron's email gets filtered to the junk bin because I couldn't discern it from spam.

Bleh. (0)

SphericalCrusher (739397) | more than 10 years ago | (#8368824)

That has to be some stupid people it is comparing to.

The day that something programmed outperforms a human just goes to show what the world is coming to... although there was that chess program that beat the world champion... even though that was a different story. =/

SPAM definition (2, Insightful)

Embedded Geek (532893) | more than 10 years ago | (#8368825)

Isn't the rough definition of SPAM "Anything I don't want in my mailbox"? If that's the case, isn't the human score going to be 100% (at least for the intended recipient)?

To get this new spam filter... (5, Funny)

Anonymous Coward | more than 10 years ago | (#8368827)

Just enter a valid email address, and hit submit!

Huh? (1, Interesting)

MBCook (132727) | more than 10 years ago | (#8368829)

OK, I am the one who DEFINES what spam is for me, hence everything I say is spam is, and everything I say isn't, isn't. I'm 100% accurate by the fact that as the person who defines what spam is for me, I know exactly what spam is.

Would someone like to explain how a program (even if it's right 99.something% of the time) is more accurate than me (100%)?

Re:Huh? (1)

jumpingfred (244629) | more than 10 years ago | (#8368856)

I don't know about you but I sometimes make mistakes and delete the wrong mail.

Re:Huh? (0)

phillk6751 (654352) | more than 10 years ago | (#8368911)

Besides, are you absolutely sure that an e-mail message is spam from its subject line and sender alone? If you filter e-mail solely based on that, spam filters can truly be more accurate than a human, depending on your situation.

Is this possible? (1, Interesting)

Knetzar (698216) | more than 10 years ago | (#8368833)

How does one test a program like this that's more accurate than humans?

AI (0)

phillk6751 (654352) | more than 10 years ago | (#8368837)

Accuracy of the SBPH/BCR classifier has been seen in excess of 99 per cent, for 1/4 megabyte of learning text. In other words, CRM114 learns, and it learns fast.

Great, someone finally came up with a spam filter that learns.

Better (4, Interesting)

gid13 (620803) | more than 10 years ago | (#8368840)

Well, it certainly sounds better than the pay-per-email "postage" idea. If postage hasn't stopped snail spam, why would it stop e-mail spam?

Re:Better (1)

Grrr (16449) | more than 10 years ago | (#8368973)

If postage hasn't stopped snail spam, why would it stop e-mail spam?

The sender's cost of e-mail spam is negligible, per address, compared to snail mail postage (in the USA, anyway).

<grrr>

How can a human be wrong? (-1, Redundant)

LagDemon (521810) | more than 10 years ago | (#8368846)

Everyone sets their own definition of what they think is spam. No matter what, in the end, the human CANT be wrong... right?

Re:How can a human be wrong? (4, Informative)

pclminion (145572) | more than 10 years ago | (#8368969)

No matter what, in the end, the human CANT be wrong... right?

[*Bing* -- mail from VP of sales pops into my inbox. Subject: "Making money fast!"]

[*Bam* -- I hit delete, thinking "Stupid Spam!"]

Ahh, shit! Lookie, a human screwed up.

The filter would have actually examined the message and probably decided that it was legitimate.

Re:How can a human be wrong? (0)

Anonymous Coward | more than 10 years ago | (#8369007)

Ah, but if you had opened the email, thereby giving both you and the program the same criteria with which to classify as spam or not, then you wouldn't have deleted it.

Based just on subject line I'd be tempted to say that a computer would also have classified your example as spam.

Re:How can a human be wrong? (1)

Behrooz (302401) | more than 10 years ago | (#8368971)

No matter what, in the end, the human CANT be wrong... right?

Nah, wrong.

At least, I think it's wrong. Either way, one of us is wrong, so I must be right, because you said that humans can't be wrong and I said that you're wrong about that. Right?

Re:How can a human be wrong? (0)

Anonymous Coward | more than 10 years ago | (#8369016)

No, that just means that the human defined the rules that it and the computer will follow. However, the computer will always follow those rules, while the human won't. People will sometimes delete a mail, perhaps not recognizing an address from a valid sender, thinking it was spam. This would be a failure on the human's part. Or they might open an email and read it when it is in fact spam, which is also a failure.

less thought for me... (3, Funny)

Digitus1337 (671442) | more than 10 years ago | (#8368852)

...and only one locked pod bay door per 6250, I like those odds.

maths (-1, Redundant)

Anonymous Coward | more than 10 years ago | (#8368860)

anyone want to explain to me how 99.84% * 10 = 99.983% ?

Use a realtime blacklist + spam filtering (1)

servicepack158 (678320) | more than 10 years ago | (#8368864)

One doesn't ever seem like enough. Like blacklist + spamassassin. how come you can never get to the links in the spam anyway, what's the point ? :)

Hmmmm (5, Funny)

Anonymous Coward | more than 10 years ago | (#8368882)

Probably used those same people who open viruses as test subjects.

i tend to think... (3, Funny)

caino59 (313096) | more than 10 years ago | (#8368886)

that i'm 100% accurate.

maybe some of those people just dont know where their 'del' key is, or what it does...

Combined accuracy? (2, Interesting)

LagDemon (521810) | more than 10 years ago | (#8368893)

Does this mean that if I use the 2 together, I get a 99.99999728% accuracy? Awesome! That means it would take months for me to see a single error!

Or, in my case... (1)

Atario (673917) | more than 10 years ago | (#8368920)

...days! Yee haw!

Re:Combined accuracy? (2, Interesting)

canajin56 (660655) | more than 10 years ago | (#8369023)

No, that only works if the probability of system X being wrong is independent of the particular message it is checking. (This also means that their figures are dependent on the makeup of the e-mail you are getting) Also, you couldn't really combine them usefully. If one says yes and the other says no, what do you do? You could either accept in these cases, or reject. But either way you could increase the error over just using one or the other.
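
A rough sketch of both points, under the (strong) assumption that the two filters err independently. The 0.017% and 0.016% error rates come from the story; the false-positive and false-negative rates in the voting example are invented for illustration.

```python
def both_wrong(e1, e2):
    """Chance that both filters misclassify the same message,
    ASSUMING their errors are independent."""
    return e1 * e2


def or_rule(fp1, fn1, fp2, fn2):
    """Reject if EITHER filter says spam (independence assumed)."""
    false_pos = 1 - (1 - fp1) * (1 - fp2)  # either one wrongly flags ham
    false_neg = fn1 * fn2                  # both must miss the spam
    return false_pos, false_neg


def and_rule(fp1, fn1, fp2, fn2):
    """Reject only if BOTH filters say spam (independence assumed)."""
    false_pos = fp1 * fp2                  # both must wrongly flag ham
    false_neg = 1 - (1 - fn1) * (1 - fn2)  # either one missing lets spam in
    return false_pos, false_neg


# Error rates from the story: 99.983% and 99.984% accuracy.
naive = both_wrong(0.00017, 0.00016)

if __name__ == "__main__":
    print(f"{100 * (1 - naive):.8f}%")
    # Hypothetical per-filter rates, for illustration only.
    print(or_rule(0.001, 0.002, 0.001, 0.002))
    print(and_rule(0.001, 0.002, 0.001, 0.002))
```

The naive product is where the 99.99999728% figure comes from. The two voting rules show the trade-off: each one lowers one kind of error and raises the other, and none of it holds if the filters tend to fail on the same messages.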

how to lie with statistics.. (2, Interesting)

isaac338 (705434) | more than 10 years ago | (#8368894)

1 in 6250?

Who wants to bet that they only sent two 'spam' and one of them was disguised well? ;)

Obligatory Q... When will mozilla/TB have them? (5, Interesting)

sisukapalli1 (471175) | more than 10 years ago | (#8368898)

I reached the conclusion of "two filters better than humans" by using two sequential filters:
server side spamassassin, and a couple of simple procmail recipes. They have kept almost all the SPAM away.

However, it is good to see such good techniques becoming available, and we can hope to see them as straightforward, usable tools.

So, when will mozilla/TB (or your favourite server side or client side filter) get them?

S

Ah, procmail... (1)

telekon (185072) | more than 10 years ago | (#8369034)

The procmail element, IMHO, incorporates a bit of 'human' in the machine... as other posts have mentioned, I decide what I consider to be spam for me. So, the server-side component would filter out what a machine can determine to be clearly spam, and a couple of standard procmail recipes would catch the rest of what is "most spam to most people."

But I don't wanna have to do any of it by hand, so I'm gonna add my own recipes... so I don't see the stuff that isn't spam to the machine (not in the 0.016% that it 'misses') but that I consider spam. That's a human using a tool to do something...

Nothing will ever be 100%, but the asymptote can get smaller and smaller the more closely the user and machine are working together for this.

As far as Mozilla integration... mozilla's just reading from my mail spool, so why would I want my MUA consuming resources that procmail will use more efficiently, silently in the background?

Some people won't run procmail, or run some OS that isn't compatible. I understand. But that's like not wearing a seatbelt: do so at your own risk and possible injury.

Accuracy different for diff people (1)

xot (663131) | more than 10 years ago | (#8368902)

Wouldn't accuracy differ from user to user? For a user who receives almost no spam and likes to keep his mail clean, wouldn't the anti-spam learn to delete stuff that is just being cleaned and is not spam?
And also I'll be the one to judge its accuracy, as ONLY I know what my spam is.

knowspam.net (2, Interesting)

flyingrobots (704155) | more than 10 years ago | (#8368910)

I still think it is the best 'filter' available, since filtering is a lookup into a database of 'good senders' http://www.knowspam.net [knowspam.net]

actually (5, Funny)

Digitus1337 (671442) | more than 10 years ago | (#8368913)

it's not that humans are not as accurate, it's that 1 in X times we really do want a mini camera or free porn. It is what separates us from those cold, heartless machines.... mini cameras and porn....

Re:actually (2, Funny)

Deraj DeZine (726641) | more than 10 years ago | (#8369006)

What about that 1 in 6250 for the automated filters? Your computer might be spying on you at this very moment!

This is indeed a disturbing development.

It could be more accurate than human (0)

Anonymous Coward | more than 10 years ago | (#8368916)

I use SA, and still get about 200 spams a day. Every once in a while, while deleting spam, I accidentally open one up. I imagine I have deleted non-spam mail too. If you count this human error, these methods could actually be more accurate than humans.

News story Headline (3, Funny)

tacokill (531275) | more than 10 years ago | (#8368926)

"My Machine outthinks me!!"

I've seen better stories in Highlights for Children

I'm sure they're great, but... (5, Insightful)

LesPaul75 (571752) | more than 10 years ago | (#8368935)

I'm also sure that Yahoo's "SpamGuard" was great when they first introduced it. Now, it catches roughly half of all the spam I get. Why? Because people have figured out how it works and taken advantage of it. The same will happen with any content-recognition-based spam software. In the extreme case, even if a piece of software were 100% accurate at saying "This piece of e-mail looks like spam," then spammers would just make their e-mails look exactly like e-mail from one of your buddies. How could software ever tell the difference between:

Hey, dude, check out this website I found. There are some hot naked chicks and stuff. Sweet.
Signed,
Your Buddy


and

Hey, dude, check out this website I found. There are some hot naked chicks and stuff. Sweet.
Signed,
SpamKiddy


Even a human can't tell the difference. The only real difference is who they're from.

Re:I'm sure they're great, but... (0)

Anonymous Coward | more than 10 years ago | (#8368995)

if your friends send mail that looks like spam, get new friends.

Dup filters (1)

Tablizer (95088) | more than 10 years ago | (#8368945)

I am testing a dup filter for slashdot stories.
It is 99.9% accurate.
It is 99.9% accurate.
It is 99.9% accurate.
It is 99.9% accurate.
It is 99.9% accurate.
It is 99.9% accurate.
It is 99.9% accurate.

oh yea.... 99%? (-1, Offtopic)

segment (695309) | more than 10 years ago | (#8368952)

Lets....

4275742e2e2e2063616e206974206361746368746869732121 21

Meh. (0)

Anonymous Coward | more than 10 years ago | (#8368955)

Yes, we all agree that being better than a human is damn near impossible.
Great.

Still better than the pay-for-email thing.

Honestly, who wouldn't rather delete an email or two
a week about their penis than pay for every message they send?
If pay were required for email, a new kind of electronic mail would develop.

Once again, the old saying was right (0)

Anonymous Coward | more than 10 years ago | (#8368964)

"Once robots outlaw humans from detecting spam only spam detecting outlaws will be human robots." or something like that.

How exactly did that work? (1)

Stevyn (691306) | more than 10 years ago | (#8368966)

Okay, so someone let 1 or 2 go during a test of over 6000 emails. I'd like to see their faces when the testers told them that their mother telling them to enlarge their penis was spam. I'd actually like to see that email that they thought was legitimate but in fact some nigerian asking for $5000 to "buy" $1,000,000

Here's the real test (2, Interesting)

Otter (3800) | more than 10 years ago | (#8368972)

I'm very happy with POPFile but there's one thing it just can't handle -- bounces from spam with my domain forged in the header when the original text isn't included. And how could it know? The response is the same whether it's to my mail or to spam. The domain is a clue, I guess, but otherwise it seems like an impossible task. I just let them be sorted into my inbox and delete them manually.

If these filters can hit 99.99% with those, I'd be quite impressed.

Adaptive adversaries (5, Insightful)

Pendersempai (625351) | more than 10 years ago | (#8368977)

It's really easy to design an effective solution when the problem is purely mechanical or natural. As long as you're working with spammers who don't adapt, you can slice through their shitstorms very effectively.

But when a single solution becomes mainstream, spammers will adapt to it. Bayesian filters tend to work very well, but now spammers are adding sprawls of randomly generated green-light text to offset the filter's score.
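
Here is a toy illustration of why padding a message with innocent-looking words can drag down a Bayesian-style score. It uses a simplified naive-Bayes combination of per-token probabilities; the token table and the default value are invented for the example, and this is not the actual algorithm used by DSPAM, CRM114, or any other filter named in this thread.

```python
from math import prod

# Hypothetical per-token spam probabilities, for illustration only.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "free": 0.90,
    "meeting": 0.10,
    "thanks": 0.10,
    "project": 0.10,
}


def spam_probability(tokens, default=0.4):
    """Combine per-token probabilities, treating tokens as independent.
    Unknown tokens get an assumed default probability."""
    probs = [TOKEN_SPAM_PROB.get(t, default) for t in tokens]
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


plain = ["viagra", "free"]
padded = plain + ["meeting", "thanks", "project"]  # "green-light" padding

if __name__ == "__main__":
    print(round(spam_probability(plain), 3))
    print(round(spam_probability(padded), 3))
```

In this toy model, three "safe" words pull the score from near-certain spam down to roughly a coin flip, which is the effect the padding trick is after.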

Google found an excellent way to rank websites, but then it became widespread enough that webmasters began to game the system it had created. It's been playing catch-up ever since.

Once the adversary begins to adapt, we lapse into the same cat-and-mouse game of technological barriers and counter-barriers that we've seen so many times before.

Bad science (1)

bkhl (189311) | more than 10 years ago | (#8368992)

What kind of stats is this? I would guess that the selection the user makes of what mails to receive would be the definition of accuracy here.

Could somebody explain this to me... (5, Interesting)

heldlikesound (132717) | more than 10 years ago | (#8369013)

I order all kinds of stuff online, wouldn't the receipt emails look like spam? My current spam solution is very simple:

1. display my email online as little as possible

2. use a number of addresses that all filter into one account, then filter by the sent-to address... this has turned up some VERY interesting results. For instance, I used dellorders@mydomain.com for an order from Dell, and NEVER used it or even typed it anywhere again, and started getting spam about 6 months later, and I mean the nasty stuff, not just innocent stuff from Dell resellers...

3. I built a rudimentary filter that looks for viagra,free,debt,enlarge, etc... if the sender is not in my address book, and the email contains these words, it is sent to a "check these out" folder...

How might a spam filter help me out without zapping confirmation type emails?
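
A minimal sketch of the kind of rule described in step 3, to show where confirmation emails get caught. The function name, the sample addresses, and the exact word list are assumptions for illustration; the poster's real filter is not shown in the thread.

```python
import re

# Word list taken from the post; the real filter's list is longer ("etc...").
KEYWORDS = {"viagra", "free", "debt", "enlarge"}


def needs_review(sender, body, address_book):
    """Return True if the message should go to a 'check these out' folder:
    the sender is unknown AND the body contains a flagged keyword.
    `address_book` is expected to hold lowercase addresses."""
    if sender.lower() in address_book:
        return False
    words = set(re.findall(r"[a-z]+", body.lower()))
    return bool(words & KEYWORDS)


if __name__ == "__main__":
    book = {"mom@example.com"}
    # An order confirmation from an unknown vendor that mentions "free".
    print(needs_review("orders@shop.example", "Your order ships free!", book))
    # The same wording from a known sender passes.
    print(needs_review("mom@example.com", "Free lunch on Sunday?", book))
```

A receipt that mentions "free shipping" from a vendor who isn't in the address book trips this rule, which is exactly the worry in the question. A statistical filter trained on your own mail weighs the whole message rather than single words, so it has a better chance of learning what your receipts look like.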

Operating on a different scale... (2, Interesting)

ptolemu (322917) | more than 10 years ago | (#8369015)

I think these guys are trying to put the focus on the server side of things where they emphasize greater speed and efficiency in eliminating spam from a large number of accounts as opposed to a single one. Just out of curiosity, do Thunderbird and iMail use similar filtering techniques with their junk mail controls?

This is just carp. (3, Insightful)

corian (34925) | more than 10 years ago | (#8369021)

Spam is what is defined by humans as Spam.

To determine the accuracy of a spam detector, it is necessary first to come up with a sample of what is or isn't Spam. (I'd assume a human would do this?) So the best result we can get by evaluating humans is how often they agree with the result of the initial label.

This figure probably won't be 100%. People have slightly different concepts of what mail is requested vs. unwanted, and what is advertising or useful information. So there is a valid possibility of disagreement.

That doesn't mean humans can't do the job accurately. (After all, if they couldn't, then the initial human-made labels would themselves be wrong and any data based on them meaningless!)

If the training data is labeled with the same criteria as the test data, it is obviously possible that a trained system can achieve results which more closely agree with the test data. They are being trained on similar data. But that doesn't mean that the system is MORE accurate at detecting spam than humans. It means that the system agrees with a particular human (or set of humans) more than other people do in a labelling of spam/non-spam.

For all we know, the evaluator's idea of spam is "wrong".