Proving Which Spam Filters Work Best

samzenpus posted more than 8 years ago | from the get-rid-of-it dept.

263

pirateninja writes "Dr. Gord Cormack decided to find and prove what the best spam filter is. In his study he looked at the major spam filters (DSPAM, SpamAssassin, etc.) along with those submitted by various academics. The results are quite surprising, with a previously unheard-of spam filter, which uses ideas from various compression algorithms, performing the best overall. He recently presented the results and methodology used in a presentation titled 'Spam Filters, Do they Work? and Can you prove it?'" Note that this is a video of his presentation.
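The summary does not say how the compression-based filter works, so the following is only a minimal sketch of the general idea of classifying by compression, not the filter from the study. It assumes `zlib` as a stand-in compressor and two tiny made-up corpora (`SPAM_CORPUS`, `HAM_CORPUS`); a message gets the label of whichever corpus needs fewer extra compressed bytes to absorb it.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Extra compressed bytes needed once message is appended to corpus."""
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Label message by the corpus that absorbs it in fewer extra bytes."""
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"


# Tiny made-up corpora, for illustration only.
SPAM_CORPUS = (
    "Buy cheap pills online now. "
    "Win a free prize today, click here. "
    "Make money fast working from home."
)
HAM_CORPUS = (
    "The meeting is moved to Thursday at ten. "
    "Please review the attached patch before the release. "
    "Are we still on for lunch on Friday?"
)
```

A real filter of this kind would presumably use a much larger training set and a compressor with a better statistical model; the sketch only shows why text that resembles one corpus is cheaper to encode alongside it.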

Not at 400 (0, Informative)

Anonymous Coward | more than 8 years ago | (#15837325)

400 Megs that is.......

Spamming Lot. (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15837335)

"Dr. Gord Cormack, decided to find and prove what the best spam filter is."

Will it keep me from getting first post?

Easier? (2, Insightful)

Ec|ipse (52) | more than 8 years ago | (#15837338)

Isn't there an easier way to display the results, like a chart or something? A 400 MB download per file is a bit extreme.

Harder! (5, Funny)

Profane MuthaFucka (574406) | more than 8 years ago | (#15837466)

I uuencoded the video file, translated it into Sumerian cuneiform, and pressed it into a billion little clay tablets. They are cooking in my oven right now. Now, the Internet is NOT some kind of truck you can just dump stuff onto, so if you want to get the data you're going to have to come to my house.

Re:Harder! (4, Funny)

rts008 (812749) | more than 8 years ago | (#15837509)

I can't come to your house, you insensitive clod!, teh tubes are clogged with clay tablets!

I won't be able to download my internet until Friday now!

Turn that crap down, and get off of my lawn! Damn kids!

Re:Harder! (1, Funny)

Cylix (55374) | more than 8 years ago | (#15837591)

Excellent...

By chance, are you nearby?

I have a wonderful set of wikipedia tablets I made and I'm eager to offload them...er I mean... trade them.

It's the updates you see, I've been having a bit of a nightmare trying to keep them all in sync.

Re:Harder! (1)

ciscoguy01 (635963) | more than 8 years ago | (#15837730)

Now, the Internet is NOT some kind of truck you can just dump stuff onto, so if you want to get the data you're going to have to come to my house.

No, I understand the internet is actually a series of tubes, and there will be hell to pay if they get "full".

In my experience... (4, Informative)

vivin (671928) | more than 8 years ago | (#15837340)

... the ones which have worked best (for me) are Bayesian Spam Filters [wikipedia.org] (A Plan for Spam [paulgraham.com] , SpamBayes - a free filter [sourceforge.net] ) and CRM114 [sourceforge.net] The Controllable Regex Mutilator (Paul Graham mentions it here [paulgraham.com] ). I've always had a very high success rate with these.

Re:In my experience... (3, Insightful)

coffeeisclassy (991791) | more than 8 years ago | (#15837371)

What's surprising is that, while Bayesian spam filters work well in his tests, the one that performs best was never really heard of before. I wonder how long it will be before we see something using these methods made available. Who wants to bet open source will beat closed source to implementing this?

Re:In my experience... (5, Funny)

ozmanjusri (601766) | more than 8 years ago | (#15837378)

I've always had a very high success rate with these.

I haven't tested this one myself, Barrett Filter [barrettrifles.com] but I understand it is 100% effective at reducing spam from known sources. False positives may be a problem, however.

Re:In my experience... (1)

emag (4640) | more than 8 years ago | (#15837457)

Just repeat after me... "They're comin' right for us!"

oh, wait, you can't use that anymore. Try "Aw, look, they're starvin' to death! We have to thin the herd!"

Re:In my experience... (1)

TorKlingberg (599697) | more than 8 years ago | (#15837542)

SpamAssassin uses Bayesian Filtering as well as other methods.

Re:In my experience... (4, Informative)

Red Alastor (742410) | more than 8 years ago | (#15837652)

I like popfile because it's a bayesian filter that sorts into any arbitrary categories you want, not just spam and ham.

http://popfile.sourceforge.net/ [sourceforge.net]
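To illustrate sorting into arbitrary categories rather than just spam and ham, here is a minimal naive Bayes sketch. It is not POPFile's code; the class name `BucketClassifier`, the whitespace tokenizer, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Toy multi-category naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages seen

    def train(self, bucket, text):
        """Record one example message for the given bucket."""
        self.message_counts[bucket] += 1
        self.word_counts[bucket].update(text.lower().split())

    def classify(self, text):
        """Return the bucket with the highest log-probability for text."""
        if not self.message_counts:
            raise ValueError("classifier has not been trained")
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_messages = sum(self.message_counts.values())
        words = text.lower().split()
        scores = {}
        for bucket, seen in self.message_counts.items():
            counts = self.word_counts[bucket]
            total_words = sum(counts.values())
            # Prior: share of training messages that landed in this bucket.
            score = math.log(seen / total_messages)
            for word in words:
                # Add-one smoothing so unseen words do not zero the score.
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
            scores[bucket] = score
        return max(scores, key=scores.get)
```

Usage is a matter of calling `train("work", ...)`, `train("family", ...)`, `train("spam", ...)` and so on with example messages, then `classify(new_message)`; the bucket names are whatever the user chooses.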

Re:In my experience... (0, Redundant)

Anonymous Coward | more than 8 years ago | (#15837697)

I have a simple, foolproof idea to help eliminate spam.

Email certification.

If you want to be able to send Certified Email (CE), you apply for Certification from the company that gives you internet connectivity. They check you out, and 'Certify' you as being a legitimate emailer (i.e., not a spammer). Then, you generate a private/public key pair and give them the public one. In the headers of all your email is their certification, and an encrypted header line that's created using your private key.

When email arrives at the recipient's server (or this could be done at the client level, as well), the server sees the certification, and connects to the certifying server to get your public key. It attempts to decrypt the header line. If it succeeds, it marks the email as 'certified'; if it cannot, it marks the email as 'uncertified', and the email client can be programmed to filter messages based on that.

Due to the public/private key cryptography, there can be no certified email spoofing. (Assuming the private keys are secure, the keys are of decent length, etc.) All emails are traceable back to the originating server. CORRECTION- all CERTIFIED emails are traceable. Anonymous email is still possible. People can still set up email servers for mailing lists without "having" to get them certified. And people can still receive non-certified mail.

If an email server sends out spam, the complaints go to its certifier. They can drop the certification, deleting the public key from their server. When this happens, ALL the email from the spamming server is now 'uncertified', and gets handled accordingly by email clients. If nothing is done, complaints go to THEIR upstream, etc. Individuals and groups can keep their own blacklists, if they wish, and anyone can choose to filter emails according to those lists.

Now, I've looked over that 'form email' that people like to post to shoot down anti-spam ideas. And nothing applies to this idea. (If something seems to apply, it's because I either left out details, or explained something wrong.) This idea does NOT need to be universally adopted, nor does it need to be adopted by everyone all at once. It's primarily a way of reliably tracing (certified) emails back to their originating server. The anti-spam part comes later: if you receive certified spam, complain and get the server un-certified. If you receive un-certified spam... well, just have your email client dump all uncertified emails in the trash. (Not necessarily; you could just use its uncertified status as a factor in filtering your email.)

This idea does not require anything be changed with SMTP. It simply requires a second connection be made to the certifying server. Now, before you bitch about the extra bandwidth, I'd like to remind you that, once this idea catches on, spam will be greatly reduced. This reduction will MORE than make up for the slight increase in bandwidth created in querying the certifying servers. Also, the certifying servers can set time limits on when the certifications expire, and need to be re-downloaded (kind of like DHCP leases). A 'new' company that just applied for certification might have its certificate set to expire almost instantly. This way, every email they send requires a download of the certificate. This allows the certificate to be pulled rapidly if they start spamming. After a month or two, it could be set to expire weekly or monthly.

To sum up: Email Certification is a reliable way of tracing certified emails back to their originating server. This allows spammers to be identified unequivocally, and have their certification pulled. Email servers are NOT required to be certified, and anonymous email is still possible. Email recipients can, if they choose, set up their client to send uncertified emails to the trash, or to handle them however they wish. White lists and black lists are still possible. 'Hobby mailing lists' are still possible, certified or not. The extra bandwidth is minimal, and easily overshadowed by the reduction in spam being sent once spammers realize no one is even seeing, much less reading or replying to, their spam.
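The lease-style expiry in this proposal can be sketched as a small cache on the receiving side. Everything here is hypothetical: `CertCache`, the `fetch` callable (standing in for the second connection to the certifying server), and the idea that it returns a `(public_key, ttl_seconds)` pair are assumptions for illustration, and the signature check itself is left out.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical lookup: domain -> (public key or None if revoked, TTL in seconds).
Fetch = Callable[[str], Tuple[Optional[str], float]]


class CertCache:
    """Caches certifier answers until their lease runs out."""

    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def public_key(self, domain: str) -> Optional[str]:
        """Return the cached key, re-querying the certifier once it expires."""
        now = self._clock()
        entry = self._entries.get(domain)
        if entry is not None and entry[1] > now:
            return entry[0]
        key, ttl = self._fetch(domain)
        # A TTL of 0 expires immediately, so every message forces a re-check.
        self._entries[domain] = (key, now + ttl)
        return key
```

With a TTL of zero for newly certified senders, every message triggers a fresh lookup, which is what lets a certification be pulled quickly; a long-standing sender with a weekly TTL costs one lookup per week.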

Re:In my experience... (3, Insightful)

I!heartU (708807) | more than 8 years ago | (#15837892)

Domain keys... now just get everyone to use it.

Re:In my experience... (1)

Haeleth (414428) | more than 8 years ago | (#15837951)

How does your email certification scheme prevent malicious false reports of spam from causing lazy certification providers to incorrectly revoke the certification of innocent users, leading either to false positives or to the usefulness of certification being largely lost?

Why not just douse the server in gas... (3, Funny)

shotgunefx (239460) | more than 8 years ago | (#15837343)

400MB?

Why not just douse the server in gas if you want to see it melt.

Re:Why not just douse the server in gas... (5, Funny)

Tsiangkun (746511) | more than 8 years ago | (#15837368)

I'm getting 8kb/s downloads from the site, it's just like the good old days !

I'll post more next week after I watch the video.

Re:Why not just douse the server in gas... (2, Informative)

coffeeisclassy (991791) | more than 8 years ago | (#15837417)

It's round-robin mirrored across a whole bunch of different servers, so if you're only getting 8kb/s you could try cancelling and downloading again to see if it goes faster.

Re:Why not just douse the server in gas... (1)

Tsiangkun (746511) | more than 8 years ago | (#15837478)

whoa, thanks,
I was happy to have a connection, so I was letting it run.
But now I'm downloading at a much more reasonable speed.

Re:Why not just douse the server in gas... (1)

darkfish32 (909153) | more than 8 years ago | (#15837514)

Funny, I don't know whether traffic has died down, they've increased bandwidth, or it's a matter of inter-university connections, but I'm getting well over 200KBps.

from china with love (0, Redundant)

nihaopaul (782885) | more than 8 years ago | (#15837579)

44% [===============> ] 220,996,832 89.21K/s ETA 48:16

almost got it!

Re:from china with love (1)

afaik_ianal (918433) | more than 8 years ago | (#15837761)

Downloading in china:
"Progress: 220,996,832 of 400,000,000 bytes. Did we say 400,000,000? What we meant was 320,000,000... Yes, that's right!"

Re:from china with love (0, Offtopic)

nihaopaul (782885) | more than 8 years ago | (#15837779)

whooo, ok that slashdot lameless filter error is a `p155` me off

85% [=========> ] 424,250,976 30.16K/s ETA 18:54

almost have it!

Under present IST policy... (3, Funny)

patio11 (857072) | more than 8 years ago | (#15837407)

... they are not allowed to douse the servers in gas.

Combo of SpamAssassin and Spamhaus (2, Interesting)

hyperion454 (766214) | more than 8 years ago | (#15837361)

At work we've set up a combination of SpamAssassin and Spamhaus. Personally I've gone from about 10 spams per day to about 1 every two weeks.

Re:Combo of SpamAssassin and Spamhaus (1, Insightful)

b0r1s (170449) | more than 8 years ago | (#15837392)

Bah. We use Spamassassin, multiple DNSBLs, and I still get hundreds per day, most of them to addresses published on websites (unavoidable).

The key is still: don't give out your address. Once you've done that, you're going to be screwed eventually.

A good DUL helps (1)

winkydink (650484) | more than 8 years ago | (#15837414)

DUL = Dial-Up List... a bit of a misnomer as it commonly refers to all dynamic hosts. My spam went down dramatically after starting to use Trend's DUL (formerly MAPS). Alas, it's a pay service, but it all comes down to your pain threshold. Mine is low relative to my income.

Re:Combo of SpamAssassin and Spamhaus (2, Informative)

emag (4640) | more than 8 years ago | (#15837432)

And turn off SMTP VRFY. Either that, or having Windows systems @ my ISP managed to get the address associated with my account on spam lists. This is an address that's *only* used internally by my ISP (I use pobox or my own domain whenever someone asks for an address). Even that wasn't enough to prevent it from getting harvested. :-(

Re:Combo of SpamAssassin and Spamhaus (1)

Etcetera (14711) | more than 8 years ago | (#15837879)


And turn off SMTP VRFY.

SMTP VRFY (or recipient-checking at the SMTP level in general) being disabled is pointless. Given a choice between allowing people to not send mail to invalid addresses or having to deal with bounce-back scatter and getting your MX server blacklisted for third-party spam, I'll take the former any day.

And I'd wager anyone who's had to admin a qmail server and decide which (if any) recipient-checking patch to use would feel the same way.

It's far less load on the servers to have a more expensive spam identification process on the back end, than have to deal with the billions of messages generated by a dictionary attack on the front end.

Re:Combo of SpamAssassin and Spamhaus (0)

Anonymous Coward | more than 8 years ago | (#15837907)

If your internet-facing mail hosts are capable of responding accurately to VRFY queries, then they're capable of rejecting mail for invalid recipients at the RCPT stage just as easily. Generating the bounce is then up to the client speaking to your server.

I don't see how enabling SMTP VRFY does anything to reduce backscatter etc.
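Rejecting at the RCPT stage, as described above, means the server answers the `RCPT TO` command with a permanent failure instead of accepting the message and bouncing it later. A minimal sketch, assuming a hypothetical in-memory `VALID_RECIPIENTS` set in place of a real user database:

```python
import re

# Hypothetical mailbox list; a real server would consult its user database.
VALID_RECIPIENTS = {"alice@example.com", "postmaster@example.com"}

_RCPT = re.compile(r"^RCPT TO:\s*<([^<>\s]+)>", re.IGNORECASE)


def rcpt_reply(command_line: str) -> str:
    """Return the SMTP reply line for one RCPT command."""
    match = _RCPT.match(command_line.strip())
    if match is None:
        return "501 5.5.4 Syntax error in RCPT command"
    address = match.group(1).lower()
    if address in VALID_RECIPIENTS:
        return "250 2.1.5 Recipient OK"
    # Refusing here leaves any bounce to the connecting client, so this
    # server never generates backscatter for forged senders.
    return "550 5.1.1 User unknown"
```

Because the refusal happens inside the SMTP conversation, a forged sender address never receives a bounce from this server.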

Re:Combo of SpamAssassin and Spamhaus (2, Insightful)

antifoidulus (807088) | more than 8 years ago | (#15837472)

Heh, even if you are reasonably diligent in protecting your email address, 9 times out of 10 it will still get out (though maybe not as badly). All it takes is one recipient with a compromised Windows box and your address can be all over the spammers' lists in no time.
Or, as in my case, you could assume that a university you apply to will not send out a giant mass email to all the incoming graduate students inviting them to the graduate orientation. So now I have the email address of every grad student entering the University of Minnesota this year (and probably a few that aren't) and they have mine. All it takes is one infected box and my previously spam-free gmail account will no longer stay that way. The kicker is that I decided not to go to UMN because they didn't offer me funding...oy!

Fantastic Spam Filters Which Work Best Proving! (5, Funny)

_vSyncBomb (50710) | more than 8 years ago | (#15837375)

Hey Slashdot, what's up, man! Dude, I read your thing and like totally agree about Best Work Proving Spam Site Work! Dude, that's awesome!

Bro, in the same vein, I was totally checking out this dope ass site [microsoft.com] which you might wanna check out [doubleclick.net] too man. Guys like us that dig Spam Which Proving and Best work Filters will be all over this before long...

OK, man take care until I see you this Friday at the dinner thing, Slashdot!

Cheers,
John

Re: Very Interesting And Generally Really Amusing (5, Funny)

Anonymous Coward | more than 8 years ago | (#15837473)

Hey _vSyncBomb,

  Having trouble pleasing your woman? I've got something Very Interesting And Generally Really Amusing that you could try!!!

Your buddy,
_vAnoymousCoward

Amusingly, POPFile caught you (4, Interesting)

patio11 (857072) | more than 8 years ago | (#15837987)

I ran your message through a perl script to mail it to me for giggles (I do research on spam filtering at ye olde day job). Regretfully, you didn't make it through. Aside from header garbage, which was a mixed bag (half spam tokens, half "known-good automated email" tokens), you ran into problems with dope, ass, wanna, and... work*. Which is just as well, as I have no desire to speak to anyone who uses those words. * Last 15 occurrences in my mailbox are all of the "Make l0ads of $$$ work @ h0m3!" variety.

RTFA? (4, Insightful)

glowworm (880177) | more than 8 years ago | (#15837408)

So, how are we supposed to RTFA when the FA is a video file of over 470MB? Why not just a nice simple text summary, Mr Submitter? But nooooo, that would just be too easy!

Re:RTFA? (5, Funny)

emag (4640) | more than 8 years ago | (#15837475)

"We are sorry that these talks are not available as plain HTML, PDF, or text, however under present IST policy we are not allowed to provide plain HTML, PDF, or text."

Re:RTFA? (1)

Enderandrew (866215) | more than 8 years ago | (#15837522)

Yes, but the person submitting the story to Slashdot when preparing their little blurb could have spilled the results.

Re:RTFA? (1)

cerberusss (660701) | more than 8 years ago | (#15837670)

Yeah now the tubes are full again.

Not surprising... (4, Insightful)

RealGrouchy (943109) | more than 8 years ago | (#15837433)

Although I haven't WTFV (watched the video), it doesn't seem surprising that spam filters which use techniques that aren't used widely would be most successful.

If they aren't used widely, it would either be because they don't work, or they do work but they haven't caught on [yet].

It's like any other fad. As an example, when the original Survivor series came out, it was really popular because it achieved its goal (attracting viewers) in a way that was original. Heck, even I watched the original one. Now that all the networks are doing the reality TV thing, it has become hackneyed, and each successive version of survivor does a worse job of achieving its goal. And I've given up watching TV.

With antispam, new techniques are effective, but as they become more popular and more widely used, spammers will find equally innovative ways of getting around them.

I've noticed that at any given time, there will be a particular style of (non-blank) spam that manages to get through Gmail's filters fairly consistently, but every now and then Gmail adapts its spam filters to block the successful spam type of the season, and eventually a new type will make its way through.

- RG>

Re:Not surprising... (1)

Tweekster (949766) | more than 8 years ago | (#15837526)

Spam is easy to take care of, well, 99% of it. The rest isn't a big deal, so who cares.

My office went from 2000 spam mails a day to about 10, across 15 employees. Who gives a crap about the 10 emails remaining...

I only wish it could be taken care of further upstream to shut those pricks down. But for the end user, from an admin's perspective, most systems are pretty easy to deal with (particularly in small offices).

Re:Not surprising... (1)

laa (457196) | more than 8 years ago | (#15837582)

Yup, I agree. My personal spam index (spams per day) has slowly risen over the last few years and is now just below 400. Of the weekly 2000 spams around 4-5 pass SpamAssassin. So far this year I've had one false positive (that I know of), but browsing through the >80000 spams isn't that fun so I rarely do it.

It's a silly waste of bandwidth sending me all those viagra commercials, but at least they're easy to get rid of.

Picking nits (0)

Anonymous Coward | more than 8 years ago | (#15837620)

Survivor didn't spawn the current reality TV craze -- Who Wants to be a Millionaire did. (Though Survivor was already in development).

Got to go with Brightmail (4, Informative)

saha (615847) | more than 8 years ago | (#15837437)

We use Brightmail [brightmail.com] on our campus and our users love it for its very low false positive rate and pretty accurate flagging of SPAM. Another campus uses DSPAM and some people are up in arms at the prospect of losing their Brightmail to switch to DSPAM. Personally, I find DSPAM isn't nearly as good; it has flagged many legitimate messages and sent them to the Junk folder.

I also echo a gripe of other posters. It's nice to have a video, but a 500MB video file is a bit much. A 50KB pie chart or bar graph would have been nice.

Flaw in the test (5, Informative)

lheal (86013) | more than 8 years ago | (#15837444)

The spammers actively try to subvert the more popular filters. That gives a lesser-known one a decided advantage, one which will go away as it becomes more popular.

As with most choices like this, factors such as ease of use, speed, and resource efficiency can overshadow selectivity. No system is perfect, so it's perfectly reasonable to go with a system that's pretty good if you already are using it, rather than switching to the latest cool thing.

I have found that using two dissimilar systems in a chain is quite effective.
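Chaining two dissimilar systems might look like the sketch below: a cheap connection-level check runs first, and a content check only sees what the first stage lets through. The stage functions, `BLOCKED_IPS`, and `SPAMMY_WORDS` are made-up placeholders, not the configuration of any real product.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[dict], bool]]

# Hypothetical data for the two stages (203.0.113.7 is a documentation address).
BLOCKED_IPS = {"203.0.113.7"}
SPAMMY_WORDS = {"viagra", "prize"}


def blocklist_stage(message: dict) -> bool:
    """Stage 1: reject on the connecting IP, without looking at content."""
    return message["client_ip"] in BLOCKED_IPS


def content_stage(message: dict) -> bool:
    """Stage 2: reject on words in the body, without looking at the IP."""
    words = set(message["body"].lower().split())
    return bool(words & SPAMMY_WORDS)


CHAIN: List[Stage] = [("blocklist", blocklist_stage), ("content", content_stage)]


def first_rejecting_stage(message: dict, stages: List[Stage]) -> Optional[str]:
    """Return the name of the first stage that flags the message, else None."""
    for name, is_spam in stages:
        if is_spam(message):
            return name
    return None
```

The point of using dissimilar stages is that a spam crafted to slip past one kind of evidence (say, content) can still be caught on another (say, its source).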

Re:Flaw in the test (1)

MadAhab (40080) | more than 8 years ago | (#15837480)

Excellent point.

And that applies to spam filtering techniques as well - it's like anti-biotics. For serious stuff, a spread attack is a good idea.

I've found that using RBLs, SpamAssassin, and Bayesian filters prevents 99.5% of spam with essentially no false positives. And that means, by my day-to-day experience with addresses spammed for a full 10 years now, that instead of getting 100 spam and one real mail, I get 1 real mail, and once every couple of days a spam that gets through.

Except for earlier this year. The RBLs went a little nuts, probably in response to some spam onslaught, and generated a few false positives.

Re:Flaw in the test (2, Insightful)

Jeffrey Baker (6191) | more than 8 years ago | (#15837564)

The problem with the spam filters, which you have stated, is that eventually a spammer figures out how to craft a spam which avoids the feature detection systems. Right now there's some zombie network sending around a stock market scam, of which I am getting roughly 300 copies per hour, even though spamassassin correctly classifies virtually all other unwanted mail.

Lately, I've been thinking about this problem a lot. The classic methods of computer classification (Bayes, SVM, whatever) are all based on trying to detect features in a set of objects which separate the objects into two classes. But there is only one feature which is shared by all spam, and which is not shared by mail I wish to receive: all spam is sent by assholes. The problem is, you can't algorithmically detect the asshole coefficient solely from the contents of an SMTP transmission. Therefore I have recently come to the conclusion that we need to revert to a web of trust for accepting email. I have long avoided webs of trust because they seem difficult to manage, but I've come to believe that they are the only way to solve this spam problem.

Re:Flaw in the test (1)

jcr (53032) | more than 8 years ago | (#15837608)

Right now there's some zombie network sending around a stock market scam, of which I am getting roughly 300 copies per hour, even though spamassassin correctly classifies virtually all other unwanted mail.

If you're talking about spam with the pump & dump message in an image, and random-words text, I'm getting about a dozen of those a day. They're one of three types that's getting through my filters currently. 300 copies per hour would make me just about ready to kill somebody.

I have long avoided webs of trust because they seem difficult to manage, but I've come to believe that they are the only way to solve this spam problem.

Well, I'm also in favor of hiring goons to change the cost/benefit equation for the spammers.

-jcr

Re:Flaw in the test (1)

prandal (87280) | more than 8 years ago | (#15837982)

Use SA 3.1.4 and run-sa-update.

Theo van Dinter added a rule to catch these to the core rules on Tuesday.

Re:Flaw in the test (0)

Anonymous Coward | more than 8 years ago | (#15837546)

I love this comment. Put it in this anti-spam context and people nod their heads sagely. Put it in an OS thread and do a s/spammers/crackers and s/filters/OSes, and people start foaming at the mouth :)

obscurity (1)

TheSHAD0W (258774) | more than 8 years ago | (#15837451)

It may not be coincidence that a little-known filter algorithm produces the best results; many spammers probably test their spew on the more popular filters to try and fool them. If this new filter becomes more popular you may see its reliability decay.

Re:obscurity (1)

pe1chl (90186) | more than 8 years ago | (#15838007)

This is very true.
I have a successful spamfilter deployed at work. It uses SpamAssassin for the backend filtering, but that part has to do very little.
The bulk of the rejecting is done in the dedicated SMTP engine that receives the mail. There is a lot of information to be deduced from the SMTP transaction itself, which is normally not used by spamfilters.
Close adherence to RFC standards is something that most SMTP servers have achieved quite well, and the tools the spammers use are very bad at it.
I know several "bugs" in those spamtools that make them easy to identify and make it simple to discard spam without even receiving the body.

But unfortunately, when widely releasing such a filter the spammers would of course fix the bugs, and the effectiveness of the filter would be gone.
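The comment deliberately does not say which "bugs" it checks for, so the following is only an assumed example of the kind of protocol-level test an SMTP front end could run before the message body is ever sent: checking that the HELO/EHLO argument is at least shaped like a fully qualified domain name or a bracketed address literal. The function name and the exact rules are illustrative.

```python
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def helo_looks_valid(argument: str) -> bool:
    """Rough syntax check on a HELO/EHLO argument (illustrative rules only)."""
    # Address literals are written in brackets, e.g. [192.0.2.1].
    if argument.startswith("[") and argument.endswith("]"):
        return len(argument) > 2
    labels = argument.split(".")
    if len(labels) < 2:
        return False  # bare names such as "localhost"
    if all(label.isdigit() for label in labels):
        return False  # bare IP address without brackets
    return all(_LABEL.match(label) for label in labels)
```

A front end could treat a failed check as one signal among several rather than as grounds for outright rejection, since some legitimate but misconfigured servers also get this wrong.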

whitelist (0)

Anonymous Coward | more than 8 years ago | (#15837452)

whitelist

fuck power went out! (-1, Offtopic)

coffeeisclassy (991791) | more than 8 years ago | (#15837461)

OMG the power went out on campus slashdot you pawn n00bs :-)

Re:fuck power went out! (1, Funny)

lewp (95638) | more than 8 years ago | (#15837559)

I think it's trying to communicate with us...

Re:fuck power went out! (0, Offtopic)

TheShadowHawk (789754) | more than 8 years ago | (#15837643)

Oh my god.... that was funny... good thing I didn't have a mouthful of tea then... :)

Little known systems will often be most effective (1)

thesleepylizard (929953) | more than 8 years ago | (#15837462)

Against viruses and spam. For obvious reasons - hackers and spammers put their efforts into circumventing the major systems since this will maximize the impact of their work. That's why smaller anti-suckware products will often do a better job since the focus isn't on them.

It leads to a sad but inevitable cycle of products being improved, gaining popularity, then losing their effectiveness since they are now a bigger target.

At least, until a watertight (rather than guess-work) solution is found. I believe this is impossible without changing the way email works at a fundamental level. Even the much praised challenge-response is subject to email spoofing.

Reminds me of why I like living in Australia - globally speaking we're relatively irrelevant, making us a relatively small target. Hopefully we'll stay relatively irrelevant, lol :p

I got the 400M download! (3, Funny)

Ossifer (703813) | more than 8 years ago | (#15837474)

And I printed out every frame so I could scan them. I'll be posting the TIFFs on my website shortly...

Spam Ass Asian? (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15837484)

SpamAssasian

WTF?

"SpamAssasian"? (0)

Anonymous Coward | more than 8 years ago | (#15837485)

Yes, I do get a lot of spam dealing with asses of the Asian variety. Luckily, most of it is tagged as such by Gmail's filter.

Torrents (1)

shack420 (821947) | more than 8 years ago | (#15837487)

I see that the organization is not authorised to host a torrent. Would it be possible for someone who has downloaded the video to put one up somewhere? I'd be interested to see what kind of speed we would get out of a /. torrent too...

Re:Torrents (2, Interesting)

Pantero Blanco (792776) | more than 8 years ago | (#15837519)

I wonder how hard it would be for Slashdot/OSTG to host a tracker for large, article-related files like this. I don't think it would require a lot of funding to run, and it would certainly help with convention presentation videos.

I have one word: (1, Offtopic)

get quad (917331) | more than 8 years ago | (#15837508)

Postini.

Re:I have one word: (1)

bitserf (756357) | more than 8 years ago | (#15837530)

IronPort [ironport.com] works extremely well too. Be prepared to pay enterprise prices though.

Re:I have one word: (0)

Anonymous Coward | more than 8 years ago | (#15837567)

As an ISP I use them; they aren't that hot. And good luck trying to resolve issues regarding authentication, or anything for that matter.

Re:I have one word: (2, Informative)

Jeffrey Baker (6191) | more than 8 years ago | (#15837598)

I hope you also have another word, because the Postini service is incredibly bad. I had it enabled on my account at acm.org, and the Postini system was generating roughly one false positive for every 10 true positives. I disabled the Postini filtering and started using Spamassassin. Both the false positive and false negative rates are much improved. Among the traffic that Postini was flagging as spam were the Wikipedia article of the day, my daily email from musicbrainz.org, all messages to the BATN mailing list, many replies to my items for sale on craigslist, and other kinds of completely legitimate traffic. Among the mail they chose to deliver were messages in Korean, Cyrillic, other scripts I can't read, and known viruses.

Their main problem is the system doesn't learn. Using their web interface, I look through the spam folder and request delivery of all the false positives. The next day, nearly-identical mails are still generating false positives. You'd think it would be easy these days to design a filter that learns from negative reinforcement.

Re:I have one word: (1)

slugstone (307678) | more than 8 years ago | (#15837852)

I hate the Postini interface. It is not very user friendly. At least with SpamAssassin you can build the interface you like.

I hate to say this but... (1)

Crydee (988915) | more than 8 years ago | (#15837518)

I never used email because of the spam problem and the rampant use of IMs but once I started using G-Mail I never get spam in my inbox and my instant message time has dropped 70% I'd say. Whatever G-Mail uses is the one I would use if I was using a client to download my emails.

An alternative format (1, Funny)

Anonymous Coward | more than 8 years ago | (#15837525)

All someone needs to do is rig this video up to the wonderful Microsoft Voice Recognition software, and then post the resultant transcript. Surely it won't have that many errors...

text versions of the material (5, Informative)

martin-boundary (547041) | more than 8 years ago | (#15837527)

For those who don't relish downloading 400MB worth of video (why can't somebody cut out the audio as a standalone MP3?), the material of the talk is also available in text mode.

The official tests of spamfilters were done in last year's TREC conference, you can read the writeup here [uwaterloo.ca] (or pdf overview [uwaterloo.ca] ).

You can duplicate those tests yourself if you download the evaluation toolkit (GPL) [uwaterloo.ca] . It's a modular system where you can add a mail corpus (either one of the public TREC ones, or you can make your own trivially), and add a spamfilter package (there are 10 or so to download from the web, or create your own as per documentation).

There's also a video talk [researchchannel.org] given at Microsoft research which should cover pretty much the same ground, if text mode is slashdotted :).

There's a new scheduled test towards the end of the year at TREC 2006.
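For anyone building their own corpus, the basic bookkeeping of such an evaluation can be sketched as below. This is not the toolkit's code or its output format; `error_rates` and the `(gold, predicted)` pair layout are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple


def error_rates(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute error rates from (gold_label, predicted_label) pairs.

    Labels are "ham" or "spam". A false positive is ham filed as spam;
    a false negative is spam delivered as ham.
    """
    ham_total = spam_total = false_pos = false_neg = 0
    for gold, predicted in results:
        if gold == "ham":
            ham_total += 1
            if predicted == "spam":
                false_pos += 1
        elif gold == "spam":
            spam_total += 1
            if predicted == "ham":
                false_neg += 1
    return {
        "false_positive_rate": false_pos / ham_total if ham_total else 0.0,
        "false_negative_rate": false_neg / spam_total if spam_total else 0.0,
    }
```

Reporting the two rates separately matters because, as several comments here point out, losing one legitimate message is usually far more costly than letting one spam through.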

just got this in my inbox (1)

noneme (917222) | more than 8 years ago | (#15837549)

from: gordy to: me

SPAM FILTERS WORK

important filters - SPAM

Download Spam Filters in a number of formats: ,XviD(473M) ,DiVX(473M) SEX ,MPG(472M) ,OGG/Theora(481M) ,Real Media(471M) ,WIN ,Windows Media(476M) ,FREE ,SEX ,WIN

BUY SPAM FILTERS

Gord Cormack talk about the science, logistics, and politics of Spam Filter Evaluation.

Only one question... (1)

fm6 (162816) | more than 8 years ago | (#15837562)

Is there any filter that doesn't give false positives? I don't mean "almost none", I mean zero. It isn't a matter of "holding out for perfect". Some of us simply can't afford to have a key email discarded as "spam".

Re:Only one question... (2, Insightful)

Jeffrey Baker (6191) | more than 8 years ago | (#15837584)

There is no classification system with zero real risk, except for delivering all mail to the Inbox. Sorry.

If your mail is that important, you should be using couriers instead of email.

Human classification is not zero risk (1)

patio11 (857072) | more than 8 years ago | (#15838002)

How much spam do you get a day? I get hundreds. Half of them are not in my native language (much like half the mail in my inbox), which means it takes more than a split-second glance to figure out what is going on. I'd guess my accuracy in split-second decisions is probably on the order of 95%, which if I were a spam filter would earn me a D-. Paul Graham, who probably has more typical email habits when compared with the average Slashdotter, says he misses about 3 per 2,000: http://www.paulgraham.com/wsy.html [paulgraham.com]. There are systems which are better than that.

In Soviet Spam Filter, the computer doesn't trust YOU to filter the email.

Re:Only one question... (0)

Anonymous Coward | more than 8 years ago | (#15837589)

Nonsense. Not even a human brain could accomplish that... and that's what Quarantines are for.

Re:Only one question... (1)

Cylix (55374) | more than 8 years ago | (#15837600)

Well,

You could have mail filtered out completely only if its suspect rating is high enough, and otherwise just tag it if the rating is below a certain point.

That said... white lists are your friends.

Funny thing though... someone forwarded me some "funny" e-mail and usually they are not that humorous. I was so damned pleased when it was filtered out.

That said, I haven't moved to deletion just yet. I just tag the mail and sort it later. As soon as I'm sufficiently happy with the system highly suspect mails can get purged auto-magically.
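The tag-now, purge-later idea above can be sketched as a two-threshold policy. This is only an illustration: the function name `decide` and the threshold values 5.0 and 15.0 are hypothetical and not taken from any particular filter.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"  # leave the message alone
    TAG = "tag"          # mark as suspect, let the user sort it later
    PURGE = "purge"      # discard without delivery


def decide(score: float, tag_at: float = 5.0, purge_at: float = 15.0) -> Action:
    """Map a filter's suspect rating to an action.

    The thresholds are made-up defaults; a real deployment would tune them.
    """
    if purge_at < tag_at:
        raise ValueError("purge_at must not be lower than tag_at")
    if score >= purge_at:
        return Action.PURGE
    if score >= tag_at:
        return Action.TAG
    return Action.DELIVER
```

Setting `purge_at` to infinity gives the tag-only behaviour described above; lowering it later turns on automatic purging for the highest-rated mail only.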

Re:Only one question... (1)

cruachan (113813) | more than 8 years ago | (#15837880)

Cloudmark's safetybar product (http://www.cloudmark.com/ - lousy name; SpamNet, as it was called before, was far better) is just about perfect for me. I get an average of about 20 spam emails a day, and it has had a false positive rate of 0% for months. In fact I've been using the product for several years now and I think the last time I saw a false positive was a couple of years back.

On the efficiency side it has a hit rate of nearly 100%. I would have said it was 100% a couple of months back, but just recently it's been having a bit of a problem with one stock-pushing spam.

Anyway, that aside it's the best spam filter I've ever seen by a very long way, and I'd highly recommend the service. It costs a few $ a month, but it's probably the best value subscription I have.

I have no connection with the company, just a very satisfied customer. The P2P nature of the product places it outside the usual spam filters so it's often missed from reviews.

Re:Only one question... (0)

Anonymous Coward | more than 8 years ago | (#15837891)

Is there any filter that doesn't give false positives? I don't mean "almost none", I mean zero. It isn't a matter of "holding out for perfect". Some of us simply can't afford to have a key email discarded as "spam".

Yes, indeed! The filter is called "pass-through". It has only one downside: all spam passes too!!

Now seriously, consider using a white list of known good senders (your clients?), and apply the spam filter only to mail from senders not on the list.
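A minimal sketch of that whitelist-first arrangement, assuming the whitelist is a set of lower-case addresses and the spam filter is any callable that returns True for spam. The names `classify` and `spam_filter` are hypothetical. Note that From headers are easy to forge, so this is only as trustworthy as the header.

```python
from email.utils import parseaddr
from typing import AbstractSet, Callable


def classify(from_header: str,
             body: str,
             whitelist: AbstractSet[str],
             spam_filter: Callable[[str], bool]) -> str:
    """Return "inbox" or "spam".

    Whitelisted senders bypass the filter entirely, so they can never
    be a false positive; everyone else goes through spam_filter.
    """
    _, address = parseaddr(from_header)  # "Name <a@b>" -> "a@b"
    if address.lower() in whitelist:
        return "inbox"
    return "spam" if spam_filter(body) else "inbox"
```

For example, `classify("Client <boss@example.com>", text, {"boss@example.com"}, my_filter)` delivers the message without ever calling `my_filter`.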

Ask Slashdot ... (5, Funny)

Anonymous Coward | more than 8 years ago | (#15837587)

Dear Slashdot,
At the university where I work, they have recently adopted a pesky policy banning the use of bitTorrent.
What can I do to fix [uwaterloo.ca] this?
Yours faithfully,
Dr. Gord Cormack

Argh! Gratuitous Video! (1, Insightful)

abh (22332) | more than 8 years ago | (#15837632)

A 400MB video file? Is this a joke? WTF is everyone thinking, that everything on the web needs to be on video all of a sudden? I just blogged about this today: http://www.anotherblogger.com/2006/08/02/please-no-more-gratuitous-videoblogging/ [anotherblogger.com]

Re:Argh! Gratuitous Video! (1)

Kredal (566494) | more than 8 years ago | (#15837659)

Thanks for blogging about it... but did it really have to be a video blog?

(just kidding)

Good job I don't filter web content (2, Funny)

slayer99 (15543) | more than 8 years ago | (#15837654)

"In his study he looked at the major spam filters ( DSPAM, SpamAssasian"

Spam about asian donkeys is a new one on me, though.

Which spam filter won? (1)

LoneBoco (701026) | more than 8 years ago | (#15837704)

So... um... I really don't want to wait 8 hours or more to find out which mysterious and generally "unheard of" spam filter performed the best. Does anybody know where a text version of the results can be found?

MS Anti Spam... (1)

pookemon (909195) | more than 8 years ago | (#15837719)

I use the built-in spam filter in Exchange 2k3 set to level 8. All "filtered" e-mails are archived. I get maybe 3 or 4 a day (on a "bad" day) that make it through. Once a week (or more if I can be bothered) I view the archive and send on any that aren't spam (<1%), and those that are spam get junked. I do this using a little tool I wrote that displays the From, To and Subject of all these e-mails. If I can't tell from these fields whether the e-mail is spam or not (and it generally is anyway) then I can view the contents of the .eml file.

P**s easy, effective and "Free".
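A tool like the one described above could look roughly like this. It is a sketch using Python's standard `email` package, not the poster's actual tool; the function names and the assumption that the archive is a folder of `*.eml` files are mine.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path
from typing import List, Tuple


def summarize_eml(path) -> Tuple[str, str, str]:
    """Return (From, To, Subject) of one .eml file, reading headers only."""
    with open(path, "rb") as fh:
        msg = BytesParser(policy=policy.default).parse(fh, headersonly=True)
    return (str(msg.get("From", "")),
            str(msg.get("To", "")),
            str(msg.get("Subject", "")))


def list_archive(folder) -> List[Tuple[str, str, str, str]]:
    """Return (filename, From, To, Subject) for every .eml file in folder."""
    return [(p.name, *summarize_eml(p))
            for p in sorted(Path(folder).glob("*.eml"))]


if __name__ == "__main__":
    import sys
    for row in list_archive(sys.argv[1]):
        print(" | ".join(row))
```

Anything that can't be judged from those three fields can still be opened as a full .eml file by hand.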

Re:MS Anti Spam... (1)

KiloByte (825081) | more than 8 years ago | (#15837793)

Er, what? A false positive rate of 1:100!?!?

Usually, anti-spam solutions which give more than 1:100000 are considered worthless. What you're quoting is beyond words.

Re:MS Anti Spam... (1)

pookemon (909195) | more than 8 years ago | (#15837841)

A false positive rate of 1:100

No, better than 1:100 - that's what <1% means. It's actually around 1:500.

Usually, anti-spam solutions which give more than 1:100000 are considered worthless

Got links, or is that just your opinion?

no torrent? (0)

Anonymous Coward | more than 8 years ago | (#15837747)

Does anyone else find it mildly amusing that U.W., one of the top tech schools in North America, due to their regressive policy disallowing the use of torrents, now has a server getting a proper slashdotting?

So what is the previously unheard of spam filter? (1)

jefp (90879) | more than 8 years ago | (#15837750)

Anyone care to post a link?

No bittorrent... No credibility (4, Insightful)

bgog (564818) | more than 8 years ago | (#15837752)

Why exactly should we give any weight to anything from an organization so ignorant as to disallow BitTorrent? It takes someone pretty darn ignorant to disallow a protocol because some use it to transport illegal content. Why haven't they banned TCP? It is an evil technology used every day to violate copyright.

This guy should spend his time educating the fools at his institution.

Possible Text Version (5, Informative)

sciop101 (583286) | more than 8 years ago | (#15837755)

On-line Supervised Spam Filter Evaluation
Gordon Cormack and Thomas Lynam

Full Text, May 29, 2006 - PDF Format

http://plg.uwaterloo.ca/~gvcormac/spamcormack.html/ [uwaterloo.ca]

best ever (0)

Anonymous Coward | more than 8 years ago | (#15837762)

In my office, the IT department is so cool they implemented the best spam filter ever (when the Email server is up): Manual Filtering. It's awesome. Some of us can trash all of our spam before we even read it by carefully reviewing the subject line and sender. We never have false positives, so we don't miss anything. Granted, most people spend 3+ hours a day Emailing, but it's OK. We filter out all spam, never miss anything. Some people even collect spam instead of junking it.

Funny though, we keep buying a particular brand of hard drives for Email storage in our servers. The IT guys keep talking about their sea gate retirement plans. Good to see they want to spend their late years on sunny Mexican beaches.

GMail Spam Filter (5, Interesting)

foxylad (950520) | more than 8 years ago | (#15837790)

I use greylisting (gld to be specific) which works wonderfully. A couple of customers wanted even better filtering...

First I tried DSPAM, but they refused to train it so the results weren't good. Then I tried SpamAssassin, which also let through a surprising amount of spam - a lot more than my personal account on Gmail.

So I set up accounts on Gmail for them, and forwarded their mail to those accounts (after greylisting - don't want to burden Gmail too much!). Gmail lets you set up forwarding, so I simply forwarded all the filtered mail back to a second account on my mailserver for the customer to pick up. Finally I wrote a Python script that logs in to Gmail once a week to prevent the account being closed due to non-use.

A tad involved, but it works like a dream. Yet again Google comes out on top, this time in a market it doesn't even know it's in!

Spam Ass Asian (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15837792)

Excuse me, I have to go to the bathroom.

So Which One Won? (2, Interesting)

ryanisflyboy (202507) | more than 8 years ago | (#15837903)

So which one is the "unheard of spam filter?"

Wouldn't it make sense to put this in the /. submission (or at least a link)?

Did I miss the obvious "and the winner is..." someplace?

Cloudmark's SpamNet (2, Interesting)

cruachan (113813) | more than 8 years ago | (#15837906)

I have to push this as it usually gets missed from reviews as it's a hybrid P2P solution and not a straightforward filter, but Cloudmark's safetybar product (http://www.cloudmark.com/) is just about perfect for me. I get an average of about 20 spam emails a day and it has a false positive result of 0% and has had for months. In fact I've been using the product for several years now and I think the last time I saw a false positive was a couple of years back.

On the efficiency side it has a hit rate of nearly 100%. I would have said it was 100% a couple of months back, but just recently it's been having a bit of a problem with one stock-pushing spam.

Anyway, that aside it's the best spam filter I've ever seen by a very long way, and I'd highly recommend the service. It costs a few $ a month, but it's probably the best value subscription I have.

I have no connection with the company; I'm just a very satisfied customer who's been using it since the beta some years ago. I have a publicly available email address which I've had for years and which must be on many spam lists. Without Cloudmark it would be unusable; with it, it's no problem at all. I recently installed it for my wife, who was starting to get a lot of spam. On her account I noticed it took about two weeks to train it not to junk a few mailing list emails she was on, but after that it's been just as reliable as my installation.

A BitTorrent policy protest (1, Insightful)

Anonymous Coward | more than 8 years ago | (#15837919)

As you wonder how long it will take for a 400MB file to come down at 1.5kB/s, a note from TFA:
We are sorry that these talks are not available through BitTorrent, however under present IST policy we are not allowed to run BitTorrent. We thank you for your understanding.
Erm.. This is more about a "take this policy and shove it" protest than the content of the movie. I applaud their creativity.

Best spam filter. (1)

Viceice (462967) | more than 8 years ago | (#15837920)

IMHO, the criterion for the best spam filter is very simple. It is the filter that is able to consistently maintain the highest spam-to-false-positive ratio.

Feel free to add to it. :D
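One way to make that criterion concrete, as a sketch only: given labelled test results as (is_spam, flagged) pairs, count the spam caught and divide by the legitimate messages wrongly flagged. The function name and the handling of the zero-false-positive case are my assumptions, not part of any standard evaluation.

```python
from typing import Iterable, Tuple


def spam_to_fp_ratio(results: Iterable[Tuple[bool, bool]]) -> float:
    """Ratio of spam caught to false positives.

    results holds (is_spam, flagged) pairs from a labelled test run.
    Assumption: with zero false positives the ratio is infinite if any
    spam was caught, and 0.0 if nothing was flagged at all, so that a
    pass-through filter does not win by default.
    """
    caught = 0
    false_positives = 0
    for is_spam, flagged in results:
        if flagged and is_spam:
            caught += 1
        elif flagged and not is_spam:
            false_positives += 1
    if false_positives == 0:
        return float("inf") if caught else 0.0
    return caught / false_positives
```

A single ratio hides how much spam was missed, so in practice it would be read alongside the miss rate rather than on its own.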

Give grey listing a try... (1, Insightful)

xt (225814) | more than 8 years ago | (#15837928)

The most effective way I have found to stop spam is grey listing. In the last two months, I have had zero spam messages go through to my mail server. I use GLST (http://www.xmailserver.org/glst-mod.html [xmailserver.org]), which is mostly for the XMail mail server (http://www.xmailserver.org/ [xmailserver.org]) but will work anywhere.

You should also check this article http://www.freesoftwaremagazine.com/articles/focus_spam_postfix?page=0%2C0 [freesoftwaremagazine.com], lots and lots of good advice on spam filtering.
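For anyone unfamiliar with grey listing, the general idea can be sketched as follows: the first delivery attempt from an unknown (client IP, sender, recipient) triplet gets a temporary SMTP failure, and a retry after a minimum delay is accepted. This is a toy illustration of the technique, not how GLST or any other package actually implements it; the class name, the 300-second default and the in-memory dictionary are all assumptions.

```python
class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Return an SMTP-style reply for a delivery attempt at time `now`."""
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 accept"
        # Temporary failure: well-behaved mail servers queue and retry,
        # while much spam software never comes back.
        return "451 try again later"
```

A real deployment would also expire old entries and exempt known-good senders; both are left out here to keep the sketch short.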

Re:Give grey listing a try... (0)

Anonymous Coward | more than 8 years ago | (#15838013)

If you prefer greylisting and dspam, take a look at the Asas project (http://asas.rpath.org) which aims to provide a greylisting/dspam appliance and much more.

Out of Date and Worthless (4, Informative)

prandal (87280) | more than 8 years ago | (#15837975)

This paper's a complete waste of time.

He tested spamassassin 2.3 - that's ancient! I'd imagine the other tools are similarly obsolete.

We currently use SA 3.1.4 with a well-trained Bayes database and Razor, Pyzor, and DCC.

Throw in a few custom rules and a selection of rules from http://www.rulesemporium.com/ [rulesemporium.com] and the results are outstanding.

With the new sa-update feature the core rules are updated between point releases, which came in useful this week dealing with the new image spams which seemed to be designed to avoid detection by spamassassin. Thanks Theo.

And the folk on the spamassassin-users mailing list really rock.