Spam Detection Using an Artificial Immune System

timothy posted more than 8 years ago | from the lymp0cty3z-narf-poit!-claire-said-the-laundry-wheel dept.

rangeva writes "As anti-spam solutions evolve to limit junk email, the senders quickly adapt to make sure their messages are seen. An interesting article describes the application of an artificial immune system model to effectively protect email users from unwanted messages. In particular, it tests a spam immune system against the publicly available SpamAssassin corpus of spam and non-spam. It does so by classifying email messages with the detectors produced by the immune system. The resulting system classifies the messages with accuracy similar to that of other spam filters, but it does so with fewer detectors."

114 comments

Nice! (-1, Offtopic)

beacher (82033) | more than 8 years ago | (#15693830)

But where does the spam snot and phlegm go?

-B

Re:Nice! (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15693880)

my new linux distro codenamed 'badger' will include a spam-fighting module, codenamed 'medicine man', which is based upon these principles. for those of you not yet familiar with 'badger' or 'medicine man', it is a new linux distro aiming at performance, security, scalability and portability. 'badger', in short, rules and will be very secure and spam-free, out of the box.

The utility of newer systems (3, Informative)

CRCulver (715279) | more than 8 years ago | (#15693846)

I have to admit, I don't see the need for these recent whizbang additions to the spam-fighting repertoire. Sure, they might be ingenious, but on a practical level they don't do anything more than a properly-configured SpamAssassin system. I used to get a lot of spam coming through a default installation of SpamAssassin, but after spending some time with O'Reilly's book [amazon.com] (the free docs may already be up to this level of reader-friendliness, it's been a couple of years) and tweaking my installation, I get spam once in a blue moon. There's just no need for anything more.

Re:The utility of newer systems (4, Insightful)

crotherm (160925) | more than 8 years ago | (#15694009)

I have to admit, I don't see the need for these recent wizbang horseless carriages. Sure, they might be ingenious, but on a practical level, they don't do anything more than a fine team of horses. yada yada

But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.

From TFA, The lightweight nature of this solution -- requiring significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive.


That sounds like a good reason for this research.

Re:The utility of newer systems (1)

TheOtherChimeraTwin (697085) | more than 8 years ago | (#15694076)

Wow, I wish SpamAssassin worked that well for me. Mind you, it does get rid of most of the junk, but I still get a fair amount of spam that slips past SA. Every few days, I'll even get spam that has a score of exactly 0!

Re:The utility of newer systems (2, Interesting)

a_n_d_e_r_s (136412) | more than 8 years ago | (#15694222)

Good spammers run their spam through SpamAssassin to make sure they get a 0 score in it, so that the spam gets through. Most sysadmins use the standard settings and thus the spam gets through.

Not very smart to send spam that gets caught by SpamAssassin.

nice amazon referrer link (0)

Anonymous Coward | more than 8 years ago | (#15694131)

lol
an Amazon spammer talking about spam
if you want to paste links to help people, try them without sticking your stupid Amazon referrer code in there

Re:nice amazon referrer link (0)

Anonymous Coward | more than 8 years ago | (#15694298)

Hey Amazon troll... you realize that the link automatically puts in the referrer code when somebody who happens to be logged in searches for a title and then finds it? STFU, it's not people actually trying to get... whatever the fuck you think it is that Amazon gives them. People put the links in because it's a good website to find decent details on books/movies/music etc.

Re:The utility of newer systems (1)

AigariusDebian (721386) | more than 8 years ago | (#15694816)

The real precision of current good Bayesian filtering is close to the precision of a human filter - from 80 to 90 percent. There are recent advances in natural language processing (word sequence processing) and in neural and functional text classification (support vector machines with nonlinear kernels) that can get spam classification precision up to 99 percent. It might not be too much for spam, BUT when you transfer the same knowledge to other areas of text classification, 99 percent binary classification precision turns into 80 percent precision when classifying into 5 categories.

There is a lot of research in this area - I am actually doing it now.

Re:The utility of newer systems (1)

AigariusDebian (721386) | more than 8 years ago | (#15694830)

But the research in the article is pretty lame indeed. I have seen expiring Bayesian classifiers before; the only thing that I find interesting there is the use of word sequencing to reduce the feature vectors, but the paper is short on details about automating sequence selection, which is a major reason why that process is quite underused currently.

Re:The utility of newer systems (1)

nixkuroi (569546) | more than 8 years ago | (#15696078)

Yeah, it might work for today, but Spam is only going to get worse and there's a point where traditional models won't scale. If you can get something that does the same job in fewer cycles, that implies you can scale up higher using fewer resources. Also, the methodology they're talking about here is growing organically. This probably means that it'll evolve organically, making it better with each generation. Spam fighters can't stop innovating because the spammers aren't going to.

Finally (4, Funny)

nizo (81281) | more than 8 years ago | (#15693848)

So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

Re:Finally (1)

modecx (130548) | more than 8 years ago | (#15693976)

So now we can look forward to a spam filtering solution that actively searches for spammers and kills them?

Hooah! First one to hook this up with an MLRS gets a cookie!

Death is too good for them (1)

hellfire (86129) | more than 8 years ago | (#15694052)

Any good programmer worth their salt would have programmed this to cut out their tongue, cut off their fingers one by one, slice off their eyelids and force them to watch "Biodome" 5 times in succession.

I want those fuckers to live painfully damnit, just like the rest of us do when we have too much spam.

Re:Death is too good for them (1)

RsG (809189) | more than 8 years ago | (#15694091)

Damnit that goes too far! You're a cruel human being. I wouldn't subject a dog to that level of torture, much less a human.

In the name of human rights, they should not be forced to watch Biodome any more than twice! :-P

Re:Finally (1)

kesuki (321456) | more than 8 years ago | (#15694309)

I know where most of them live, just kidding. The problem isn't that we have spammers; the problem is that we have kids pretending to be spammers who just hack into legitimate spam networks to send out scams.

AIDS (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15693853)

But can it get AIDS?

Re:AIDS (-1, Offtopic)

Anonymous Coward | more than 8 years ago | (#15693888)

His name is Chim-Chim. He likes to be called Henri, but we don't always get what we want do we?

False positives still a problem (1, Redundant)

Hungry Admin (703839) | more than 8 years ago | (#15693858)

I think this is a very useful new anti-spam tool, but as usual, it will have the possibility of false positives, which can be very damaging. And Spammers will adapt to this technology as well, reducing its effectiveness.

Re:False positives still a problem (2, Interesting)

CRCulver (715279) | more than 8 years ago | (#15694178)

And Spammers will adapt to this technology as well, reducing its effectiveness.

One wonders what sort of people have so little moral fiber that they study spam-blockers and create new methods for getting around it. Really, it would be great if Slashdot could profile one of these twisted people and show just who does it, what country they are from, what kind of upbringing they had, etc. But maybe anyone is susceptible to the temptation. Recently, while making a comment on a blog, I was thinking about just how easy it would be to automatically circumvent its arithmetic-based anti-spam question ("What is 7 + 9?"). It was like being called over to the dark side.

Re:False positives still a problem (1)

d1337 (849814) | more than 8 years ago | (#15694282)

Really, it would be great if Slashdot could profile one of these twisted people and show just who does it, what country they are from, what kind of upbringing they had, etc.
You forgot...let's get their /. username also

Re:False positives still a problem (1)

shawb (16347) | more than 8 years ago | (#15694328)

Slashdot would NEVER post a story [slashdot.org] about the sorts of sick, twisted individuals that perpetrate such sleazy tactics for profit.

(N.B: Okay, yeah, there's a difference between spyware and spam... I'd think that spyware is the worse of the two evils, though.)

Re:False positives still a problem (1)

techno-vampire (666512) | more than 8 years ago | (#15694429)

One wonders what sort of people have so little moral fiber that they study spam-blockers and create new methods for getting around it.


Simple: people who see the profit in it and don't care what people think of them. Who cares if there's a .001% reply rate when you send out tens of millions of spam per day? As long as there's a way to get money out of people with spam, there will be spam, and there will be people looking for ways to get around any filtering program or algorithm designed.

Re:False positives still a problem (1)

stonecypher (118140) | more than 8 years ago | (#15695065)

Er, I think it's just people who don't think spam is a big deal and are amused by the several million dollars a year of revenue it generates. You act like they're organ-leggers.

The difference? (2, Insightful)

MoeMoe (659154) | more than 8 years ago | (#15693876)

Not that I'm arguing that it's the same, rather I'd like to know:

What separates this from a Bayesian filter?

Re:The difference? (4, Insightful)

DragonWriter (970822) | more than 8 years ago | (#15693896)

What separates this from a Bayesian filter?
If nothing else, it has new, improved buzzwords. "Artificial immune system" is so much more evocative than "Bayesian filter".

Not much (5, Informative)

jfengel (409917) | more than 8 years ago | (#15693970)

Ultimately, very little. At core, they're probably identical techniques, and if I were reviewing this as a scientific paper I'd ding them for not answering exactly that question. There are such strong parallels between the two (train them on known data, add up probabilities, cut stuff on a threshold) that I strongly suspect that they're identical.

There are useful things to be gained from a change of metaphor. For example, one difference between this and most Bayesian spam filter implementations is that this explicitly incorporates a decay function. That could be useful, if a word that used to be common in spam no longer is (e.g. if I actually decided to buy a Rolex, it's no longer a strong spam indicator, whereas right now any email mentioning "Rolex" is 99.9999% certain to be spam).

You could easily modify a Bayesian filter to have time-decaying weights, but if the change in metaphor leads somebody to come up with a good insight, then perhaps this is useful. Mathematically, though, the equations look very similar.
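For what it's worth, here is a minimal Python sketch of the "time-decaying weights" idea, assuming simple per-token spam/ham counts that lose half their value every `half_life_days`. The class name, the half-life, and the smoothing are all made up for illustration; this is not how the article's system or any particular filter actually does it.

```python
import math
from collections import defaultdict


class DecayingTokenStats:
    """Per-token spam/ham counts that fade with age (hypothetical sketch)."""

    def __init__(self, half_life_days=30.0):
        # half_life_days is an assumed tuning knob, not a value from the article.
        self.decay_rate = math.log(2) / half_life_days
        self.counts = defaultdict(lambda: {"spam": 0.0, "ham": 0.0, "day": 0.0})

    def _age(self, token, today):
        # Shrink both counts by the time elapsed since the token was last touched.
        entry = self.counts[token]
        factor = math.exp(-self.decay_rate * (today - entry["day"]))
        entry["spam"] *= factor
        entry["ham"] *= factor
        entry["day"] = today
        return entry

    def train(self, tokens, is_spam, today):
        # Count each distinct token once per message.
        for token in set(tokens):
            entry = self._age(token, today)
            entry["spam" if is_spam else "ham"] += 1.0

    def spam_probability(self, token, today):
        entry = self._age(token, today)
        # Laplace smoothing keeps unseen or fully decayed tokens near 0.5.
        return (entry["spam"] + 1.0) / (entry["spam"] + entry["ham"] + 2.0)
```

With this, a token like "rolex" that was trained as spam months ago drifts back toward neutral on its own, and a single recent legitimate message is enough to tip it toward ham.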

SpamAssassin does "decay" them. (2, Informative)

khasim (1285) | more than 8 years ago | (#15694031)

Look up "bayes_expiry_max_db_size". If your database gets larger than the limit you set then the lesser used tokens are deleted.

Re:Not much (4, Interesting)

adrianbaugh (696007) | more than 8 years ago | (#15694048)

Perhaps a neat way to extend this idea would be to have the filter scan your outgoing mail, too; not to search for spam as such, but to look for changes in behaviour. Then, supposing you emailed sales@igottagetmearolex.com enquiring the price of a Rolex, the filter could modify the spam and ham probabilities of rolex. I suppose it would have to be clever enough to ignore emails sent to abuse@ addresses reporting spam and attaching the spam message, among other things I can't be bothered to think of now, but it's an idea that comes more readily from the immune system metaphor than the pure probability metaphor.

Re:Not much (0)

Anonymous Coward | more than 8 years ago | (#15694894)

Take a look at the Markovian filters, CRM114, at crm114.sourceforge.net. It's a more interesting approach and highly trainable. Unfortunately, it's not well bundled and has a lot of rough edges, but it seems to be faster to run and more effective than SpamAssassin.

Re:Not much (1)

stonecypher (118140) | more than 8 years ago | (#15695082)

The reason you're not working for a scientific paper is that you guess about techniques being identical and pooh-pooh them based on said guesses. A Bayesian filter is a very specific mathematical technique. This isn't actually very similar at all, other than that it's being used towards the same end.

Perhaps in the future you could know something about two algorithms before declaring them identical. Just a thought.

From their paper, nothing. (1)

khasim (1285) | more than 8 years ago | (#15694005)

They claim to be as accurate as a Bayesian process, but with fewer check items.

But from their paper, it seems that they're "tuning" their check items to the corpus of spam that they're testing against.

So of course they will use fewer check items. There are a finite number of characteristics of that corpus.

I did not see where they were using their system in a Real World environment (I may have missed it, the article was pretty painful to read). Now, if they can do as well as a fully tuned SpamAssassin system (comparable true positives, true negatives, false positives and false negatives), in a Real World environment, with fewer check items, then they MAY be on to something.

Great.... (4, Funny)

(pvb)charon (685001) | more than 8 years ago | (#15693895)

Ever heard of hay fever? Allergies? Think, people, think! charon

Re:Great.... (3, Funny)

Dannon (142147) | more than 8 years ago | (#15694109)

Thanks, now I have the mental image of a spam filter with sinus problems. Ewwww...

Re:Great.... (1)

megaditto (982598) | more than 8 years ago | (#15694408)

Good point, what the authors are doing is probably trying to score some NSF funds or something.

Arthritis, AIDS, tuberculosis, Leukemia, lupus, endometriosis, etc. Deadlier cousins of the failures of the immune system you mentioned.

What they should be modelling the next-gen spam filters on are intracellular def. mechanisms, RNAi, si/shRNA, nuclear translocation tags, etc. Which is what blacklists, senderid, etc. are copying anyways.

Fancy (4, Insightful)

roman_mir (125474) | more than 8 years ago | (#15693932)

It looks fancy but when you get down to it, all it means is that there are a number of heuristics that are combined into filters (this happens by user training.) The filters are 'weighted' and filters that are not used often enough are 'culled' (killed off.) I don't think this will be significantly better than any other Bayesian-type spam systems.
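A rough Python sketch of the weight-and-cull loop described above, under the assumption that each detector is just a regular expression with a numeric weight. The names `Detector` and `DetectorPool`, and the decay, cull, and threshold numbers, are invented for illustration and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    pattern: str          # regular expression the detector looks for
    weight: float = 1.0   # raised by user feedback, lowered by aging


class DetectorPool:
    def __init__(self, decay=0.9, cull_below=0.2, threshold=1.5):
        # All three numbers are illustrative assumptions.
        self.decay = decay
        self.cull_below = cull_below
        self.threshold = threshold
        self.detectors = []

    def add(self, pattern):
        self.detectors.append(Detector(pattern))

    def _matching(self, message):
        return [d for d in self.detectors
                if re.search(d.pattern, message, re.IGNORECASE)]

    def score(self, message):
        # A message's score is the summed weight of every detector that fires.
        return sum(d.weight for d in self._matching(message))

    def is_spam(self, message):
        return self.score(message) >= self.threshold

    def feedback(self, message, user_says_spam):
        # User training: reward detectors that fired on spam, punish on ham.
        for d in self._matching(message):
            d.weight += 1.0 if user_says_spam else -0.5

    def age(self):
        # Every detector fades; the ones that fall too low are culled.
        for d in self.detectors:
            d.weight *= self.decay
        self.detectors = [d for d in self.detectors
                          if d.weight >= self.cull_below]
```

Calling `age()` on a schedule means a detector survives only if feedback keeps topping up its weight, which is the "killed off if not used often enough" behaviour.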

Real spam solution (3, Interesting)

Dryanta (978861) | more than 8 years ago | (#15693978)

Spam and content filtering will always be a struggle for anybody who actually utilizes email. Simply adding more logic will not solve the problem. Reporting spammers to every rbl list you can think of, and alerting forums and newsgroups of abusive ip blocks on the other hand is already doing quite nicely.

Re:Real spam solution (1)

techno-vampire (666512) | more than 8 years ago | (#15694459)

Reporting spammers to every rbl list you can think of...


Sure, for those of us with the time, knowledge and inclination to do it. Expecting Aunt Minnie to do it is unreasonable. All she cares about is keeping spam out of her inbox, and if running something like this, or SpamAssassin, at the server gets rid of most of it, isn't that all she can reasonably ask for?

Re:Real spam solution (1)

tdelaney (458893) | more than 8 years ago | (#15694580)

token              spamprob       #ham    #spam
'utilizes'         0.992422       1       6140

I gave up (4, Interesting)

Scratch-O-Matic (245992) | more than 8 years ago | (#15693982)

I recently gave up on tweaking filters for myself and a few dozen people whose accounts I administer. I wrote a little script that asks for confirmation from the sender...if the sender confirms, they are added to a whitelist and will go straight through after that. I can also add addresses manually to the whitelist, and will soon be able to have wildcard (domain-wide) approved addresses. I've gotten exactly two spam in 6 weeks...both were confirmed by either a person or an autoresponder. Five years ago I never would have wanted such a blunt system...nowadays it's just the ticket.
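The whitelist side of a script like that might look roughly like the following Python sketch. This is a guess at the logic, not the poster's actual script: the `Whitelist` class, the `*@domain` wildcard syntax, and the "hold" queue are all assumptions, and the part that actually sends the confirmation request is deliberately left out.

```python
class Whitelist:
    """Hypothetical approve-or-hold logic for a confirmation-based filter."""

    def __init__(self):
        self.addresses = set()   # individually approved senders
        self.domains = set()     # domain-wide approvals, added as "*@domain"
        self.pending = {}        # sender -> messages held until confirmation

    def approve(self, entry):
        entry = entry.strip().lower()
        if entry.startswith("*@"):
            self.domains.add(entry[2:])
        else:
            self.addresses.add(entry)

    def is_approved(self, sender):
        sender = sender.strip().lower()
        domain = sender.rpartition("@")[2]
        return sender in self.addresses or domain in self.domains

    def receive(self, sender, message):
        if self.is_approved(sender):
            return "deliver"
        # Unknown sender: hold the message until the sender confirms.
        self.pending.setdefault(sender.strip().lower(), []).append(message)
        return "hold"

    def confirm(self, sender):
        # On confirmation, whitelist the sender and release any held mail.
        self.approve(sender)
        return self.pending.pop(sender.strip().lower(), [])
```

Manual additions and domain-wide entries both go through `approve()`, so `wl.approve("*@example.org")` lets everyone at that domain straight through.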

Re:I gave up (5, Interesting)

babaloo (259815) | more than 8 years ago | (#15694220)

I understand your frustration but I was the victim of a Joe Job attack and systems like you describe just add to the pain of the victim. I feel that these types of responses are just as unwelcome as spam and I report them as such. Have you had any issues like this?

Re:I gave up (4, Interesting)

CFrankBernard (605994) | more than 8 years ago | (#15694337)

I recommend joining the SPAM-L mailing list of 900+ email admins and ask for opinions on "challenge response" (C/R) spam fighting systems. Sending a confirmation message to the alleged/purported sending address *is* spam when it is spoofed/forged (quite common). The only way to ensure sending info back to the connecting email server is to do so /during/ the SMTP conversation.

Re:I gave up (4, Insightful)

rudedog (7339) | more than 8 years ago | (#15694810)

So it appears that you decided that the responsibility for fighting your spam should be moved onto the backs of everybody else on the Internet? Spam almost always comes from a forged sender. By doing this, you're just sending tons of spam to the forgery victims. Please do us and you a favor and google "challenge response harmful", and then turn off your C/R system.

More of the same; not a solution (3, Interesting)

mrheckman (939480) | more than 8 years ago | (#15693998)

The "immune system" solution is just another way to detect spam, but it is unlikely to be much more successful than existing methods. As someone else pointed out, SpamAssassin is pretty good already. So what if this new type of filter eventually improves the spam filtering accuracy from 98% to 99%? A more highly-polished rock is still a rock.

The real problem is the sending of spam itself, and that problem arises from an inability to correctly attribute the spam to the spammers. If we can do that, we can block it, or at least better convict the spammers who violate the law. Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.

Re:More of the same; not a solution (2, Interesting)

Mean Variance (913229) | more than 8 years ago | (#15694139)

Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.

Domain Keys, at least to this point, is utter crap in my experience. I get these small floods of spam into my Yahoo! mailbox. What most of them have in common is they are certified by Domain Keys. A couple months ago, I was getting the exact same spam every day for some mortgage coming from different addresses. All were DK certified.

For what it's worth, I do send off those specific emails to the abuse alias at Yahoo! Their canned emails state that they have dealt with the problem according to their TOS.

I don't know where the flaw lies, but it's there in Domain Keys.

Re:More of the same; not a solution (1)

mrbobjoe (830606) | more than 8 years ago | (#15694474)

So what if this new type of filter eventually improves the spam filtering accuracy from 98% to 99%?
Halving the number of errors? Sounds like a good deal.

Re:More of the same; not a solution (1)

mrheckman (939480) | more than 8 years ago | (#15694518)

Halving the number of errors is good, but that wouldn't stop my problems with spam. My chief objection to spam now is that there are still too many false positives -- things that show up in the spam box that should not -- so I still have to look through all of the hundreds of spam messages that arrive every day to find the few that are misclassified. Even cutting the number of false positives in half won't solve that problem. If, however, we could eliminate most of the spam, then I would have many fewer false positives and many fewer real spam messages to have to look through to find the false positives (I don't think we will ever eliminate false positives. Some people just label as "spam" anything they don't like. In systems that depend on feedback from users, such as Yahoo mail, that means that one person's valuable message is another's spam, which results in messages from one source sometimes being labeled as spam and other times not.)

Also consider that the net bandwidth is flooded with spam, which slows things down for everyone. Improving the filtering at your mailbox doesn't help with this either.

Re:More of the same; not a solution (1)

Antique Geekmeister (740220) | more than 8 years ago | (#15695055)

In fact, such keys are currently strong signs that the ad is, in fact, spam. They're far too easy to buy or steal from other people's machines, often by installing spam zombie software on the machines of unsuspecting and innocent people.

fgdfg (0)

Anonymous Coward | more than 8 years ago | (#15693999)

oh snapz terminator coming soon D:

Obligatory HIV & AIDS reference (1)

ElliotLee (713376) | more than 8 years ago | (#15694000)

Now your spam filter can catch AIDS too. But don't ask how.

Re:Obligatory HIV & AIDS reference (0)

Anonymous Coward | more than 8 years ago | (#15694118)

Sometimes you just need the honesty and security of a whore.

Re:Obligatory HIV & AIDS reference (0)

Anonymous Coward | more than 8 years ago | (#15694422)

Sometimes you just need the honesty and security of a whore.
Well, my spam filter has achieved the honesty and security of a politician. God willing, one day it will reach that of a lawyer. And some future generation may indeed be able to achieve the fabled honesty and security level of a whore, but I won't guarantee it. :-P

Re:Obligatory HIV & AIDS reference (1)

stonecypher (118140) | more than 8 years ago | (#15695442)

One supposes it's from the porn and penis enlarging cream, though one is led to wonder whether smacking the monkey for an iPod is a disease vector.

I'm waiting... (1)

darkrowan (976992) | more than 8 years ago | (#15694002)

I'm waiting for the day when we see our first email 'virus'. Something not unlike what happens with real viruses. Then we'd need antibodies similar to this.

Re:I'm waiting... (1)

stonecypher (118140) | more than 8 years ago | (#15695098)

Yeah, hi. 1992 is on the phone. They said you need to shut off the portal, because their power bill is stratospheric. (Either that, or this is the subtlest Keanu Reeves [imdb.com] joke 3v4r.)

Re:I'm waiting... (1)

not-admin (943926) | more than 8 years ago | (#15695200)

You won't have to wait long, nay, you won't have to wait at all.

There have been e-mail "virii" around for a long time, one of the most famous being the Bill Gates Quick Cash [wired.com]. Don't think that all viruses require an attachment.

Useless -- solves a non-problem (performance) (2, Interesting)

CurtMonash (986884) | more than 8 years ago | (#15694018)

I have two major objections to this idea, and to the article that presents it.

1. The ONLY problem this solves is performance -- i.e., processing throughput. And that's not what's wrong with anti-spam systems today. They live and die on the precision/accuracy tradeoff, or maybe on UI.

2. The authors seem to assume that Bayesian systems work really, really well. While technically most or all current spam-filtering products are Bayesian in some sense, that still speaks of considerable naivete about real-world spam.

The easiest way to eliminate most spam ..... (2, Insightful)

travisco_nabisco (817002) | more than 8 years ago | (#15694020)

I just had a thought about spelling while reading about the spam filters. So I went and looked in my spam folder and found that every piece of spam has many, many words that are not in a dictionary, i.e., not spelled correctly.

Why not run a script that filters messages based on spelling? If there are more than 'xx' many words that do not exist in the dictionary you choose to use, then the message gets sent to the spam folder. This would catch the odd e-mail from friends who don't know how to spell or what a spell checker is, but then when you clean out your spam folder you should notice it.
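A script like that could be as small as the Python sketch below. It assumes the dictionary is just a set of lowercase words you load yourself, and the cutoff (`max_unknown`, standing in for the 'xx' above) is an arbitrary placeholder; both function names are made up.

```python
import string


def unknown_words(text, dictionary):
    """Return the tokens of `text` that are not in `dictionary`."""
    # Split on whitespace, then peel punctuation off both ends of each token.
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t and t not in dictionary]


def looks_like_spam(text, dictionary, max_unknown=2):
    # Flag the message once it has more unknown words than the cutoff.
    return len(unknown_words(text, dictionary)) > max_unknown
```

Something like `looks_like_spam(body, words)` would then decide whether a message goes to the spam folder. Names, URLs, and jargon all count as "misspelled" here, so the cutoff would need tuning against real mail.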

Re:The easiest way to eliminate most spam ..... (1)

Cisko Kid (987514) | more than 8 years ago | (#15694180)

Because I suck at spelling and many people I know suck at spelling. Hoked on fonix werked fer me.

Seriously, the spammers will adapt no matter what anti-spam tactics you use.

Re:The easiest way to eliminate most spam ..... (1)

Senzei (791599) | more than 8 years ago | (#15694233)

Generally techniques like that are not used because false positives are much more disastrous than false negatives. Accidentally allowing a couple of spam messages to creep into the regular mail is not so big of a deal; deleting a reply asking for a job interview because it was miscategorized is. Most spam detection systems have to walk a fine line between doing their job and not hosing somebody's mail. That said, the systems could be set up so that misspellings add weight towards the decision to categorize as spam.

Re:The easiest way to eliminate most spam ..... (2, Insightful)

dhasenan (758719) | more than 8 years ago | (#15694571)

Do you actually WANT to interview a job applicant who can't spell 20 words in a 150-word email?

Re:The easiest way to eliminate most spam ..... (1)

CFrankBernard (605994) | more than 8 years ago | (#15694371)

To avoid false positives, I recommend using a regex generator for spamvertized variations of common spam terms.
See http://public.kvalley.com/regex/regex.asp [kvalley.com]
For example, to allow viagra but detect most of its spamvertized variations:
(?!viagra)(([v])|(\\\W{0,2}\/))[i1l\|\\\/!îíìï:;](([a@àáâãäå^æ])|(\/\W{0,2}\\))[gqp96][r](([a@àáâãäå^æ])|(\/\W{0,2}\\))
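To show the trick in that pattern in isolation, here is a much-simplified Python version. The negative lookahead `(?!viagra)` refuses the plain spelling, and the character classes accept common substitutions. This cut-down regex is an illustration only, not the generator's output, and it covers far fewer variants.

```python
import re

# Simplified pattern: the plain word is excluded, lookalike characters are allowed.
OBFUSCATED = re.compile(r"(?!viagra)v[i1l|!][a@][gq]r[a@]", re.IGNORECASE)


def has_obfuscated_term(text):
    """True if text contains a disguised spelling, but not the plain word."""
    return OBFUSCATED.search(text) is not None
```

So a message that legitimately mentions the drug by name is left alone, while "v1agra" or "V!@GRA" trips the check.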

Re:The easiest way to eliminate most spam ..... (1)

cyber-dragon.net (899244) | more than 8 years ago | (#15694563)

An interesting idea... but you would need to allow for multiple dictionaries. I commonly get e-mail in English, American, French and Japanese every day. And before anyone flames me I -do- make a distinction between English and American. They are spelled and pronounced differently so when discussing dictionaries they ARE different.

As another responder pointed out... perhaps this could be used in some form of "weight" calculation. I would think counting special characters and individual characters ( barring I and A ) would hold just as much "weight" however.

The proposed system I liked better was requiring mail servers to be "registered" and any email being received would check its claimed registration against the IP it came from. Thus any email being sent via bot from a dsl line is automatically thrown out. If it is "legit" spam you have a record in the header of who sent it and can track them down.
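The registration check itself is simple to sketch in Python. Assume a lookup table from claimed sender domain to the networks allowed to send for it. The `REGISTRY` dict below is a hypothetical stand-in (a real deployment would more likely publish this via DNS, which is roughly what SPF does), and the addresses are documentation-only ranges.

```python
import ipaddress

# Hypothetical registry: claimed domain -> networks allowed to send its mail.
REGISTRY = {
    "example.org": [ipaddress.ip_network("192.0.2.0/24")],
}


def sender_is_registered(claimed_domain, connecting_ip, registry=REGISTRY):
    """True if the connecting IP belongs to a network registered for the domain."""
    networks = registry.get(claimed_domain.lower(), [])
    address = ipaddress.ip_address(connecting_ip)
    return any(address in network for network in networks)
```

Mail claiming to be from example.org but arriving from a random DSL address fails the check and can be thrown out before any content filtering runs.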

Modelling Nature (3, Interesting)

A Dafa Disciple (876967) | more than 8 years ago | (#15694022)

Your post advocates a

(x) technical ( ) legislative ( ) market-based ( ) vigilante

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(x) An enormous amount of spam will initially go undetected before your idea is effective
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
(x) Your idea proposes a solution that only large corporations could deploy
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
(x) The large amount of resources needed for implementation of your idea that small companies don't have
( ) Outlook

and the following philosophical objections may also apply:

( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
(x) Your solution is nothing more than a conceptual remanifestation of a solution that already exists
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(x) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

Re:Modelling Nature (0)

Anonymous Coward | more than 8 years ago | (#15694240)

In what way is the parent post attempting to troll? (i.e. elicit discussion to take the conversation offtopic or engage in flamewars, etc.) Get your facts [wikipedia.org] straight you moderating morons!

BTW, so you don't screw this up, this post should be modded (-1 Offtopic).

Re:Modelling Nature (0)

Anonymous Coward | more than 8 years ago | (#15694401)

The post is technically on topic, so offtopic would be dead wrong.

This post does, however, fit into the standard definition of crapflooding [wikipedia.org] which is considered by most people to be a form of trolling.

I would have also accepted a moderation of Redundant, as you get a couple of these every single time spam is mentioned in a slashdot article.

An attempt to "engage in a flamewar" would be classified more directly as "Flamebait" which is, in itself, a form of trolling.

Re:Modelling Nature (0)

Anonymous Coward | more than 8 years ago | (#15694750)

The first AC's reply didn't say that the parent post was offtopic; it said that it, itself, was offtopic. A misunderstanding on your part, I think (not that it matters). I also think the original post in question isn't redundant, because the scope of a redundancy should be limited to the article it was posted as a reply to. No one had yet posted one of those types of messages for this article.

Re:Modelling Nature (1)

mrheckman (939480) | more than 8 years ago | (#15694477)

Furthermore, this is what I think about you:

(x) Brilliant!
( ) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

Re:Modelling Nature (0)

Anonymous Coward | more than 8 years ago | (#15694719)

> (x) Brilliant!

Paula, is that you?

^^ Mod Parent Up! (1)

InakaBoyJoe (687694) | more than 8 years ago | (#15694706)

Mod parent up. That was an awesome post.

And kind of ironic that the author slipped in some unsolicited politically motivated PR on the Falun Gong as part of his/her message.

Re:^^ Mod Parent Up! (0)

Anonymous Coward | more than 8 years ago | (#15694899)

That's not ironic [wikiquote.org] .

In any event, if the poster practices Falun Dafa/Gong then that is their business, and if they would like to tell everyone that it isn't an evil cult as the Chinese authorities propagandize it to be [wikipedia.org] and that it, instead, is a beneficial spiritual practice, then that is that poster's prerogative. Unlike China, the United States is a free country and posting that in a forum isn't going to result in the poster getting killed [clearwisdom.net] or tortured [faluninfo.net] .

Besides, that quip wasn't part of the message, that was part of the poster's signature.

Re:^^ Mod Parent Up! (0)

Anonymous Coward | more than 8 years ago | (#15695087)

It's a pretty old canned joke. Google it.

Still addressing the symptom, not the root (1)

Lead Butthead (321013) | more than 8 years ago | (#15694028)

Inflict heavy fines on people buying spamvertised products and execute spammers. Only then can spam be stopped for good.

TOANTFOITOWTBS (1)

spun (1352) | more than 8 years ago | (#15694096)

Take off and nuke them from orbit. It's the only way to be sure.

Re:TOANTFOITOWTBS (1)

Antique Geekmeister (740220) | more than 8 years ago | (#15695099)

That's what finally knocked Cyberpromo off the air: not the lawsuits from other abused companies, not the out-of-court settlements they made with AOL and other victims of their spam, not the incensed public, but a bunch of irritated script kiddies who knocked down the router connection sold to them by Agis and kept it off the air.

Eventually the peasants will revolt.

Abysmal results (4, Interesting)

gvc (167165) | more than 8 years ago | (#15694033)

More specifically, it correctly classifies 84% of spam and 98% of non-spam.

The authors used the SpamAssassin corpus. Holden shows that, on the SpamAssassin corpus, Bogofilter correctly classifies 90.3% of spam and 99.88% of non-spam. See http://sam.holden.id.au/writings/spam2/ [holden.id.au]

This approach is nowhere near state of the art.

Clone (0)

Anonymous Coward | more than 8 years ago | (#15694074)

Sounds like a genetically modified clone of Bayes :-)

Sounds cool, but... (1)

fm6 (162816) | more than 8 years ago | (#15694079)

Has anybody stopped to think that the human immune system is a little less than perfect? It doesn't stop all diseases, not by a long shot. And sometimes it creates illness, as anybody with Hay Fever — or Multiple Sclerosis — will testify.

no more biological metaphors.... (5, Insightful)

illuminatedwax (537131) | more than 8 years ago | (#15694125)

I'm seriously sick of people abusing biological methodologies. People seem very attracted to ideas simply because they are grounded in "how nature works" and ignore the mathematical benefits or weaknesses. Now this idea pretty much just sounds like statistical rules based on a corpus - pretty much how every successful solution out there now works. This solution simply prunes rules that aren't being used, but there are better ways to get a smaller spam detection database. Have you seen the stuff the CRM114 people are doing? [sourceforge.net] This is nothing new.

Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and started studying the physical principles of flight.

Re:no more biological metaphors.... (1)

EMiniShark (631279) | more than 8 years ago | (#15694555)

Read your Holland and Koza. Evolutionary computing (and others: Neural nets, Cellular Automata, ...) have a wide array of successful applications. Dismissing this work just because it's biologically inspired is inappropriate and counter-productive to science.

And just so you know, the AIS community is absolutely not ignoring fundamental questions of complexity and mathematical weaknesses. I met one of the authors at ICARIS 05, and her presentation of this work was cautious, qualified, and thorough.

It's reasonable to argue that the spam filtering mechanics of this work aren't novel. But to attack the practice of biologically-inspired computing because of this paper is just overreacting.

Re:no more biological metaphors.... (1)

illuminatedwax (537131) | more than 8 years ago | (#15695091)

There are a million other biological ideas we could borrow, and other biological ideas we could borrow in radically different ways, but we don't because they don't work. Those ideas that do work may have been inspired by biological phenomena, but other than that they do little better than provide a good analogy. In this case, they aren't doing anything different and it is only considered interesting because they thought of a good analogy for it. Nothing works because it is based on biological phenomena. I realize that much of AI research is well grounded in mathematical theory (heck, my master's advisor does stuff with COLT and the like), but many students, Slashdotters, and a few researchers still have a romantic kind of view of "intelligence" and "living computers" or whatever. So basically my comment was directed more at Slashdot than the research group. Hell, I didn't RTFA, so they could be doing Serious Research.

The flip side to what you are saying of course is that accepting a work just because it is biologically inspired is also inappropriate and counter-productive to science.

Re:no more biological metaphors.... (1)

Illserve (56215) | more than 8 years ago | (#15695191)

Normally I would agree with you. A great deal of crappy research gets hyped up because of an inappropriate analogy to biology... but this isn't one of them.

Stopping "spam" is almost exactly the problem that our immune system has to deal with. It has to go through reams of data (i.e. every cell in your body) and figure out what is junk and what isn't, and it does this by learning through exposure to positive and negative examples. It's not perfect either, sometimes it goes berserk, producing false positives (autoimmune disorders).

There's a great deal to be learned from our immune system for the sake of solving the spam problem. Don't be so quick to dismiss it.... this time.

 

Re:no more biological metaphors.... (1)

illuminatedwax (537131) | more than 8 years ago | (#15695777)

Excellent analogy, but that's all there is. It might be inspiring, but this time the idea wasn't originally inspired by biology. These methods of filtering spam have been around for a long time.

In any case, the basic idea is simple: use a corpus of examples separated into classes to create an algorithm to decide if a new example is in a certain category. There are a million AI techniques to do this. What differs in each case are the details of what each part means.
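To make the "corpus of examples separated into classes" idea concrete, here is a minimal sketch of one such technique: a naive Bayes classifier over whitespace-separated tokens with add-one smoothing. It is not the method from the article, and the function names `train` and `classify` and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def train(examples):
    """Build a model from (label, text) pairs.

    Returns (token_counts, doc_counts): per-label token frequencies
    and the number of training documents seen for each label.
    """
    token_counts = {}
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(text.lower().split())
    return token_counts, doc_counts


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    token_counts, doc_counts = model
    # The vocabulary is every token seen in any class during training.
    vocab = set()
    for counts in token_counts.values():
        vocab.update(counts)
    total_docs = sum(doc_counts.values())

    best_label, best_score = None, -math.inf
    for label, counts in token_counts.items():
        # Start from the class prior, then add one log-likelihood per token.
        score = math.log(doc_counts[label] / total_docs)
        total_tokens = sum(counts.values())
        for token in text.lower().split():
            # Add-one (Laplace) smoothing so unseen tokens do not zero out a class.
            score += math.log((counts[token] + 1) / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus, purely illustrative.
corpus = [
    ("spam", "buy cheap pills now"),
    ("spam", "cheap pills online"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch meeting tomorrow"),
]
model = train(corpus)
print(classify(model, "cheap pills"))
print(classify(model, "meeting tomorrow"))
```

The "details of what each part means" live in the tokenizer and the scoring rule; swapping either one gives a different filter built on the same corpus-and-classes skeleton.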

The immune system analogy is flawed in its details anyway. For example, in the human body, antibodies are created more like a genetic algorithm: there are a few families of them and they recombine randomly and float around the body. (Those harmful to the body are never let out.) Those that find a matching host invader protein are then made to reproduce. Should we implement a similar strategy for spam? Probably not, or at least my intuition says that this method does not work as well with spam as some of the very very successful strategies that we have now, especially since most of these converge very quickly. GA-style seems like it would take a long time.

A new range of spam (1)

Wierdy1024 (902573) | more than 8 years ago | (#15694244)

Has anyone come across the newer spam ideas, where the spam message looks so much like a real message that I sometimes have to spend a good few minutes looking at it to see if it's genuine - they use your nickname - eg. "Dear Bob", and end with the name of someone you know. They are usually about mundane things (eg. "do you want to come to a party on saturday?"), and the emails make good sense and have a suitable subject line. The only giveaway is that they all have a tinyURL link to the actual spam site - but how can I tell if a spammer is using tinyURL or if a friend of mine is using tinyurl? The annoying thing is each email has a unique tinyurl, so by clicking on the link they know it's an active address - and I made the mistake of clicking on the first one I got.

One thing that concerns me is how certain fields are filled in, for example my nickname and a friend's name at the bottom. Also, it seems to sometimes use my geographic location (nearest city - presumably from IP location) - eg. "Meet tomorrow in London, UK." I suspect the fields are filled in by some spyware on the pc reading previous emails and analysing them - All these emails appear on my vmware spyware/virus test machine. It's also possible the fields could be filled in by a hack of someone else's mailbox (mail server or PC), because as soon as they've got a mailbox full of email (including headers), they can auto-analyse it to find out nicknames etc. fairly reliably with a decent amount of mail.

Re:A new range of spam (1)

gvc (167165) | more than 8 years ago | (#15694361)

No I haven't. Unless you think I can't tell what's below from correspondence from somebody I know.

--

Hello .

I think we had correspondence a long time ago if it was not you I am sorry.
If it was I could not answer you because my Mozilla mail manager was down for a
long time and I could not fix it only with my friend's help I got the emails
address out for me ..:)
I hope it was whom we were corresponded with you are still interested, as I am,
though I realize much time has passed since then...
I really don't know where to start ....
Maybe you could tell me a little about yourself since I lost our early letters,
your appearance,age , hobbies, and are you still in the search?
If it was you I wrote to and you are interested to get to know me better, I have
a profile at :
http://www.im-waiting-4you.net/ [im-waiting-4you.net]

Don't really know what else to say for now I hope this is the right address

Let me know if you are interested, And I hope
you won't run when you see my picture :-)

talk to you soon.....

Galinka

Re:A new range of spam (1)

Antique Geekmeister (740220) | more than 8 years ago | (#15694908)

Many emails like that do not actually contain an ad or commercial message: they're email address probes, being sent by the million to gather email addresses, and often with a webbug (a one-pixel GIF in a URL) to track exactly which email address's HTML-reading client received the message.

Those valid email addresses are themselves highly saleable to spam companies, whether the company is even vaguely legitimate or not.

This article was published in 2004 (0)

Anonymous Coward | more than 8 years ago | (#15694345)

How is this even close to news?!

The first paragraph of TFA, even above the abstract:

"This article was published in Crossroads Magazine, November 2004 edition. It was supposed to be on their website, but since it no longer seems to be available, I have provided this copy for reference."

No wonder it's not even near the "state of the art", maybe it was.. back then.

/ AC

sounds like something he would say (1)

bersl2 (689221) | more than 8 years ago | (#15694368)

from the lymp0cty3z-narf-poit!-claire-said-the-laundry-wheel dept.

Pinky, if I could reach you I would hurt you.

"News" from 2004? (1)

44BSD (701309) | more than 8 years ago | (#15694650)

Come on, guys.

Are we still doing this? (1, Insightful)

Anonymous Coward | more than 8 years ago | (#15694919)

Are we still on the message-filtering bandwagon? I know it was all the rage when we talked about it in 2000, but now it's 2006, and we've all had experience with it. Pattern-matching has been defeated, and it was an embarrassing defeat. This is usually a sign to those who proposed it that they should consider a career change. With the exception of those patterns that correspond to firewall rules blocking domains run by companies with names like "Megaultra Webcram Holdings, Inc", it's a dead issue.

The real issue I have is with those researchers and businesses that continue to push this cyber snakeoil. It's getting to the point that e-mail is worthless, not because of the high volume of spam, but because easily confused pattern-matching blockers remove just enough messages to cause major problems for the rest of us. Here is why it's stupid, and should be stopped:

* While contaminated pattern-matching filters don't always block wanted messages, they remove just enough messages to cause doubt and frustration with my users, and those on the other end of the loop. This leads to the network administrator (me) having to individually resolve each problem by sifting through the logs.

* Because the matched-messages are removed on the far end of the transaction, i.e. on the "client side", there's no indication of trouble, or even an error message (to the user or in the logs). Neither party understands where the message has gone, and this reinforces superstition. For years, I whined, teased and scolded to get the attention of the morons who were going gung-ho with client-end filtering for spam and viruses, but they just wouldn't listen.

* ISPs and other service providers have deployed these infernal filters everywhere, making a huge mess which I cannot resolve. It is next to impossible to politely explain the problem is theirs, without having their attention tossed amid a sea of techie jargon. They usually come away with the message, "it is your fault, not ours". I'm fed up dealing with the hostile confrontations that result.

I have a sneaking suspicion that the same morons who thought spam/virus filtering based on pattern-matching the 'From' line was brilliant are the same idiots responsible for the current crop of "security" dud-ware. Do I sound hostile? I am, and these charlatans can go shove it. At this point, I think only the "homeopathic remedy" market has more frauds than the computer industry.

I'm sorry, no matter how graceful the descriptions or the analogies, I will no longer accept content-based pattern-matching filters on e-mail. They have been proven horribly ineffective. Spam-filtering isn't rocket science, okay? First you block any SMTP traffic without a zone pointer, then block large chunks of addresses from underdeveloped countries based on message header sampling. From there, build up a list of UK, US, and Canadian spam-pushers based on their domain registrations. You'll eliminate most of it, and unless you communicate extensively with people in China, Bolivia, Russia or Brazil, you won't have to do much tuning.
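The connection-level rules described above can be sketched roughly as follows. This assumes that "zone pointer" means a reverse DNS (PTR) record for the connecting IP, which is one reading of the comment and not something it spells out. The function names, the injectable `resolver` parameter, and the blocked ranges (reserved documentation networks) are all hypothetical.

```python
import ipaddress
import socket


def has_reverse_dns(ip, resolver=socket.gethostbyaddr):
    """Return True if the IP resolves back to a hostname (has a PTR record)."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
    return bool(hostname)


def should_accept(ip, blocked_networks, resolver=socket.gethostbyaddr):
    """Decide whether to accept an SMTP connection from this client IP.

    Rule 1: reject anything inside a locally blocked address range.
    Rule 2: reject clients that have no reverse DNS entry.
    """
    address = ipaddress.ip_address(ip)
    if any(address in network for network in blocked_networks):
        return False
    return has_reverse_dns(ip, resolver)


# Hypothetical block list; 192.0.2.0/24 is a reserved documentation range.
BLOCKED = [ipaddress.ip_network("192.0.2.0/24")]


def fake_resolver_ok(ip):
    # Stand-in for a successful PTR lookup, so the sketch runs offline.
    return ("mail.example.org", [], [ip])


print(should_accept("198.51.100.7", BLOCKED, resolver=fake_resolver_ok))
print(should_accept("192.0.2.10", BLOCKED, resolver=fake_resolver_ok))
```

Because the decision is made before any message body is transferred, this kind of check needs no per-user training, which is the trade-off the comment is arguing for. The cost is that a legitimate sender with broken reverse DNS gets rejected too.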

This is all incredibly stupid anyway. The solution to the spam problem is not a technological one, or a political one. It's an economic problem. The powers that be chose - in their infinite wisdom - to allocate huge blocks of addresses to largely underdeveloped nations based on populace, instead of demand. Most of these people don't have a network device, and won't have one in the foreseeable future. The value of these addresses is so ridiculously deflated, that they're worth close to nothing. Spammers have massive chunks of address space, and can cycle through millions of IPs before all of them are at risk of being blocked. Want it to stop? Charge a reasonable rate to pass the traffic through your country's network backbone.

Re:Are we still doing this? (1)

illuminatedwax (537131) | more than 8 years ago | (#15695796)

My Gmail account has a failure rate of about 2/1000, or a 99.8% success rate. My Thunderbird email has a similar success rate. Speak for yourself, buddy, statistical filtering works.

Re:Are we still doing this? (0)

Anonymous Coward | more than 8 years ago | (#15695942)

Actually, I'm speaking for myself and a couple hundred users... and a couple thousand recipients. I don't know about GMail, but the top of my sh*tlist is populated by Hotmail, Yahoo Mail, Thunderbird and Outlook Express. Either Hotmail or T-Bird is the worst, I can't decide which. It only took a little while for a properly "trained" T-Bird filter to get contaminated, and junk 17 legitimate (and highly important) messages inside of a workweek.

Connection blocking has given me much greater gains. It doesn't need special functionality from an e-mail client, it doesn't require user configuration, and it doesn't require "training" individual client installations every time the nature of the spam changes. By rejecting connections outright, it uses almost no network bandwidth, and no storage space on the server or the client machines.

Apparently, the type of spam you receive and your habits are not the same as those belonging to the typical shlub with a PC on his desk.

Re:Are we still doing this? (1)

illuminatedwax (537131) | more than 8 years ago | (#15696024)

OK, well, Hotmail blows. But that just means they are doing it wrong. (Plus, the conspiracy theorist in me says they don't want to filter spam.)

They aren't just doing pattern matching; it's more sophisticated than that. It is also adaptive. As Paul Graham said, you can defeat spammers this way because they rely on their message. Email clients can do whitelisting techniques to reduce or eliminate false negatives as well as other things. This can all be done behind the scenes, with user interaction limited to the initial training of spam and the discovery of false positives. We have the technology! No, filtering hasn't been defeated nearly as far as I can see.

It works for my mother and all of her employees perfectly. A few questions: How are you training your Thunderbird install? What do you mean "contaminated?" And why the hell would you *delete* filtered spam immediately? The idea is to save that spam for a while (30 days is good) in a "Trash" or "Recycle Bin"(patent pending) just in case one gets through. Someone notifies you that you aren't responding, you dig it up, classify it, and your filter gets better. But if you spend long enough with a spam filter, filtering it correctly, you will generally not get false negatives.

There is a price to smart filtering: you have to spend time with it to train it correctly. If you train it wrong, you've got a huge problem on your hands. I've said it elsewhere in the comments: look at CRM114 [sourceforge.net] to see how good this kind of filtering has become, and how quickly you don't have to worry about it. But I personally have never lost an important, critical email with Thunderbird or Gmail. Neither have I heard a single complaint about spam from any Gmail user.

I do however, agree with you that ISPs should not be filtering your spam for you. That just gets annoying. But rejecting spam from IP addresses is an idea that can only go so far: like you said, spammers have huge swaths of IP addresses, sometimes ones that are used by legit emailers. IPv6 is coming, which means even more IP addresses for you to block. Personally, I think client-side filtering is quickly becoming the superior spam solution - look at that "SPAM solution checklist" that someone else posted.

Personally, it's been a long time since I worried about spam.

Augment this "immune system" with some (1)

ScrewMaster (602015) | more than 8 years ago | (#15694996)

.45 caliber penicillin, applied directly to the spammer's kneecaps.

junk science (1)

m874t232 (973431) | more than 8 years ago | (#15695143)

The idea of applying immune system models to spam and computer virus detection is old. Nobody has so far demonstrated that it is any better than a sound statistical approach, and this paper fails to do so as well. It's junk science.

Immune System Attacking Spammers (3, Interesting)

cyberscan (676092) | more than 8 years ago | (#15695310)

Here is a better Idea: Blue Security was attacked and shut down because the Internet is septic. The germs (spammers) have taken over. The best way to win this is to take the profit out of spamming. This can be done in a similar manner in which the body's t cells alert the rest of an immune system on how to attack a pathogen. A cryptographically signed spammer complaint (attack) file should be distributed via a peer to peer network protocol. This file is sent amongst complaining programs that complain to a spammer's website each time a spam advertising said website is received.

Like an immune system, this network of spam attack programs will have a t-cell. The "t-cells" will be a small group of people who draw up the complaint instruction file. Whenever the pathogen (spammer) releases enough toxins (spam) into the body (Internet), the T-cells (people who write the complaint instruction file) alert the immune cells (spam complaint program) of the presence of the pathogen and how to attack (complain to website advertised) it. The pathogen is overwhelmed with a quick immune response (high bandwidth usage resulting from many, many complaints).

When the cost of running a website surpasses the revenue earned from said website, the website is shut down. When the costs of spamming or advertising via spam exceeds the income, spam stops. Blue Security was beginning to become successful. Too bad they bowed out.

So, I was thinking (1)

ratboy666 (104074) | more than 8 years ago | (#15695718)

How about a REAL IMMUNE SYSTEM anti-spam filter? I had a dream...

Here's how it works. I catch me a SPAMMER, and have it tested. If it is allergic to a common item (ragweed, peanuts, shellfish, etc.), I keep it in the sub-basement. Otherwise, I release it back to the wild and catch me another.

Once SPAMMER is acquired, I put it in a chair, and provide food and water. SPAMMER is given computer, internet access, and is also attached to an allergen device that delivers the substance SPAMMER is allergic to, in controllable quantities.

SPAMMER is given control of the COMPUTER INCOMING SPAM FILTER, and allowed to freely hack on the internet.

If SPAM is delivered, and identified by my userbase, the ALLERGEN DEVICE is activated, releasing a quantity of the ALLERGEN. If a period of time (settable) goes by WITHOUT identified SPAM, the ALLERGEN DEVICE is disabled, with a random delay in the system.

If the SPAMMER is able to capture two additional SPAMMERs, it is removed from service.

Ratboy

Give Them What They Want..... (1)

IHC Navistar (967161) | more than 8 years ago | (#15695952)

Someone should set up an organization where a panel reviews submitted spam emails, and when an email is identified as spam, a program is activated that sends massive quantities of replies, essentially a DoS, to the spammer's computer. After getting bombarded with thousands of requests (that is what they wanted, right?) the hosting server will eventually crash and shut down. How can they complain when you gave them what they wanted? ----- Sig Sauer

Greylisting? (0)

Anonymous Coward | more than 8 years ago | (#15696059)

This is a general question: how does a well-configured spam filter compare to simple greylisting?

Haven't been able to find any nice graphs that show a direct comparison.
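For anyone comparing the two, it helps to see how little greylisting actually does. It temporarily rejects the first delivery attempt from an unseen (client IP, sender, recipient) triplet and accepts a retry once a minimum delay has passed, on the assumption that real mail servers retry and much spam software does not. A minimal in-memory sketch, where the class name, the 300 second default, and the return strings are assumptions for illustration:

```python
class Greylist:
    """Toy greylisting table keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300):
        # Seconds a new triplet must wait before a retry is accepted.
        self.min_delay = min_delay
        # Maps each triplet to the time it was first seen.
        self.first_seen = {}

    def check(self, client_ip, sender, recipient, now):
        """Return "accept" or "tempfail" for a delivery attempt at time `now`."""
        key = (client_ip, sender.lower(), recipient.lower())
        # Record the first attempt; later attempts keep the original timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        # A real server would answer with a temporary (4xx) SMTP error here.
        return "tempfail"


greylist = Greylist(min_delay=300)
print(greylist.check("198.51.100.7", "a@example.org", "b@example.net", now=0))
print(greylist.check("198.51.100.7", "a@example.org", "b@example.net", now=400))
```

Unlike a content filter, this never looks at the message body, so the two approaches fail in different ways and are not really interchangeable.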