Spam Detection Using an Artificial Immune System
rangeva writes "As anti-spam solutions evolve to limit junk email, the senders quickly adapt to make sure their messages are seen. An interesting article describes the application of an artificial immune system model to effectively protect email users from unwanted messages. In particular, it tests a spam immune system against the publicly available SpamAssassin corpus of spam and non-spam. It does so by classifying email messages with the detectors produced by the immune system. The resulting system classifies the messages with accuracy similar to that of other spam filters, but it does so with fewer detectors."
Comment removed (Score:4, Informative)
Re:The utility of newer systems (Score:5, Insightful)
But seriously, your attitude is one that would stop all progress. This new method does the job more efficiently.
From TFA, The lightweight nature of this solution -- requiring significantly smaller number of detectors when compared to SpamAssassin -- will doubtlessly prove attractive to those looking to implement a server-based solution where processing overhead may well be an issue. A server-based solution would be a one-size-fits-all mold since the filter is not personalized and does not learn for each particular user, but the reduced processing and storage time makes such a solution attractive.
That sounds like a good reason for this research.
Re:The utility of newer systems (Score:2)
I guess for me the question is... "what KIND of efficiency are we talking about here? Simplicity for the CPU? Simplicity f
Re:The utility of newer systems (Score:1)
Of course servers are getting faster all the time, but the whole point of computer science is to make things work more efficiently regardless of the actual hardware it runs on.
Re:The utility of newer systems (Score:2)
Re:The utility of newer systems (Score:1)
Re:The utility of newer systems (Score:3, Interesting)
Not very smart to send spam that gets caught by SpamAssassin.
Re:The utility of newer systems (Score:2)
Re:The utility of newer systems (Score:2)
Re:The utility of newer systems (Score:1)
Finally (Score:5, Funny)
Re:Finally (Score:1)
Hooah! First one to hook this up with an MLRS gets a cookie!
Death is too good for them (Score:2)
I want those fuckers to live painfully damnit, just like the rest of us do when we have too much spam.
Re:Death is too good for them (Score:2)
In the name of human rights, they should not be forced to watch Biodome any more than twice!
Re:Finally (Score:2)
Re:Finally (Score:1)
http://mirror12.escomposlinux.org/comic/ecol-205-
False positives still a problem (Score:1, Redundant)
Re: (Score:3, Interesting)
Re:False positives still a problem (Score:1)
Re:False positives still a problem (Score:2)
(N.B: Okay, yeah, there's a difference between spyware and spam... I'd think that spyware is the worse of the two evils, though.)
Re:False positives still a problem (Score:2)
Simple: people who see the profit in it and don't care what people think of them. Who cares if there's a .001% reply rate when you send out tens of millions of spam per day? As long as there's a way to get money out of people with spam, there will be spam, and there will be people looking for ways to get around any filtering program or algorithm designed.
Re:False positives still a problem (Score:2)
The difference? (Score:3, Insightful)
What separates this from a Bayesian filter?
Re:The difference? (Score:5, Insightful)
Not much (Score:5, Informative)
There are useful things to be gained from a change of metaphor. For example, one difference between this and most Bayesian spam filter implementations is that this explicitly incorporates a decay function. That could be useful, if a word that used to be common in spam no longer is (e.g. if I actually decided to buy a Rolex, it's no longer a strong spam indicator, whereas right now any email mentioning "Rolex" is 99.9999% certain to be spam).
You could easily modify a Bayesian filter to have time-decaying weights, but if the change in metaphor leads somebody to come up with a good insight, then perhaps this is useful. Mathematically, though, the equations look very similar.
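To make the time-decay idea concrete, here is a rough sketch of per-token spam/ham counts that fade exponentially with age. It is not taken from the paper or from any existing filter: the class name, the 30-day half-life, and the smoothing are all assumptions chosen for illustration, and it scores a single token rather than combining evidence across a whole message.

```python
import math
from collections import defaultdict


class DecayingTokenStats:
    """Hypothetical per-token spam/ham counts that fade with age."""

    def __init__(self, half_life_days=30.0):
        # Assumed half-life: after this many days, old evidence counts half.
        self.decay_rate = math.log(2) / half_life_days
        self.counts = defaultdict(lambda: [0.0, 0.0])  # token -> [spam, ham]
        self.last_seen = {}  # token -> day of the last update

    def _age(self, token, today):
        # Shrink both counts by exp(-rate * elapsed_days) before using them.
        elapsed = today - self.last_seen.get(token, today)
        factor = math.exp(-self.decay_rate * elapsed)
        pair = self.counts[token]
        pair[0] *= factor
        pair[1] *= factor
        self.last_seen[token] = today

    def train(self, token, is_spam, today):
        self._age(token, today)
        self.counts[token][0 if is_spam else 1] += 1.0

    def spam_probability(self, token, today):
        self._age(token, today)
        spam, ham = self.counts[token]
        # Add-one smoothing, so a token with no evidence scores exactly 0.5.
        return (spam + 1.0) / (spam + ham + 2.0)


stats = DecayingTokenStats(half_life_days=30.0)
for _ in range(10):
    stats.train("rolex", is_spam=True, today=0)
fresh = stats.spam_probability("rolex", today=0)
stale = stats.spam_probability("rolex", today=300)
```

With these assumptions, "rolex" is a strong indicator right after training and drifts back toward neutral once nobody has reported it for ten half-lives, which is the behaviour described above.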
SpamAssassin does "decay" them. (Score:3, Informative)
Re:Not much (Score:5, Interesting)
Re:Not much (Score:2)
Perhaps in the future you could know something about two algorithms before declaring them identical. Just a thought.
Re:Not much (Score:1, Informative)
Here, let me clarify the differences for you. The primary difference is in the nature of the tokens used to classify a message. The Bayesian system has words/tokens that are either predefined by a human or taken from messages verbatim. The artificial immune system has tokens that are randomly and automatically generated by the system using s
From their paper, nothing. (Score:2)
But from their paper, it seems that they're "tuning" their check items to the corpus of spam that they're testing against.
So of course they will use fewer check items. There are a finite number of characteristics of that corpus.
I did not see where they were using their system in a Real World environment (I may have missed it, the article was pretty painful to read). Now, if they can do as well as a fully tuned SpamAssassin system
Great.... (Score:4, Funny)
Re:Great.... (Score:4, Funny)
Re:Great.... (Score:1)
charon
Re:Great.... (Score:1)
Arthritis, AIDS, tuberculosis, Leukemia, lupus, endometriosis, etc. Deadlier cousins of the failures of the immune system you mentioned.
What they should be modelling the next-gen spam filters on are intracellular def. mechanisms, RNAi, si/shRNA, nuclear translocation tags, etc. Which is what blacklists, senderid, etc. are copying anyways.
Re:Great.... (Score:2)
Greeeat I can see it now...
Doctor: Do you have any allergies to medication?
you: No, but my computer has developed an allergy to Viagra, Cialis, and is also sensitive to weight loss pills. Not to mention the keyboard seems to have grown several inches in length.
Fancy (Score:5, Insightful)
Real spam solution (Score:3, Interesting)
Re:Real spam solution (Score:2)
Sure, for those of us with the time, knowledge and inclination to do it. Expecting Aunt Minnie to do it is unreasonable. All she cares about is keeping spam out of her inbox, and if running something like this, or SpamAssassin at the server gets rid of most of it, isn't that all she can reasonably ask for?
Re:Real spam solution (Score:2)
'utilizes' 0.992422 1 6140
I gave up (Score:5, Interesting)
Re:I gave up (Score:5, Interesting)
Re:I gave up (Score:4, Interesting)
Re:I gave up (Score:5, Insightful)
More of the same; not a solution (Score:3, Interesting)
The real problem is the sending of spam itself, and that problem arises from an inability to correctly attribute the spam to the spammers. If we can do that, we can block it, or at least better convict the spammers who violate the law. Things that solve this problem, like Yahoo!'s "DomainKeys", are the future of anti-spam, not more highly-polished rocks.
Re:More of the same; not a solution (Score:2, Interesting)
Domain Keys, at least to this point, is utter crap in my experience. I get these small floods of spam into my Yahoo! mailbox. What most of them have in common is they are certified by Domain Keys. A couple months ago, I was getting the exact same spam every day for some mortgage coming from different addresses. All were DK certified.
For what it's worth, I do send off those specific emai
Re:More of the same; not a solution (Score:1)
Re:More of the same; not a solution (Score:1)
Re:More of the same; not a solution (Score:2)
Obligatory HIV & AIDS reference (Score:1)
Re:Obligatory HIV & AIDS reference (Score:2)
I'm waiting... (Score:1)
Re:I'm waiting... (Score:2)
Re:I'm waiting... (Score:1)
There have been e-mail "virii" around for a long time, one of the most famous being the Bill Gates Quick Cash [wired.com]. Don't think that all viruses require an attachment.
Useless -- solves a non-problem (performance) (Score:2, Interesting)
1. The ONLY problem this solves is performance -- i.e., processing throughput. And that's not what's wrong with anti-spam systems today. They live and die on the precision/accuracy tradeoff, or maybe on UI.
2. The authors seem to assume that Bayesian systems work really, really well. While technically most or all current spam-filtering products are Bayesian in some sense, that still speaks of considerable naivete about real-w
The easiest way to eliminate most spam ..... (Score:2, Insightful)
Why not run a script that filters messages based on spelling? If there are more than 'xx' many words that do not exist in the dictionary you choose to use, then the message gets sent to the spam folder. This would catch the odd e-mail from friends who don't know how to spell or what a
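A minimal sketch of the spelling check proposed above, assuming the dictionary is just a Python set of lowercase words and the 'xx' threshold is a tunable number. The function names and the default of 3 are made up for the example, and no claim is made about how well this works on real mail.

```python
import re


def unknown_word_count(message, dictionary):
    """Count the words in the message that are not in the dictionary."""
    # Digits and symbols split words, so "v1agra" becomes "v" and "agra".
    words = re.findall(r"[a-z']+", message.lower())
    return sum(1 for word in words if word not in dictionary)


def goes_to_spam_folder(message, dictionary, max_unknown=3):
    """True when more than max_unknown words are missing from the dictionary."""
    return unknown_word_count(message, dictionary) > max_unknown


# Tiny stand-in dictionary; a real one would be loaded from a word list.
DICTIONARY = {"please", "call", "me", "about", "the", "meeting", "tomorrow"}

ham = "Please call me about the meeting tomorrow"
spam = "Ch3ap v1agra and c1alis, ordr noww"
```

Here `goes_to_spam_folder(ham, DICTIONARY)` is False and `goes_to_spam_folder(spam, DICTIONARY)` is True, but so would be any legitimate message full of names, jargon, or honest typos, which is the false-positive risk already mentioned.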
Re:The easiest way to eliminate most spam ..... (Score:2)
They 'cannot' beat the filtering I use now...
Not long ago, I added a form of rbl support to a personal copy of My homebrew Windows email client freebie [rapidshare.de] and the results were 'amazing'....
Essentially NO spam gets through now!
Recently, one got through [slashdot.org] so I spent a few minutes to take care of it.
The only drawback to using a rbl is that it can be inaccurate if an innocent party starts using a blacklisted IP. But in the real world due to
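For anyone wondering what an rbl check involves, the usual DNS blacklist convention is to reverse the octets of the sender's IPv4 address, append the list's zone, and see whether that name resolves. The sketch below is an assumption about how such a check could be written, not the code from the client mentioned above, and `dnsbl.example.org` is a placeholder zone rather than a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    # IPv4Address validates the input and raises ValueError on garbage.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an entry for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # No record found, or the lookup itself failed; treated as not listed.
        return False


query = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
```

Note that `is_listed` needs a working resolver, and that treating a failed lookup as "not listed" is a design choice that lets mail through when DNS is down.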
Re:The easiest way to eliminate most spam ..... (Score:2)
Re:The easiest way to eliminate most spam ..... (Score:3, Insightful)
Re:The easiest way to eliminate most spam ..... (Score:2)
Re:The easiest way to eliminate most spam ..... (Score:1)
See http://public.kvalley.com/regex/regex.asp [kvalley.com]
For example, to allow viagra but detect most of its spamvertized variations:
(?!viagra)(([v])|(\\\W{0,2}\/))[i1l\|\\\/!îíìï:;](([a@àáâãäå^æ])|(\/\W{0,2}\\))[gqp96][r](([a@àáâãäå^æ])|(\/\W{0,2}\\))
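Here is a cut-down version of the same idea in Python, in case it helps to see it run. This is a simplified pattern written for illustration, not a translation of the expression above: it covers far fewer substitutions, and the function name is made up.

```python
import re

# Simplified sketch: the negative lookahead lets the plain spelling through,
# the rest matches a few common character substitutions.
OBFUSCATED_VIAGRA = re.compile(
    r"(?!viagra)"          # do not match the correctly spelled word
    r"(?:v|\\\W{0,2}/)"    # "v", or a backslash-slash pair drawn as a V
    r"[i1l|!]"             # i and lookalikes
    r"[a@4]"               # a and lookalikes
    r"[gq9]"               # g and lookalikes
    r"r"
    r"[a@4]",
    re.IGNORECASE,
)


def is_obfuscated(text):
    """True if the text contains a disguised spelling covered by the pattern."""
    return OBFUSCATED_VIAGRA.search(text) is not None


samples = ["v1agra", "Buy V!@GRA now", "\\/iagra", "viagra"]
flags = [is_obfuscated(sample) for sample in samples]
```

The first three samples are flagged and the plain spelling is not. Spammers only need one substitution the character classes don't list, so a pattern like this is probably better used as one input to a weighted score than as a filter on its own.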
Re:The easiest way to eliminate most spam ..... (Score:1)
As another responder pointed out... perhaps this could be used in some form of "weight" calculation. I would think counting special characters and individual characters ( barring I
Modelling Nature (Score:4, Interesting)
(x) technical ( ) legislative ( ) market-based ( ) vigilante
approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)
( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(x) An enormous amount of spam will initially go undetected before your idea is effective
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
(x) Your idea proposes a solution that only large corporations could deploy
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business
Specifically, your plan fails to account for
( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
( ) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
(x) The large amount of resources needed for implementation of your idea that small companies don't have
( ) Outlook
and the following philosophical objections may also apply:
( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
(x) Your solution is nothing more than a conceptual remanifestation of a solution that already exists
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough
Furthermore, this is what I think about you:
(x) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
Re:Modelling Nature (Score:1)
(x) Brilliant!
( ) I think it is a creative concept, but there is no need to reinvent the wheel.
( ) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!
^^ Mod Parent Up! (Score:1)
And kind of ironic that the author slipped in some unsolicited politically motivated PR on the Falun Gong as part of his/her message.
Still addressing the symptom, not the root (Score:2)
TOANTFOITOWTBS (Score:2)
Re:TOANTFOITOWTBS (Score:2)
Eventually the peasants will revolt.
Abysmal results (Score:5, Interesting)
The authors used the SpamAssassin corpus. Holden shows that, on the SpamAssassin corpus, Bogofilter correctly classifies 90.3% of spam and 99.88% of non-spam. See http://sam.holden.id.au/writings/spam2/ [holden.id.au]
This approach is nowhere near state of the art.
Sounds cool, but... (Score:2)
no more biological metaphors.... (Score:5, Insightful)
Read your Russell and Norvig, people. Airplane research didn't get off the ground (ugh) until we stopped trying to mimic birds and started studying the physical principles of flight.
Re:no more biological metaphors.... (Score:1)
And just so you know, the AIS community is absolutely not ignoring fundamental questions of complexity and mathematical weaknesses. I met one of the authors at ICARIS 05, and her presentation of this work was cautious, qualified, and thorough.
Re:no more biological metaphors.... (Score:1)
Re:no more biological metaphors.... (Score:2)
Stopping "spam" is almost exactly the problem that our immune system has to deal with. It has to go through reams of data (i.e. every cell in your body) and figure out what is junk and what isn't, and it does this by learning through exposure to positive and negative examples. It's not perfect either, sometimes it goes berserk, producing false positives (
Re:no more biological metaphors.... (Score:2)
In any case, the basic idea is simple: use a corpus of examples separated into classes to create an algorithm to decide if a new example is in a certain category. There are a million AI techniques to do this. What differs in each case are the details of what each part means.
The immune system analogy is flawed in its de
Re:no more biological metaphors.... (Score:1)
A new range of spam (Score:1)
Re:A new range of spam (Score:2)
--
Hello
I think we had correspondence a long time ago if it was not you I am sorry.
If it was I could not answer you because my Mozilla mail manager was down for a
long time and I could not fix it only with my friend's help I got the emails
address out for me
I hope it was whom we were corresponded with you are still interested, as I am,
though I realize much time has passed since then...
I really don't know w
Re:A new range of spam (Score:2)
Those valid email addresses are themselves highly saleable to spam companies, whether the company is even vaguely legitimate or not.
sounds like something he would say (Score:2)
Pinky, if I could reach you I would hurt you.
"News" from 2004? (Score:2)
Are we still doing this? (Score:1, Insightful)
The real is
Re:Are we still doing this? (Score:2)
Re:Are we still doing this? (Score:2)
They aren't just doing pattern matching; it's more sophisticated than that. It is also adaptive. As Paul Graham said, you can defeat spammers this way because they rely on their message. Email clients can do whitelisting techniques to reduce or eliminate false negatives as well as other things. This can all be done behind the scenes, with user interaction limited to
Re:Are we still doing this? (Score:1)
Augment this "immune system" with some (Score:2)
junk science (Score:2)
Immune System Attacking Spammers (Score:4, Interesting)
Like an immune system, this network of spam attack programs will have a t-cell. The "t-cells" will be a small group of people who draw up the complaint instruction file. Whenever the pathogen (spammer) releases enough toxins (spam) into the body (Internet), the T-cells (people who write the complaint instruction file) alert the immune cells (spam complaint program) of the presence of the pathogen and how to attack (complain to website advertised) it. The pathogen is overwhelmed with a quick immune response (high bandwidth usage resulting from many, many complaints).
When the cost of running a website surpasses the revenue earned from said website, the website is shut down. When the costs of spamming or advertising via spam exceeds the income, spam stops. Blue Security was beginning to become successful. Too bad they bowed out.
So, I was thinking (Score:2)
Here's how it works. I catch me a SPAMMER, and have it tested. IFF it is allergic to a common item (ragweed, peanuts, shellfish, etc.), I keep it in the sub-basement. Otherwise, I release it back to the wild and catch me another.
Once SPAMMER is acquired, I put it in a chair, and provide food and water. SPAMMER is given computer, internet access, and is also attached to an allergen device that delivers the substance SPAMMER is allergic to, in contr
You CANNOT stop spam (Score:2)
Secondly, you can't end spam. Too many companies rely on its existence for their business model to work.
The only way to stop spam is to stop the spammers from SENDING the stuff. However if this happened, you would see a huge number of companies suffer and possibly go bankrupt. Sure, the organised crime groups behi
a real solution (Score:2)
I get (practically) no spam.... (Score:1)