690 comments

spamassasin (1, Offtopic)

matt4077 (581118) | more than 11 years ago | (#4083016)

How does this compare to SpamAssassin? Does anybody know any figures?

Re:spamassasin (1)

dylanm (159359) | more than 11 years ago | (#4083095)

SpamAssassin is about 98% effective in catching spam, with about 1 in 3000 false positives. (The whitelist feature helps decrease the rate of false positives over time.)

They should call it "Spankdot" (-1, Troll)

Anonymous Coward | more than 11 years ago | (#4083241)

Park visitor removed for lewd act [michigancityin.com]

By staff

A Michigan City man was removed from Washington Park Wednesday morning after allegedly masturbating in front of the Portage High School girls volleyball team.

CmdrTaco, 39, 116 Frey Court, was removed from the park at about 11 a.m. after lifeguards observed him reportedly "playing with himself" while watching the team.

Taco was allegedly sitting at a picnic bench about 20 feet from the team. Police were summoned and ordered him to stand up, but Taco initially resisted, police said.

Volleyball team head coach Christine Dixon, Crown Point, told police she and six to eight of the girls saw Taco with his hands inside his shorts, making an up-and-down motion and rubbing himself.

Taco was removed from the park by Michigan City police and Recreation Director Darrell Garbacik, who said Taco could not return to Washington Park Wednesday, and his status as a park visitor would be reviewed.

The report was sent to the LaPorte County Prosecutor's Office for review.

Taco is on the LaPorte County and Michigan City registry for sex offenders.

Re:spamassasin (4, Informative)

tomknight (190939) | more than 11 years ago | (#4083111)

As you appear to have difficulty reading articles, I'll give you a helping hand:

"But the real advantage of the Bayesian approach, of course, is that you know what you're measuring. Feature-recognizing filters like SpamAssassin assign a spam "score" to email. The Bayesian approach assigns an actual probability. The problem with a "score" is that no one knows what it means. The user doesn't know what it means, but worse still, neither does the developer of the filter. How many points should an email get for having the word "sex" in it? A probability can of course be mistaken, but there is little ambiguity about what it means, or how evidence should be combined to calculate it. Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam."

Tom.
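
For anyone who wants to check the 99.97% figure in the quoted passage, here is a minimal Python sketch of combining per-word probabilities with Bayes' Rule. It assumes the words count as independent evidence and that spam and non-spam are equally likely beforehand. The function name `combine` is made up for illustration and is not taken from the article.

```python
from math import prod

def combine(probs):
    """Combine per-word spam probabilities into one probability.

    Assumes the words are independent evidence and equal priors
    for spam and non-spam (both are simplifying assumptions).
    """
    spam = prod(probs)                  # evidence that the mail is spam
    ham = prod(1.0 - p for p in probs)  # evidence that it is not
    return spam / (spam + ham)

# "sex" at .97 and "sexy" at .99, the two figures from the quote
print(f"{combine([0.97, 0.99]):.4f}")  # prints 0.9997
```

With only those two words as evidence, the result matches the 99.97% quoted from the article.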

Re:spamassasin (1)

Bahamuto (227466) | more than 11 years ago | (#4083267)

So I have a question then. What if I write a 'sexy' email to my girlfriend, and I use the word sex or even worse ones, wouldn't that get filtered out too? I'd be curious to see if he tried something like that and didn't get a false positive.

You know what is better? (-1, Flamebait)

Anonymous Coward | more than 11 years ago | (#4083235)

Spamicide

The spam guard for discriminating lesbians.

This is wrong. (1, Insightful)

www.sorehands.com (142825) | more than 11 years ago | (#4083035)

SPAM is wrong!

The proper way to get rid of spam is to get rid of spammers. Have it illegal to send spam, to market using spam, and to host spammers.

Make each link in the chain liable!

Re:This is wrong. (2, Insightful)

morgajel (568462) | more than 11 years ago | (#4083096)

"if you outlaw spam, the only people with spam are outlaws..." er something.
Anyway, what I was going to say is: OK, the US outlaws spam. Now what? Sue Korea as a whole? How about China? Nigeria?

laws don't mean shit.
you need to go after the people making MONEY off spam, not the spammers. Most of them are US "businesses". ...and I use the term 'business' loosely.

When I said... (1, Insightful)

Anonymous Coward | more than 11 years ago | (#4083137)

When I said market using spam, that includes the company that hires someone who spams.

Re:This is wrong. (2)

Stonehand (71085) | more than 11 years ago | (#4083160)

Given that much of my spam is not only /from/ Korea, but /in/ Korean, a considerable amount likely comes from Korean businesses.

As for what to do? One heavy-handed bit of leverage would be to block /all/ telecommunications from Korea until they develop some responsible marketing laws and enforce them (with, say, a 90-day notice in advance).

Re:This is wrong. (2, Insightful)

japhmi (225606) | more than 11 years ago | (#4083228)

One heavy-handed bit of leverage would be to block /all/ telecommunications from Korea


This is a very bad idea. What about companies such as Hyundai that have Korean and American (and many other countries') divisions? Or, what about my friends from Korea trying to e-mail their family back home - should they be hurt because some companies in their home country do bad things (and/or its government doesn't have or enforce laws to stop them)? Name a country that doesn't have another country (or countries) thinking that they need to 'change how they do things over there.'

Re:This is wrong. (1)

njet (189864) | more than 11 years ago | (#4083098)

The same method should also be applied to cracking, DDoS, and so on. But it does not work: abuse reports don't get to the right hands, and admins (if there are any) don't care.

Re:This is wrong. (-1)

Anonymous Coward | more than 11 years ago | (#4083114)

I love this.

First you slashdotters hate the fact that the US has too many damn laws. You want more freedoms.

Now you want to make more laws?

MAKE UP YOUR MINDS!

Re:This is wrong. (1)

schroedinbug (207181) | more than 11 years ago | (#4083118)

I completely agree with that except for the part about making it illegal to host spammers.

Now if they are knowingly hosting them, that's a different thing, but I know where I work, we had one try to start spamming people. When we got the notices that this was happening, we promptly deleted his account and put his name and address on the perma-ban list.

ISPs shouldn't be held liable unless they are purposely letting the spammer create headaches in the mailboxes of millions.

Re:This is wrong. (1)

www.sorehands.com (142825) | more than 11 years ago | (#4083168)

Well all of this is based on knowing.


I mean, if an ISP keeps a spammer after being made aware that they are a spammer, then the ISP should become liable. That includes bandwidth providers.


We should treat spam money like drug money: all assets that may have been bought with spam money, even if given away, should be subject to judgment.

Re:This is wrong. (1)

mhore (582354) | more than 11 years ago | (#4083129)

The proper way to get rid of spam is to get rid of spammers. Have it illegal to send spam, to market using spam, and to host spammers.

...or have them shot on spot. messy, though... hmm... AND THEN MAKE THEM INTO REAL SPAM! YAH! A fitting end.

Mike.

Shooting spammers is wrong. (0)

Anonymous Coward | more than 11 years ago | (#4083203)

We should chain them up in tech centers around the country. Then people get to pay $20 per lash to whip them.


This does several things:

  • It feels good!
  • Generates money to pay for their damages,
  • Discourages other spammers, and
  • It feels good!

Re:This is wrong. (2)

ceejayoz (567949) | more than 11 years ago | (#4083133)

Spam is wrong, but so's murder. That doesn't stop it from happening.

We should pursue legal avenues for stopping spam, but that doesn't mean we shouldn't try to block it in the meantime! The article sounds like a phenomenal way of blocking spam.

Re:This is wrong. (2)

tomknight (190939) | more than 11 years ago | (#4083144)

So you're after a world-wide law outlawing spam? Most of mine is currently coming from Taiwan, so that's what I'd need... Please, get real!

Tom.

Re:This is wrong. (2)

nougatmachine (445974) | more than 11 years ago | (#4083146)

Yes, because that works so well for heroin. And prohibition worked really well, too. And isn't something like 95% of the trading on KaZaA and Gnutella illegal as well? And all of the child porn readily available on the net?

A ban on spam, like the bans on these things, is going to be extremely difficult to enforce. Laws or no laws, filters will be necessary.

Re:This is wrong. (0)

Anonymous Coward | more than 11 years ago | (#4083162)

We should use every reasonable option to fight spam, legal and technological. We need to make it as difficult as possible so there are very few people willing to go through the trouble to bother sending it out.

Re:This is wrong. (0)

Anonymous Coward | more than 11 years ago | (#4083253)

Make each link in the chain liable!

Of course!

Suing people is the modern-day equivalent of beating people up, torching their houses, shaming their daughters...

Re:This is wrong. (0)

Anonymous Coward | more than 11 years ago | (#4083256)

kill spammers. A .44 magnum would work.

Re:This is wrong. (1)

RylandDotNet (81067) | more than 11 years ago | (#4083286)

Considering the fact that spammers don't feel any compunctions about hijacking an open mail relay, I don't think they're going to consider a law against spamming much of an obstacle.

Absolutely..... (2)

reaper20 (23396) | more than 11 years ago | (#4083036)

I propose we define spam as unsolicited automated email. This definition thus includes some email that many legal definitions of spam don't. Legal definitions of spam, influenced presumably by lobbyists, tend to exclude mail sent by companies that have an "existing relationship" with the recipient.

This needs to happen, just because I buy a book from a company doesn't mean I want their stupid monthly mailing list.

This seems very similar to SpamAssassin, which a lot of us are using with great success.

I heard about this! (2, Funny)

WilliamsDA (567274) | more than 11 years ago | (#4083038)

I got an email last night about this! Also, it asked me to help out his Nigerian cousin...

Filter for color ff0000 (2)

geekoid (135745) | more than 11 years ago | (#4083053)

of course! it sounds so obvious now.
jeez, that alone would cut down on spam, cross reference that with my trusted address book, and I'll probably be able to filter all spam.
I have that feeling you get when you've been stuck with a problem, and some guy looks at the code for about 2 seconds and finds a problem.

If you use Outlook... (2, Informative)

Anonymous Coward | more than 11 years ago | (#4083055)

(Yeah, yeah, I know...)

But if you do, check out Cloudmark's SpamNet [cloudmark.com]. I've been quite pleased with its ability to stop spam, and it gets better the more people use it.

Re:If you use Outlook... (0)

Anonymous Coward | more than 11 years ago | (#4083233)

I'll second Cloudmark's SpamNet.. I use it and while it doesn't catch it all, it has reduced it by over half and I don't have to actually do anything.

Ok, that is hot.... (4, Insightful)

Vengie (533896) | more than 11 years ago | (#4083056)

1) Lisp...ever since i ran into scheme, I have _loved_ the concept of lisp based languages. A nice Hoo-ha to anyone who says there are no practical applications of lisp based languages. (except haskell...which personally, i think sucks! if one of our own professors hadn't invented it, it would be dead by now)
2) _0_ false positives. I'm perfectly happy to settle with "some small number of spams getting through" given there are NO false positives. Early on in the article he states that he realizes this is a critical problem, and from the start keeps no false positives as a goal. It is far better to have no false positives than to have a 100% no-spam rate with that in mind...
3) the statistical word analysis is really interesting..."describe" is innocent. unfortunately....what happens when a few smart spammers get their hands on this analysis
*sigh*

Re:Ok, that is hot.... (1, Insightful)

Anonymous Coward | more than 11 years ago | (#4083105)

I'm perfectly happy to settle with "some small number of spams getting through"

I'm not singling you out, but this statement is the exact reason spam has become as popular as it has. It's annoying, it's cumbersome, but everyone is willing to 'settle' to avoid further problems. People spend effort developing complex filters and programs and proxies, which the spammer spends about a minute and a half figuring out how to get around. I think with the spammers there should be ZERO tolerance and ZERO SPAM. To stop spam you need to stop THE SPAMMER.

Re:Ok, that is hot.... (2)

Vengie (533896) | more than 11 years ago | (#4083134)

I was referring to the spam filtering software. I realize spam is an evil that must be fought at the source -- while I _do_ wish for the eventual removal of ALL spam, in assessing a SPAM FILTERING software package, the critical element is the false positives. I'd rather have a software package that has 50% filtering and 0 false positives than 100% filtering and 1 false positive. I _never_ want to miss an actual email directed at me.

Re:Ok, that is hot.... (1)

GrenDel Fuego (2558) | more than 11 years ago | (#4083232)

I have a guaranteed method for making sure that no spam gets through. Filter all e-mail to /dev/null, and you're sure not to miss a single spam message.

However, I'm not going to use this method because I'd actually like to read mail that someone sent to me.

He wasn't suggesting that getting rid of all spam is not a goal to strive for, it's that you shouldn't use methods that may keep you from reading real e-mail.

Re:Ok, that is hot.... (5, Insightful)

Plutor (2994) | more than 11 years ago | (#4083234)

1) [...] A nice Hoo-ha to anyone who says there are no practical applications of lisp based languages. (except haskell...which personally, i think sucks! [...])

You ridicule people who dismiss the usefulness of your personal "favorite" language, and then you dismiss the usefulness of one particular language that you happen to dislike? That's a bit hypocritical.

3) [...] what happens when a few smart spammers get their hands on this analysis[?]

Paul covers this. First, he suggests that each user's filters should be personalized, so that any spammer would not be able to circumvent everyone's filters. Second, the filters would be continually learning, possibly dumping older words from the corpus in favor of newer ones. And third, even if a spammer put at the end of his spam "describe describe describe describe", this still wouldn't work; the basic premise of the filter is that the spammer HAS to tell you what he's selling, and in the process of doing that, gives himself away as a spammer.
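
Since each user's own corpus drives the numbers, a personalized filter comes down to recomputing per-word probabilities from that user's mail. Here is a rough Python sketch of that step. It is a simplification and not the article's exact formula: the function name, the clamping to the range .01 to .99, and the sample tokens are all assumptions made for illustration.

```python
from collections import Counter

def word_probabilities(spam_tokens, ham_tokens, n_spam, n_ham):
    """Estimate P(spam | word) for every word seen in one user's mail.

    spam_tokens / ham_tokens: all tokens from that user's spam and good mail.
    n_spam / n_ham: number of messages in each pile.
    Simplified sketch; clamping to [0.01, 0.99] is an assumption here.
    """
    bad = Counter(spam_tokens)
    good = Counter(ham_tokens)
    probs = {}
    for word in set(bad) | set(good):
        b = min(1.0, bad[word] / n_spam)   # how often the word shows up in spam
        g = min(1.0, good[word] / n_ham)   # how often it shows up in good mail
        probs[word] = max(0.01, min(0.99, b / (b + g)))
    return probs

# Hypothetical two-message corpora for one user
probs = word_probabilities(
    spam_tokens=["free", "free", "describe"],
    ham_tokens=["describe", "describe", "lisp"],
    n_spam=2,
    n_ham=2,
)
```

Rerunning this as new mail is sorted is what would let the table drift with the user's mail, so a word like "describe" only stays innocent for as long as it keeps turning up mostly in good mail.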

Fighting Sperm (0)

Anonymous Coward | more than 11 years ago | (#4083057)

The best way to avoid a torrent of gloppy manjuice shooting all over your naked buttocks every time you even CONSIDER turning your computer on is to remove Linux from your hard drive immediately.

By installing a stable, sensible OS like Xenix, you can ensure an ejaculate-free user experience.

Easy way to beat spam 100% (4, Interesting)

Anonymous Coward | more than 11 years ago | (#4083060)

Create an E-Mail address called, say, spam@example.net.

Put a link to it on your website, but tell people not to use it for anything, E.G.

<a href="mailto:spam@example.net">Spam trap - don't use me</a>

Then, it'll get harvested along with all the others on your site. That mail box will fill up with spam, and nothing else.

What good is that? Well, you've got a ready-made list of messages to filter *out* of your other mail boxes!

So, just write a script that checks each inbound E-Mail against the spam list. If it matches, you *know* it's either:

1. Spam

or

2. An E-Mail that somebody has also sent to the "Don't use me" address.

In either case, you don't want to read it, so it gets auto-deleted. Nice.
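
The "check each inbound E-Mail against the spam list" step could be sketched like this in Python. This is only one guess at how such a script might work: it matches on a hash of the whitespace-normalized body, and the class and function names are invented for the example.

```python
import hashlib

def fingerprint(body):
    """Hash a message body after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class TrapFilter:
    """Remembers everything sent to the trap address (hypothetical helper)."""

    def __init__(self):
        self.seen = set()

    def record_trap_mail(self, body):
        # Called for every message that arrives at the "don't use me" address
        self.seen.add(fingerprint(body))

    def is_spam(self, body):
        # Called for mail to real addresses: same body as a trap message?
        return fingerprint(body) in self.seen

trap = TrapFilter()
trap.record_trap_mail("Make MONEY   fast!!!\nClick here.")
```

An exact-match hash like this only catches identical copies, so a spammer who inserts the recipient's name or random text into each message would slip past it.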

Oh, I think I'll patent this, and not tell any of you about the royalty I'm going to charge in 15 years time. Hahahahahahaha!!!

Oh, by the way, first post, first post... NOT!

Re:Easy way to beat spam 100% (1)

elmegil (12001) | more than 11 years ago | (#4083150)

You don't even have to make it "don't use me". Use <font> tags to make the text the same color as the background, and nobody will ever see it except for the spam harvesting bots.

Re:Easy way to beat spam 100% (0)

Anonymous Coward | more than 11 years ago | (#4083250)

Yeah, good point. The only thing not to do is to make it an empty link, like:

<a href="mailto:foobar"></a>

because I think the spam bots probably *would* filter that out if it got widespread.

This really can work (1)

kcroke (466899) | more than 11 years ago | (#4083064)

There are some internet filters out there that use fuzzy logic instead of databases. They are able to determine what category a web page belongs in without ever having seen the page before.

This technology should also be able to be applied to spam.

I hope yahoo reads that article.

not to be pessimistic.. (1)

shiafu (220820) | more than 11 years ago | (#4083068)

Even if someone develops a clever algorithm that's 99% effective, won't the spammers just find a way around it? It's sort of like the music industry and their vain attempts at copy protection. Some of these spammers are smart, computer-savvy people too.

Re:not to be pessimistic.. (0)

Anonymous Coward | more than 11 years ago | (#4083091)

He talks about why he thinks this will not be a problem in the article.

But spammers evolve... (1)

bobdotorg (598873) | more than 11 years ago | (#4083069)

One feature of spammers is that they adapt to any sort of anti-spam technology. What's to stop spammers from writing spam filled with 'non-spam' words?

Re:But spammers evolve... (0)

Anonymous Coward | more than 11 years ago | (#4083119)

Because spam written without spam words just isn't spam. Then you've got false positives, so you adjust the filter again, to include more non-spam words, which were not from spam to begin with. So, the non-spam words are not really non-spam words, but non-non-spam words, and the problem with that is that non-non-spam words are just spam words, because of the double negative, so do you filter them out or not? In effect you are creating your own spam, or meta-spam. So, then you've got to filter out meta-spam, using non-meta-spam words, which gets a bit confusing.

Re:But spammers evolve... (1)

russx2 (572301) | more than 11 years ago | (#4083271)

And of course there's the fact that spam, without using spam-like words, just won't be effective. Now I'm not saying spam in its present form of 'cum see me and my friends naked in my dorm room FREE' is particularly effective either, but if spammers can't make their spam at least... well, intriguing :-), what's the point?

Re:But spammers evolve... (0)

Anonymous Coward | more than 11 years ago | (#4083288)

or send images instead of text to ppl with html enabled email clients....

Ack! LISP! (0)

Anonymous Coward | more than 11 years ago | (#4083070)

His sample code is written in LISP! Run away! RUN AWAY!

SPAM (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#4083079)

Spam is good....I ate some for breakfast.

spam is a necessary evil (0, Troll)

Pink Hamster (598388) | more than 11 years ago | (#4083083)

I think that spam is a necessary evil that can be easily controlled. If we make a law to simply ban spam then we might be banning other things like mail lists. I personally receive NO SPAM in my main account and less than one piece a day in my "junk mail account." That's including things that the spam filter catches. All people have to do is to be careful with their e-mail addresses. Spam is not a problem for people who use a modicum of common sense.

Re:spam is a necessary evil (3)

matt_wilts (249194) | more than 11 years ago | (#4083199)

I think that spam is a necessary evil that can be easily controlled. If we make a law to simply ban spam then we might be banning other things like mail lists. I personally receive NO SPAM in my main account and less than one piece a day in my "junk mail account." That's including things that the spam filter catches. All people have to do is to be careful with their e-mail addresses. Spam is not a problem for people who use a modicum of common sense.

Let me tell you, the longer you've been online the more likely you are to get this shite. Remember, it only takes ONE posting of your mail address to a newsgroup (which in my case could have been years ago) and that's it. Then of course you end up on one of these "1 BILION fresh email addresses for $100" lists and you're dead meat.

Matt

Re:spam is a necessary evil (0)

Anonymous Coward | more than 11 years ago | (#4083280)

a mailing list is not spam, because i took the effort to explicitly sign up for the wxPython mailing list. i want to receive those messages. it is not spam then.

arc (0)

Anonymous Coward | more than 11 years ago | (#4083084)

I wonder when Paul will release arc [paulgraham.com] to the world.

A weak point... (2)

tomknight (190939) | more than 11 years ago | (#4083085)

One question that arises in practice is what probability to assign to a word you've never seen, i.e. one that doesn't occur in the hash table of word probabilities. I've found, again by trial and error, that .2 is a good number to use. If you've never seen a word before, it is probably fairly innocent; spam words tend to be all too familiar.

Sadly, once the spammer knows this method's being used, he'll start chucking in obscure (but valid) words... ah well, maybe at least spam will start getting interesting to read, assuming the spammer tries to use the word in context.

"Buy my superlatively efficacious mail list."

Maybe not...

Tom
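
To put a number on the worry above, here is a small Python sketch of what happens when never-seen words default to .2 and are combined with a known spam word. It is a toy model that assumes every word in the message counts as evidence and that words are independent; the names `lookup` and `score` and the sample table are made up for the example.

```python
from math import prod

UNKNOWN_WORD_PROB = 0.2  # the value for never-seen words quoted above

def lookup(word_probs, word):
    """Return the stored probability, or .2 for a word never seen before."""
    return word_probs.get(word, UNKNOWN_WORD_PROB)

def score(word_probs, words):
    """Combine the per-word probabilities, assuming independent evidence."""
    probs = [lookup(word_probs, w) for w in words]
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

table = {"viagra": 0.99}  # hypothetical one-entry probability table
plain = score(table, ["viagra"])
padded = score(table, ["viagra", "superlatively", "efficacious", "perspicacious"])
print(round(plain, 2), round(padded, 2))  # prints 0.99 0.61
```

In this toy model three obscure words drag a .99 message down to about .61, which is the effect the padding trick would be counting on. A real filter would not necessarily weigh every word this way, so treat the numbers as illustrative only.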

Re:A weak point... (2, Interesting)

sebi (152185) | more than 11 years ago | (#4083192)

You should have continued to read the article.

To beat Bayesian filters, it would not be enough for spammers to make their emails unique or to stop using individual naughty words. They'd have to make their mails indistinguishable from your ordinary mail. And this I think would severely constrain them. Spam is mostly sales pitches, so unless your regular mail is all sales pitches, spams will inevitably have a different character.

Basically the only way to get around this proposed method of statistical analysis is to completely change the way spam copy is written. But changing that would basically defeat the whole point of spam. If, to get through a filter, you had to stop writing sales pitches, then why spam in the first place?

Re:A weak point... (3, Insightful)

tomknight (190939) | more than 11 years ago | (#4083229)

Yes, I'll admit I hurried in with the comment there. Stupid ;-)

Spammers would learn to adapt, and the sales pitches would change character/format. The sales pitch will still be that, but it'll be more cleverly designed - it may be hard to do, but people will manage it. having said that, this method does look like it could be worth implementing - maybe even on the mail server...

Tom.

I CAN'T BE STOPPED! (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#4083099)

Due to excessive bad posting from this IP or Subnet, comment posting has temporarily been disabled. If it's you, consider this a chance to sit in the timeout corner. If it's someone else, this is a chance to hunt them down. If you think this is unfair, please email jamie@slashdot.org with your MD5'd IPID and SubnetID, which are "c9e8c27a161ecc03213c2f93dc3ea51a" and "a67245123af6bd3ea6538d034162ec02".

This is not news ... (5, Informative)

dougmc (70836) | more than 11 years ago | (#4083101)

The statistical approach is not usually the first one people try when they write spam filters. Most hackers' first instinct is to try to write software that recognizes individual properties of spam.
And he's correct. A few years ago, most spam filters did look for individual properties of spam.

BUT, now, the best spam filters out there already use statistical properties. SpamAssassin [spamassassin.org] does this, for example, and it works *extremely* well. Before I found SpamAssassin, I had a huge procmail recipe that used its scoring mechanism to do basically the same thing -- but of course SpamAssassin does it better, so I switched :)

FIGHT SPAM! (-1, Troll)

Anonymous Coward | more than 11 years ago | (#4083104)

Paul Graham, the Lisp Guru is back with a great technique to fight spam! Paul Graham, the Lisp Guru is back with a great technique to fight spam! PAUL GRAHAM THE LISP GURU IS BACK WITH A GREAT TECHNIQUE TO FIGHT SPAM! Paul Graham, the Lisp Guru is back with a great technique to fight spam! Paul Graham, the Lisp Guru is back with a great technique to fight spam! PAUL GRAHAM THE LISP GURU IS BACK WITH A GREAT TECHNIQUE TO FIGHT SPAM! Paul Graham, the Lisp Guru is back with a great technique to fight spam! Paul Graham, the Lisp Guru is back with a great technique to fight spam! PAUL GRAHAM THE LISP GURU IS BACK WITH A GREAT TECHNIQUE TO FIGHT SPAM! Paul Graham, the Lisp Guru is back with a great technique to fight spam! Paul Graham, the Lisp Guru is back with a great technique to fight spam! PAUL GRAHAM THE LISP GURU IS BACK WITH A GREAT TECHNIQUE TO FIGHT SPAM!

This can be a troll or it can be funny, depends how you look at it. But it's definitely not off-topic.

Stop sending spam, you spammer! (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#4083142)

foobar

No Need (-1)

Anonymous Coward | more than 11 years ago | (#4083108)

Just use http://www.atqui.com [atqui.com]

Major geek bias there... (5, Funny)

Kaa (21510) | more than 11 years ago | (#4083109)

From the article:

Based on my corpus, "sex" indicates a .97 probability of the containing email being a spam, whereas "sexy" indicates .99 probability. And Bayes' Rule, equally unambiguous, says that an email containing both words would, in the (unlikely) absence of any other evidence, have a 99.97% chance of being a spam.

Hmm.... take an average adult geek and yes, an email mentioning sex or sexy can go to /dev/null immediately without as much as a second glance... :-)

On the other hand if you run the statistics on email of an average horny teenager, the probabilities might get a bit different.

There's only one true solution (1, Interesting)

Dimensio (311070) | more than 11 years ago | (#4083113)

Spammers will try to work around filters, as they don't care that no one wants their crap. Further, filtering it doesn't solve the bandwidth situation, as the lines are still tied up with the bits running through the system until it hits the filter.

There is only one good solution for spam: killing spammers. It should be done, and it should be done brutally and painfully. When known criminal spammers like Ralsky (who ran a child pornography site at one point) are brutally murdered, others may think twice before firing up "EmailBlaster 2002".

Who doesn't get Lisp related porn? ;) (1)

ssimpson (133662) | more than 11 years ago | (#4083122)

To quote the author: "I get a lot of email containing the word "Lisp", and (so far) no spam that does".

He obviously isn't getting the "Lesbians with a Lisp" pr0n......


(spam) (0, Funny)

Anonymous Coward | more than 11 years ago | (#4083123)

(insert (lisp joke (here)))

This approach is very easy to defeat (5, Interesting)

Bazzargh (39195) | more than 11 years ago | (#4083131)

Here's how: the spam should be written as a 'multipart/alternative' with an html version of the spam as the primary alternate. The text version contains an innocuous message intended to pass the statistical spam filter. The spam message is entirely contained as an /image/ within the html. The text of the spam becomes invisible to the filter but not to the poor schmuck who gets the email.

I'm guessing here that the inclusion of a single image tag in the html is unlikely to trigger the spam filter, and supplying a wealth of evidence that the email is 'not' spam in the unseen alternate text will let the letter through.

Never mind that... (1)

billbaggins (156118) | more than 11 years ago | (#4083202)

The existence of a whitelist (e-mail addresses that are "trusted" to send nonspam) makes things very easy. Right now you can buy CDs with zillions of addresses. If the whitelist lives, then the next generation of that CD will have pairs: your address, and the address of someone you've probably mailed (say, an address that appears on the same page as yours). Voila!

As for multipart/alternative... right now anything I get that has a content-type other than text/plain goes to a special folder, where it usually gets deleted without even being opened... fortunately most of my friends use proper mailers that send text/plain :-)
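
The "anything other than text/plain goes to a special folder" rule is simple enough to sketch with Python's standard `email` module. The folder names and the function name are placeholders, and this is not a description of the poster's actual setup.

```python
from email import message_from_string

def folder_for(raw_message):
    """Route a raw message by its top-level Content-Type header.

    A message with no Content-Type header is treated as text/plain,
    which is the default the email module reports.
    """
    msg = message_from_string(raw_message)
    if msg.get_content_type() == "text/plain":
        return "inbox"
    return "suspect"  # hypothetical folder for HTML and multipart mail

plain = "From: friend@example.com\nSubject: hi\n\nSee you Friday."
fancy = (
    "From: deals@example.com\n"
    "Content-Type: multipart/alternative; boundary=xyz\n"
    "\n"
    "--xyz--\n"
)
print(folder_for(plain), folder_for(fancy))  # prints inbox suspect
```

A rule like this would also sideline the multipart/alternative trick described in the parent comment, at the cost of diverting every legitimate HTML mail too.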

Re:This approach is very easy to defeat (2)

topham (32406) | more than 11 years ago | (#4083268)

until it gets put into the 'spam' archive and processed where the word "alternate" is set at .99.

Circumvent (5, Interesting)

conner_bw (120497) | more than 11 years ago | (#4083151)

This is a nice concept. His algorithm works because spam uses the same repetitive syntax. Because so many spam emails are sent out, they can be flagged by pattern recognition... based on the assumption that they are written in English! It would also probably flag spam parodies written by friends, or marketing information you were actually subscribed to...

I'm a bigger cheerleader for Human Authentication, example:

http://www.si20.com/webauth.php?auth_id=17&mailto=true [si20.com]

I think this methodology is more effective.

Working with the assumption that *all* spam is sent out by machines, you can easily conclude the need for an automatic process asking the sender to add him or herself to your trusted list by following the simple steps outlined in the URL above. Any literate (i.e. email-sending) human can read the bitmap of jumbled text found in the URL above, but a computer can't.

I have been using this method along with filters included in the SI20 Software [si20.com] package and I have not received *ANY* spam in over a month.

Zero spam, period.

That being said, regardless of the software package you use, the Human Authentication approach is relentlessly effective.

Re:Circumvent (2)

clare-ents (153285) | more than 11 years ago | (#4083181)


I guess you never wish to converse with a blind person, or someone who's restricted to a text only medium then?

Re:Circumvent (1)

conner_bw (120497) | more than 11 years ago | (#4083215)

Unfortunately, I do not correspond with any blind people at the moment, so this is a point I never considered...

But if I did, I could manually add said person to my trusted list in the aforementioned solution.

Same with a person restricted to text.

"delete-as-spam button" (3, Interesting)

xipho (193257) | more than 11 years ago | (#4083155)

This is the brilliant part, and crucial to the endeavour, and so easy to implement!

It appears all the nay-sayers here haven't even read the article (no surprise). With as little code as needed to implement this it should be a must in the next mozilla mail/pine etc. code base.

Ban this IP, it's just a CGI proxy ;) (0)

Anonymous Coward | more than 11 years ago | (#4083157)

Due to excessive bad posting from this IP or Subnet, comment posting has temporarily been disabled. If it's you, consider this a chance to sit in the timeout corner. If it's someone else, this is a chance to hunt them down. If you think this is unfair, please email jamie@slashdot.org with your MD5'd IPID and SubnetID, which are "c9e9c670161ecc03213cef93dc3ea53a" and "167245123af6b03ea65389334162ec02".

Another way to stop Spam (5, Interesting)

mr.nicholas (219881) | more than 11 years ago | (#4083158)

Having had the same email address since '93, I receive close to 1000 spams per day to my personal account (which is also aliased from root/postmaster/webmaster).

I've tried everything under the planet to reduce the amount that I see in my mailbox; SpamAssassin being one of the best so far. But even that lets through quite a bit (around 10%).

So I decided to attack it from a different angle. I wrote a series of perl-scripts that I plunked into my procmail file.

The scripts work by checking the address of the sender each time a message is received. That address is looked up in a database. If it exists in the db, and it's marked as "authorized", it's just passed into my mailbox.

If it's marked as denied, /dev/null.

If it's never been seen before, an authentication message is sent to the sender asking them to reply to it to authorize themselves. If that authmessage is bounced back, a db entry is made as "denied".

If it's replied to in a normal fashion, that email is marked as "authorized" and any queued up mail from that person is pushed out.

The concept is that spam will almost never have a valid reply-to; so it will bounce and be marked as denied.

Even if the email doesn't bounce, no spammer alive will reply to it; so after 30 days, that email is marked as "denied".

Since I've set this up (for myself and my 10-year-old son who receives porn in his box (grrr!!!!)), it has worked flawlessly. The "real" email is unharmed, while the spam is stopped.

Oh, and I have a web-based control page so that users can manually add email addresses (for lists and such).

This week, for the first time in YEARS, I don't have spam in my mailbox anymore.

Hurray!

Now if I can only stop that damned dictionary-based scanning of my servers, I'll be set. Thank the gods that I don't have metered service.
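
For readers who want to see the sender-authorization workflow above in concrete form, here is a minimal sketch in Python rather than the poster's Perl/procmail scripts, which are not shown in the thread. Everything in it is an assumption made for illustration: the `SenderGate` class, its method names, the in-memory dictionaries standing in for the database, and days counted as plain integers. Only the states ("authorized", "denied"), the queueing of held mail, and the 30-day timeout come from the description.

```python
from enum import Enum


class Status(Enum):
    AUTHORIZED = "authorized"
    DENIED = "denied"
    PENDING = "pending"


class SenderGate:
    """Hypothetical in-memory stand-in for the sender database."""

    def __init__(self, timeout_days=30):
        self.status = {}      # sender address -> Status
        self.first_seen = {}  # sender address -> day the challenge went out
        self.queue = {}       # sender address -> messages held while pending
        self.timeout_days = timeout_days

    def receive(self, sender, message, today):
        """Return 'deliver', 'drop', 'challenge', or 'hold' for a message."""
        state = self.status.get(sender)
        if state is Status.AUTHORIZED:
            return "deliver"
        if state is Status.DENIED:
            return "drop"  # the /dev/null case
        # Unknown or still pending: hold the message until the sender replies.
        self.queue.setdefault(sender, []).append(message)
        if state is None:
            self.status[sender] = Status.PENDING
            self.first_seen[sender] = today
            return "challenge"  # caller sends the authentication message
        return "hold"

    def challenge_bounced(self, sender):
        """The authentication message bounced: deny and discard held mail."""
        self.status[sender] = Status.DENIED
        self.queue.pop(sender, None)

    def challenge_answered(self, sender):
        """The sender replied: authorize and return the queued mail."""
        self.status[sender] = Status.AUTHORIZED
        return self.queue.pop(sender, [])

    def expire(self, today):
        """Deny every sender whose challenge has gone unanswered too long."""
        for sender, state in list(self.status.items()):
            if state is not Status.PENDING:
                continue
            if today - self.first_seen[sender] >= self.timeout_days:
                self.challenge_bounced(sender)
```

In a real setup, `receive` would be called from the delivery hook and `expire` from a daily cron job; the manual additions from the web control page would simply set a sender's status to authorized.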

Re:Another way to stop Spam (0, Interesting)

Anonymous Coward | more than 11 years ago | (#4083221)

Huh, actually 5 minutes editing my Outlook mail rules achieved exactly the same thing and I've been nearly spam free for years even though I receive at least 300 a day from my domain. No scripts, no voodoo. Just simple point and click. There's the difference between closed source and open source. Closed source you use, open source you code.

Re:Another way to stop Spam (2)

Mr_Silver (213637) | more than 11 years ago | (#4083242)

The scripts work by checking the address of the sender each time a message is received. That address is looked up in a database. If it exists in the db, and it's marked as "authorized", it's just passed into my mailbox.

Whilst this is a very good and effective method, for a person on the end of this it's an absolute pain in the butt to go through this palaver just so you can send someone one email, get one response and then never communicate with them again.

I'm not knocking your solution, but personally I'd rather have something that didn't inconvenience the legitimate people that do want to contact me.

(plus, this sort of thing looks rather poor corporate-wise)

Re:Another way to stop Spam (0)

Anonymous Coward | more than 11 years ago | (#4083266)

Yeah, the webmaster at php.net uses the same idea.

Plug for OnLisp (1)

nonya (65503) | more than 11 years ago | (#4083167)

While you are there check out his book "OnLisp" (available for free at http://www.paulgraham.com/onlisptext.html). It is an extremely well written book and gives a flavor of what makes lisp special - its macros. Because lisp has such a regular syntax you can do amazing things with macros.

My only complaint about OnLisp is it only has one chapter on the common lisp object system, which is very powerful - multimethods, method combination, and a metaobject protocol - and could have used more explanation; I don't think it talks about lisp's exception handling at all.

But for a flavor of why people love lisp give this well written book a try!

A slightly different solution (1)

Edrick (590522) | more than 11 years ago | (#4083171)

It seems that spam blocking software has become very popular, and for the most part is rather effective, with mistakes being little to none.


I would have to say, though, that the best way to avoid spam is to not give out your email address to anyone aside from close friends or trusted coworkers. If you really need an email address to enter on web pages, then create a yahoo account or such.


I have had several email addresses, including a school, cable, and alumni account, that receive practically no spam.


The fact here is that spam will continue no matter what laws are created...it is as likely to stop as hacking and pirating. I personally prefer having an email account or two that are clean and don't require special programs to weed out hundreds of junk mails a day.


All it takes is signing up for the wrong listserv or posting your email to the wrong sites to allow the chain reaction of spam to begin for you. Just be responsible and treat your email address the way you would your mailing address or phone number and you will be fairly well off and not require all of these spam blocking utilities. At least, this is my philosophy.

Content-Type: text/plain; Encoding: base64 (2)

GGardner (97375) | more than 11 years ago | (#4083173)

This is the latest trick that spammers are using -- encoding a plain text (or html) message in base64. I guess this is because many filters don't mime decode before filtering.

However, for me, it's a really easy way to check for spam -- 100% of all text/plain or text/html encoded in base64 is spam. Easy to check for, easy to remove.
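
A check like this can be sketched with Python's standard `email` package. This is only an illustration under my own assumptions: the function name `is_base64_text` is made up, and the header inspected is the standard MIME `Content-Transfer-Encoding` header, which is what the "Encoding: base64" in the subject line presumably refers to.

```python
from email import message_from_string


def is_base64_text(raw_message):
    """Return True if any text/plain or text/html part is base64-encoded."""
    msg = message_from_string(raw_message)
    # walk() yields the message itself and then every MIME part inside it.
    for part in msg.walk():
        if part.get_content_type() not in ("text/plain", "text/html"):
            continue
        encoding = str(part.get("Content-Transfer-Encoding") or "")
        if encoding.strip().lower() == "base64":
            return True
    return False
```

Whether the "100%" figure holds for anyone else's mailbox is a separate question; mail in non-Latin character sets, for instance, may legitimately arrive as base64-encoded text, so treat a hit as a strong hint rather than proof.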

Misleading (5, Interesting)

RainbowSix (105550) | more than 11 years ago | (#4083187)

He isn't fighting spam, he is filtering it. There is a difference. Filtering still costs in bandwidth. Fighting it would eliminate the source and free up the gigabytes of bandwidth lost for this marketing purpose.

Filtering is fine for now, but ultimately it must be fought and defeated.

Re:Misleading (4, Insightful)

sebi (152185) | more than 11 years ago | (#4083261)

In the long run filtering would eliminate the source as well. Spam has to be paid for by two sides: both the spammer and the recipient have to pay for the bandwidth. The spammer has to pay a lot more though. Spamming is a business that will continue to exist as long as it's profitable. If the success rate of spam drops dramatically due to refined filters, then sooner or later spammers will no longer be able to afford the bandwidth they need.

Time for a spam contest! :) (2)

stere0 (526823) | more than 11 years ago | (#4083195)

Using Graham's system, write a message that will get a very high mark. The highest mark will win.

The message has to be understandable English. Please post your entry as a reply to this message.

Is this thing patented? (2)

WetCat (558132) | more than 11 years ago | (#4083206)

Can I use that feature for my own (commercial or open source) mail client development?

AI Anti-Spam Papers (1)

bpfinn (557273) | more than 11 years ago | (#4083211)

There are several papers describing using Naive Bayes classification, as well as other AI techniques, to filter spam here [www.aueb.gr]. Look for the section on "Document Filtering".

Perl (2)

Mr_Silver (213637) | more than 11 years ago | (#4083217)

This looks like something that could easily be done in Perl.

Although to be honest, I don't understand how the algorithm works. However I'm sure some enterprising soul can probably work it out and code something (hell I will if someone can explain it in decent mathematical terms).

All we need then is a repository of spam mail and non-spam mail to "teach it".

Whatcha reckon?

Best anti-Spam method is TMDA (3, Interesting)

Erore (8382) | more than 11 years ago | (#4083219)

I'm continually amazed at the people who are beating their heads up against a very simple problem. The answer is not statistics, it is not heuristics, it is not AI, it is not procmail.

The answer is verification...aka whitelists. Check out TMDA, tmda.sourceforge.net. This program assumes you don't want mail from anybody whom you haven't explicitly allowed, or who has verified that they are a real person and not a spammer.

Verification is simple, and some people will point out that it could be defeated by a spammer. But, the economics of spam do not make it feasible for a spammer to attempt to defeat TMDA.

TMDA is similar to making your phone number private. You only get phone calls from people you have given your number to, and you never get telemarketers.

TMDA user since December 2001. Spam messages that tried to get in, 12,133, spam messages that got in 3, false positives, 0. Time I've spent tweaking and modifying the program since installation, 0 minutes.

Re:Best anti-Spam method is TMDA (1, Funny)

Anonymous Coward | more than 11 years ago | (#4083282)

"verified that they are a real person and not a spammer."

Heh, spammers are people too, you know :-)

Please explain the LISP code (0)

Anonymous Coward | more than 11 years ago | (#4083244)

For those of us that are not LISP gurus, can someone explain what he's doing with the following code:

(let ((g (* 2 (or (gethash word good) 0)))
      (b (or (gethash word bad) 0)))
   (unless (< (+ g b) 5)
     (max .01
          (min .99 (float (/ (min 1 (/ b nbad))
                             (+ (min 1 (/ g ngood))
                                (min 1 (/ b nbad)))))))))
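
As far as I can tell, the snippet computes the probability that a message containing a given word is spam. Here is a rough Python rendering; the function name `word_spam_probability` and the use of plain dicts for the `good` and `bad` hash tables are my own choices, not anything from the article. `good` and `bad` map each word to the number of times it appears in the non-spam and spam corpora, and `ngood` and `nbad` are the number of messages in each corpus (assumed to be greater than zero).

```python
def word_spam_probability(word, good, bad, ngood, nbad):
    """Python rendering of the Lisp snippet above; returns None for rare words."""
    # Counts in good mail are doubled, which biases against false positives.
    g = 2 * good.get(word, 0)
    b = bad.get(word, 0)
    # Words seen fewer than five times in total are skipped ("unless" returns nil).
    if g + b < 5:
        return None
    # Each frequency is capped at 1 before the two are compared.
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    # Clamp to [.01, .99] so that no single word is ever treated as certain.
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))
```

So a word that shows up only in spam gets .99, a word that shows up only in good mail gets .01, and everything else lands in between according to how much more often it appears in one corpus than the other.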

Another idea (2, Interesting)

caesar79 (579090) | more than 11 years ago | (#4083285)

a nice idea to filter spam ...another one to fight it.

1. The MTAs (mail transport agents like sendmail etc.) establish trust relationships between themselves, or have them set up manually. They also maintain a per-user safelist (i.e. address book + list of addresses the user wants to receive mail from).

2. All email over the trusted links and from addresses in the safelist are delivered unfiltered.

3. For each email sent over an untrusted link
a. Perform MD5 over message body.
b. Ask neighbouring trusted agents if they have received an email whose MD5 is given.
c. If the no. of positives is greater than a threshold, reject as spam.
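
Step 3 could be sketched roughly as follows in Python. The `Agent` class, the `is_bulk` function and the in-process "neighbours" are all hypothetical stand-ins: a real deployment would need some network protocol between the MTAs, which the proposal above does not specify.

```python
import hashlib


def body_digest(body):
    """MD5 hex digest of a message body (step 3a)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class Agent:
    """Hypothetical trusted neighbour that remembers digests of mail it has seen."""

    def __init__(self, name):
        self.name = name
        self.seen = set()

    def record(self, body):
        self.seen.add(body_digest(body))

    def has_seen(self, digest):
        return digest in self.seen


def is_bulk(body, neighbours, threshold):
    """Steps 3b and 3c: poll the neighbours and compare against the threshold."""
    digest = body_digest(body)
    positives = sum(1 for agent in neighbours if agent.has_seen(digest))
    return positives > threshold
```

One caveat with hashing the exact body: a spammer who adds a random string or the recipient's name to each copy gets a different MD5 every time, so the neighbours would never report a match.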

Could this also be used for studying spam? (3, Interesting)

FuzzyDaddy (584528) | more than 11 years ago | (#4083289)

Could this technique be used as a way to track evolving spam techniques over time?

You could develop a corpus of spam over a long period of time, and look for shifts in the data. What this paper describes is distinguishing between a spam-corpus and a legit-corpus, but you could also compare a spam-1999 corpus to a spam-2002 corpus, and see if the spammers are up to anything new.

Not that it would be useful, but it might be kind of cool to try it out and see.

possible oversights (1)

Launch (66938) | more than 11 years ago | (#4083291)

I have no doubts about the research that goes into the calculation of words that were in spam, since pretty much everyone gets similar types of spam and it's not difficult to collect spam marketed to many demographics.

What I do wonder about is his collection of non-spam. I agree that this approach is very good, but I think a hash of non-spam needs to be collected by an end user or for a specific demographic.

For instance in his article he said that the word madam almost never appears in his non-spam mails. Well he isn't a woman. It is a quite common business practice to send e-mails with the greeting madam. Also the vocabulary used in a personal e-mail environment would be drastically different than in a business environment.

Say you're an AOL teeny-bopper... the chances that another teen is sending you an e-mail with red text (ff0000 was one of the key words that was 99% chance of spam) are much greater than in a business e-mail environment (although actually I use bright red sometimes when in-line replying to e-mails at the office).

So like I said before, I really think the hash of 'good' e-mails has to come from an end user or at the very least from a demographic...

Another idea! Need repository of spam (2)

Mr_Silver (213637) | more than 11 years ago | (#4083293)

I've got another idea which might work using Markov chains. You strip the text, work out the probabilities of groups of words appearing after each other and then score that way. As spam changes so would this.

However to test such an idea I need a repository of spam mail - something I don't have. Hotmail junk is no good, it's just the same old adverts regurgitated over and over again.

Does anyone have anything like the 4000 junk emails that this guy has? If so, please could you pop me an email to org dot ewtoo at silver as I'd really appreciate it!
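
One way the Markov-chain idea could look, as a minimal Python sketch: a first-order (bigram) model is trained on each corpus, and a message is scored by how much more likely its word-to-word transitions are under the spam model than under the non-spam model. The names `BigramModel` and `spam_score`, the whitespace tokenization and the add-one smoothing are assumptions made for the example, not part of the proposal.

```python
import math
from collections import Counter


def bigrams(text):
    """Consecutive word pairs of a text, lowercased and split on whitespace."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


class BigramModel:
    """First-order Markov chain over words, built from example messages."""

    def __init__(self):
        self.pair_counts = Counter()   # (first, second) -> occurrences
        self.first_counts = Counter()  # first -> times it starts a pair
        self.vocab = set()

    def train(self, text):
        for first, second in bigrams(text):
            self.pair_counts[(first, second)] += 1
            self.first_counts[first] += 1
            self.vocab.update((first, second))

    def log_prob(self, text, vocab_size):
        """Log-probability of the text's transitions, with add-one smoothing."""
        total = 0.0
        for first, second in bigrams(text):
            numerator = self.pair_counts[(first, second)] + 1
            denominator = self.first_counts[first] + vocab_size
            total += math.log(numerator / denominator)
        return total


def spam_score(text, spam_model, ham_model):
    """Positive when the text looks more like the spam corpus than the ham corpus."""
    vocab_size = len(spam_model.vocab | ham_model.vocab) or 1
    return spam_model.log_prob(text, vocab_size) - ham_model.log_prob(text, vocab_size)
```

Retraining the two models as new mail is classified is what would let the scores follow changes in spam over time.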
