The Next Step in Fighting Spam: Greylisting

michael posted more than 11 years ago | from the ideas-are-a-dime-a-dozen dept.

Spam 481

Evan Harris writes "I've just published a paper on a new and unique spam blocking method called 'Greylisting'. The best thing about it, other than achieving better than 97% effectiveness in blocking spam, is that it practically eliminates the main problem of other solutions: the false positive. There's even source code for an example implementation written as a Perl filter for sendmail, along with installation instructions, so you can get up and running quickly."

481 comments

DON'T CLICK LINKS IN PARENT -- GOATSE.CX (-1, Troll)

I am the blob (239590) | more than 11 years ago | (#6256169)

Re:DON'T CLICK LINKS IN PARENT -- GOATSE.CX (-1, Troll)

Anonymous Coward | more than 11 years ago | (#6256223)

ãã--Î"ãHDãã sucks

and so do you! props for the FP (faggoty post)

Is Evan Harris the GOATSE.CX GUY? (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#6256471)

he has such a cute hole

FIRST SPAM POST! (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#6256173)

please mod this as -1,Spam.

your first mistake (4, Insightful)

frieked (187664) | more than 11 years ago | (#6256181)

I'm going to try to say this as nicely as possible and without trolling:
You have just rendered Greylisting pretty useless by making it open source. Spammers are much smarter than you think and what you have basically done is shown them what they need to do in order to get around Greylisting. That's just my take on the issue, maybe I'm wrong but I doubt it.

security through obscurity, again? (4, Insightful)

dh003i (203189) | more than 11 years ago | (#6256240)

If they can get around it by looking at the source, then something was wrong with it, waiting to be exploited. Might as well fix it.

Re:security through obscurity, again? (4, Interesting)

blakestah (91866) | more than 11 years ago | (#6256300)

The thing that is wrong is the SMTP protocol, and most people's conception of a spammer. Once you see a few "confessions of ex-spammers", everything changes.

There are people out there who pay $10000 in startup costs, and then make $2000/week for spamming. The $10000 gets them software written by knowledgeable internet security experts. This software finds any and every way to anonymize the email spam, and finds lists of people to spam.

As long as knowledgeable internet security experts are getting paid good cash to enable spammers, and SMTP doesn't change, spam will only continue to get worse. There needs to be a fundamental change in SMTP protocols. It oughta take the spammers about 2 days to fix their MTA bug to get around greylisting.

Re:security through obscurity, again? (3, Insightful)

SuiteSisterMary (123932) | more than 11 years ago | (#6256477)

The way to get around this, of course, being that you send each email twice. In other words, run through your database, then run through your database. Same IP addy, same sender, same recipient. As far as the MTA's concerned, it's retrying. Boom.

Re:your first mistake (4, Insightful)

Soko (17987) | more than 11 years ago | (#6256266)

I'm going to try to say this as nicely as possible and without trolling:

Not trolling at all - you have a legitimate (though perhaps misguided) problem with this method.

You have just rendered Greylisting pretty useless by making it open source. Spammers are much smarter than you think and what you have basically done is shown them what they need to do in order to get around Greylisting. That's just my take on the issue, maybe I'm wrong but I doubt it.

So, the spammers themselves will be of significant help in debugging and helping to fix the code so they can't circumvent it, won't they? OSS means anyone who finds how the greylist script is beaten can figure out a fix and post it. Sounds like the best thing to do IMHO.

Soko

Re:your first mistake (1)

lovemayo (674154) | more than 11 years ago | (#6256536)

So, the spammers themselves will be of significant help in debugging and helping to fix the code so they can't circumvent it, won't they?
Yeah... If the spammers write patches for the workarounds they find and submit them... Very likely that will happen...

Re:your first mistake (2, Funny)

Horny Smurf (590916) | more than 11 years ago | (#6256548)

So, the spammers themselves will be of significant help in debugging and helping to fix the code so they can't circumvent it, won't they? OSS means anyone who finds how the greylist script is beaten can figure out a fix and post it. Sounds like the best thing to do IMHO. Soko

So, once the spammers find how to get around the greylist, they'll submit patches to the spam blocking software?

Re:your first mistake (5, Funny)

Schnapple (262314) | more than 11 years ago | (#6256277)

You have just rendered Greylisting pretty useless by making it open source.
You're assuming the spammers can read source code.

Re:your first mistake (1, Insightful)

Anonymous Coward | more than 11 years ago | (#6256316)

And you are assuming no one is going to make something to circumvent this protection and make it available to spammers.

Re:your first mistake (4, Funny)

L. VeGas (580015) | more than 11 years ago | (#6256313)

That's just my take on the issue, maybe I'm wrong but I doubt it.

That's what I like to see. Someone with strong opinions. Or maybe not.

I think not (5, Interesting)

Monoman (8745) | more than 11 years ago | (#6256356)

Does this mean all public crypto algorithms are useless?

Re:I think not (1)

xWeston (577162) | more than 11 years ago | (#6256418)

This is a good example and definitely something that I was thinking of as well.

Just because something is open source (or in this case, open idea) doesn't mean that it is rendered useless. There are plenty of open source programs involving cryptography as well as other sensitive subjects that work perfectly well.

In this case, as others have mentioned, it would require some fixes to patch up the things that spammers figure out quickly, but I do believe that in the end it will be a stronger system than if it were not "open idea."

As mentioned in other comments, the hard part is that the people writing the spamming/gathering software aren't stupid. They know what they are doing and make good money off of it.

Re:I think not (0)

frieked (187664) | more than 11 years ago | (#6256498)

No, we're not talking about crypto algorithms here, we are talking about email, and you obviously fail to see the difference between the two.

You are right to point out that my logic may not work for crypto but it does work for this case. Reading crypto source would tell me I need the correct key to decrypt. A key which is rather difficult to crack.
To forge a legit email however is nowhere near as difficult and this source basically tells how to do it.

Re:your first mistake (4, Informative)

tomstdenis (446163) | more than 11 years ago | (#6256382)

You're missing a big part of it though. If you have to try, say, 3 times to send a message [over a 5 day period or so], your ability to mass send 100 million emails is really squashed.

Legitimate people sending for the first time won't really mind the few-day wait, and most MTAs will try for up to a month.

Tom

Re:your first mistake (5, Informative)

TheCarp (96830) | more than 11 years ago | (#6256411)

not at all

Read the paper. Spammers would figure it out eventually. What it buys is what they have to do to get around it.

It means they have to do retries... that means spam runs take longer, especially since they have to run, then wait for a locally defined timeout, and run all those addresses again

AND they have to do it from the same IP.

This raises their bandwidth profile. It wastes their time... all in all... it raises their cost of doing business and cuts into their profit margins.

It means they will have to upgrade their tools again. It means they get headaches. And of course, the next step is to implement spam traps that watch activity, see that a spammer is spamming, and promote them to a blacklist before they can even retry. (oh gee, 1000 new greylist triplets from 1 IP in under 5 mins? Set the timeouts for that IP to 12 hours)

-Steve

Re:your first mistake (0)

Anonymous Coward | more than 11 years ago | (#6256421)

But it still makes it slightly more expensive for spammers to operate. They will have to keep messages around to retry later.

If you combine this technique with the one where the mail server blocks for a few seconds for each receipt, you will make it even more expensive.

Re:your first mistake (2, Interesting)

JJAnon (180699) | more than 11 years ago | (#6256437)

I don't think the mistake has anything to do with it being open source. It could be closed source, and would still fail because the basic premise is so simple - it relies on spammers sending spam to your inbox and not bothering to resend it if an error code is returned. So all a spammer has to do is just resend the message a couple of times to get around the spam 'filter'.

Re:your first mistake (1)

WTFmonkey (652603) | more than 11 years ago | (#6256506)

Say it along with me: "There's NO SUCH THING as black-box security."

If they can break it by looking at source, it was already broken.

Questions (2, Insightful)

Traa (158207) | more than 11 years ago | (#6256187)

Some questions about this method:
  • It delays all incoming emails for a certain amount of time. Unfortunate side effect of the algorithm. Can anyone tell me what the average extra time is?
  • I am not convinced that most of the spam comes from specialized email applications that can be fooled with a temporary failure. Can anyone provide numbers on this?
  • How does the algorithm adapt when aforementioned email applications adapt to 'greylisting'?
  • I see a lot of spam that was probably produced by applications that use an automated signup to yahoo/hotmail/etc. to obtain a temporary email address and leave the actual emailing to those services which will circumvent 'greylisting'.
  • How much of the total internet traffic is made up of email? What happens if we all install 'greylisting' filters and each email has to be resent several times? Is doubling/tripling the amount of email traffic going to be noticeable?


I like the idea though. Since SMTP is broken anyway, why not use another of its features in a new way to help filter unwanted email? Keep up the good work!

Re:Questions (3, Insightful)

sulli (195030) | more than 11 years ago | (#6256264)

1 hour is the time proposed. Completely unacceptable unless the whitelist works.

Since most personal users are on dialup or dynamic IPs, unless the mail client can upload the whitelist in a trusted fashion (or the MTA remembers what users the client sent messages to!), this won't work.

Do any mail clients include whitelist-collection? Mail.app for OS X does collect all addresses you've sent to, but I've never seen any tool to upload it somewhere.

Re:Questions (1, Interesting)

Anonymous Coward | more than 11 years ago | (#6256456)

From my reading, it sounded like it stored the ip address of the MTA. This would not affect dial up users more than any others.

Re:Questions (2, Informative)

sharlskdy (460886) | more than 11 years ago | (#6256366)

Retry is configurable, and it depends on the MTA. Qmail has a default retry of 400 seconds (6 minutes, 40 seconds).

Much of my e-mail comes through within seconds - I'm not sure I want that delayed too much. Although, this delay is on the first matching triplet.

Server disk space requirements for major providers would climb considerably, I would expect. Legitimate mass-mail programs, and mailing list services would have a problem, tho.

The algorithm takes advantage of the laziness of spammers, which is not a bad idea.

can't believe their numbers (5, Informative)

sqrt529 (145430) | more than 11 years ago | (#6256194)

most spam today is sent through open relays. Those relays will simply retry the delivery no matter which software the spammer uses, so the method won't work.

Re:can't believe their numbers (5, Informative)

McDutchie (151611) | more than 11 years ago | (#6256312)

Eh, open relays are soooo 20th century. :) Actually most open relays today are either blocked or closed, and newly installed MTAs are secure against third-party relaying by default, so this spam method is dying out [it-analysis.com] . Most spam today is sent either directly to the receiving MTA, through open proxies, or through formmail.pl and similar exploits.

Re:can't believe their numbers (3, Interesting)

grub (11606) | more than 11 years ago | (#6256357)

Open proxies get most of my rejects, here's a paste from "spamstat" (a quick script I did that cron's me the output once a day). The logs rotated not quite 2 hours ago.
Open Relay: 1
Dialup Spam Source: 0
Confirmed Spam Source: 2
Smart Host: 0
Spamware Developer or Spamvertized site: 0
Unconfirmed Opt-In List Server: 0
Insecure formmail.cgi: 0
Open Proxy Server: 8

Poor use of statistics (4, Insightful)

GGardner (97375) | more than 11 years ago | (#6256401)

The data in this article claims that 1% of all corporate mail servers in the UK allow open relaying, down from 91% in 1997. For all we know, the total number of corporate e-mail servers has grown by a factor of 100 (or more) in the last six years, meaning that perhaps there are more open relays now.

The article also doesn't measure the amount of spam coming through those relays. Even if there are only 10 open relays in the UK at any one time, it still might be possible for all of the spam to be coming through them.

Certainly, closing down open relays is a good thing, but lowering the percentage of open relays doesn't prove anything about the source of spam.

Re:can't believe their numbers (1)

Slack0ff (590042) | more than 11 years ago | (#6256325)

I ask: was this algorithm tested? It seems as if he would not go through all this work without testing. Did he test only direct spam sending? Because any decent spammer will not send e-mail direct. Bouncing off a few misconfigured e-mail servers is a much better tactic if you wish to be a "successful" spammer. Any thoughts?

Re:can't believe their numbers (0)

Anonymous Coward | more than 11 years ago | (#6256469)

Glock's EasyMail Pro and socks chains :D trace that :D (visual route also is handy) and use political countries like china, india, iran etc so it's harder to get logs to the western world :D that's how i fake all my revenge mail tactics :D

Re:can't believe their numbers (0)

Anonymous Coward | more than 11 years ago | (#6256350)

Actually I use Glock's easymail pro via SOCKS chains. :D perfect for faking :D (aka revenge :D)

Re:can't believe their numbers (1)

gasp (128583) | more than 11 years ago | (#6256395)

What's your source of data for this? In my experience, my own MTA's ORB data indicates that the vast majority of spam my domains receive does _not_ come from open relays. My general reading on the subject in recent months gives me the impression that this isn't an anomaly, and that blocking open relays isn't significantly effective any longer in reducing spam.

Open Relays a smaller problem? Viruses instead? (2, Informative)

garyebickford (222422) | more than 11 years ago | (#6256490)

According to this article [theregister.co.uk] (June 12), open relays at least in the corporate environment are becoming hard to find, requiring spammers to find new ways. In 1997, 91% of mail servers tested were open; as of a year ago only 1%. ISP and home machines apparently weren't tested.

This doesn't really say what's actually being used by spammers, but it's a sign of improvement. At the least, it narrows the pool of available relays. Continuing progress will increase the spam pressure on those remaining, which will in turn make it more likely that they'll be fixed.

The article also doesn't say what spammers might use as an alternative. From something else I read recently (don't recall where), mail viruses that take over users' machines are rapidly becoming the tool of choice. There are a lot more of them than mail servers, so it makes sense for the spammers. It does put them in a more dangerous position WRT the law. IMHO (IANAL), using a virus to exploit someone's machine for profit is almost certainly illegal under existing law.

Re:can't believe their numbers (1)

anthony_dipierro (543308) | more than 11 years ago | (#6256535)

most spam today is sent through open relays.

Not most spam I receive. Then again, I have a filter against known open relays.

In case of /.'ing (4, Informative)

Anonymous Coward | more than 11 years ago | (#6256196)

The Next Step in the Spam Control War: Greylisting
By Evan Harris
Copyright 2003, all rights reserved.

Introduction
This paper proposes a new and currently very effective method of enhancing the abilities of mail systems to limit the amount of spam that they receive and deliver to their users. For the purposes of this paper, we will call this new method "Greylisting". The reason for choosing this name should become obvious as we progress.

Greylisting has been designed from the start to satisfy certain criteria:

1. Have minimal impact on users
2. Limit spammers' ability to circumvent the blocking
3. Require minimal maintenance at both the user and administrator level

User-level spam blocking, while somewhat effective, has a few key drawbacks that make its use in the continuing spam war undesirable. A few of these are:

1. It provides no notice to the senders of legitimate email that is falsely identified as spam.
2. It places most of the costs of processing the spam on the receiver's side rather than the spammer's side.
3. It provides no real disincentive to spammers to stop wasting our time and resources.

As a result, Greylisting is designed to be implemented at the MTA level, where we can cause the spammers the most amount of grief.

For the purposes of evaluating and testing Greylisting, an example implementation has been written of a filter that runs at the MTA (Message Transfer Agent) level. The source for this example implementation is available as a link below, and as other implementations or additional utility code become available, they will also be linked.

Greylisting has been tested on a few small scale mail hosts (less than 100 users, though with a fairly diverse set of senders from all over the world, and volumes over 10,000 email attempts a day), however it is designed to be scalable, as well as low impact to both administrators and users, and should be acceptable for use on a wide range of systems, including those of very large scale. Of course, performance issues are very dependent on implementation details.

The Greylisting method proposed in this paper is a complementary method to other existing and yet-to-be-designed spam control systems, and is not intended as a replacement for those other methods. In fact, it is expected that spammers will eventually try to minimise the effectiveness of this method of blocking, and Greylisting is designed to limit options available to the spammer when attempting to do so.

The great thing about Greylisting is that the available methods of circumventing it will only make other spam control techniques (primarily DNS and other methods of blacklisting based on IP address) that much more effective, even after this adaptation by the spammers has occurred.

The Greylisting Method
High Level Overview
Greylisting got its name because it is kind of a cross between black- and white-listing, with mostly automatic maintenance. A key element of the Greylisting method is this automatic maintenance.

The Greylisting method is very simple. It only looks at three pieces of information (which we will refer to as a "triplet" from now on) about any particular mail delivery attempt:

1. The IP address of the host attempting the delivery
2. The envelope sender address
3. The envelope recipient address

From this, we now have a unique triplet for identifying a mail "relationship". With this data, we simply follow a basic rule, which is:

If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure.
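As a rough illustration of the triplet and the "have we seen it before" test, here is a minimal Python sketch. The names `Triplet`, `make_triplet`, and `is_new` are hypothetical, the in-memory set stands in for whatever datastore a real filter would use, and lower-casing the addresses is an assumption of this sketch rather than something the paper specifies. The time-based part of the rule is left out here.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical key for one mail 'relationship' (names are illustrative)."""
    client_ip: str         # IP address of the host attempting the delivery
    envelope_sender: str   # address given in MAIL FROM
    envelope_rcpt: str     # address given in RCPT TO


def make_triplet(client_ip: str, mail_from: str, rcpt_to: str) -> Triplet:
    # Lower-casing is an assumption of this sketch, not part of the paper.
    return Triplet(client_ip, mail_from.strip().lower(), rcpt_to.strip().lower())


# In-memory stand-in for the database a real implementation would use.
seen = set()


def is_new(triplet: Triplet) -> bool:
    """Return True (and remember the triplet) the first time it is seen."""
    if triplet in seen:
        return False
    seen.add(triplet)
    return True
```

A first attempt for a given triplet reports `True` (and would be refused with a temporary failure); any later attempt with the same IP, sender, and recipient reports `False`.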

Since SMTP is considered an unreliable transport, the possibility of temporary failures is built into the core spec (see RFC 821). As such, any well-behaved message transfer agent (MTA) should attempt retries if given an appropriate temporary failure code for a delivery attempt (see below for discussion of issues concerning non-conforming MTAs).

During the initial testing of Greylisting, it was observed that the vast majority of spam appears to be sent from applications designed specifically for spamming. These applications appear to adopt the "fire-and-forget" methodology. That is, they attempt to send the spam to one or several MX hosts for a domain, but then never attempt a true retry as a real MTA would. From our testing, this means that currently, based on a fairly conservative interpretation of testing data, we see effectiveness of over 95%, and that is with no legitimate mail ever being permanently blocked.

This blocking comes at a minimal price in terms of local resources. Assuming the use of a local datastore for the triplet and other metadata, there is no required network traffic caused by Greylisting other than that associated with the connection itself. Since we are not checking the contents of the message at all, there is very little processing overhead, unlike many other spam blocking methods.

There is one effect that could be seen as either a positive or negative. Since the Greylisting method delays acceptance of unknown mail, that will generate a little more work for the sending MTA of legitimate mail. The flip side is that it generates a lot more work and smarts for the spammer's systems, hopefully enough to make the costs of spamming higher, possibly even to the point of making spamming unprofitable for some of them.

The best part is that since we never permanently fail a message delivery, as long as the delivering MTAs are well behaved, we should never cause a legitimate mail to bounce. There should never be a false positive!

Implementation Specification
In order to implement the Greylisting method, we will use some form of database to hold a few pieces of information about a specific mail relationship that is keyed off of the triplet described above:

* The time that the triplet was first seen (record create time)
* The time that the blocking of this triplet will expire
* The time that the record itself will expire (for aging old records)
* The number of delivery attempts that have been blocked
* The number of emails we have successfully passed

(Note: There are some additional pieces of information that are stored and used in the example implementation, and they will be discussed later, but for now we will disregard them. Also, the number of email attempts blocked and passed is not strictly necessary, but will be shown to be useful in making the process work better.)

With this data, we have everything necessary for a fully functional Greylisting implementation.
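The per-triplet record listed above might be modelled along these lines. This is only a sketch: the class and field names (`GreylistRecord`, `new_record`, and so on) are made up for illustration, and the default durations are borrowed from the suggested values given later in the paper (1 hour of blocking, 4 hours of lifetime for a triplet that has not yet passed mail).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class GreylistRecord:
    """Hypothetical row stored per triplet; field names are illustrative."""
    first_seen: datetime      # time the triplet was first seen (record create time)
    block_expires: datetime   # time the blocking of this triplet ends
    record_expires: datetime  # time the record itself is aged out
    blocked_count: int = 0    # delivery attempts that have been blocked
    passed_count: int = 0     # emails successfully passed


def new_record(now: datetime,
               initial_delay: timedelta = timedelta(hours=1),
               pending_lifetime: timedelta = timedelta(hours=4)) -> GreylistRecord:
    """Create the record for a triplet that has just been seen for the first time."""
    return GreylistRecord(
        first_seen=now,
        block_expires=now + initial_delay,
        record_expires=now + pending_lifetime,
    )
```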

The proper place in the SMTP session to perform our checks is as soon as possible in the mail session when we have all of the needed information available. To remind those who are not familiar with the low level details of an SMTP session, a normal command sequence would look something like:

-> HELO somedomain.com
<- 250 Hello somedomain.com
-> MAIL FROM: <sender@somedomain.com>
<- 250 2.1.0 Sender ok
-> RCPT TO: <recipient@otherdomain.com>
<- 250 2.1.5 Recipient ok
-> DATA
<- 354 Enter mail
...
<- 250 2.0.0 Message accepted for delivery

This means that, in order to minimize the network traffic required when a mail delivery may be rejected, we should perform our checks as soon as possible after the sending MTA has given us all the required information, which is to say, immediately after the RCPT command is received.

In the case where we would temporarily fail a particular delivery attempt, the mail transaction would look similar to this:

-> MAIL FROM: <sender@somedomain.com>
<- 250 2.1.0 Sender ok
-> RCPT TO: <recipient@otherdomain.com>
<- 451 4.7.1 Please try again later
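To make the two possible answers to RCPT concrete, here is a small Python sketch that picks the reply line. The function names are hypothetical; the reply strings are the ones shown in the transcripts above. The helper relies only on the standard SMTP convention that reply codes starting with 4 mean a transient failure that the sender is expected to retry.

```python
def rcpt_reply(should_delay: bool) -> str:
    """Return the SMTP reply line to send after RCPT TO (illustrative helper)."""
    if should_delay:
        # Temporary failure: a well-behaved MTA queues the mail and retries later.
        return "451 4.7.1 Please try again later"
    return "250 2.1.5 Recipient ok"


def is_temporary_failure(reply: str) -> bool:
    """True for 4yz replies, which signal a transient (retryable) failure."""
    return reply[:1] == "4"
```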

One additional feature which has not yet been mentioned is the provision for some method to allow manual whitelisting of relays, recipients, and possibly even senders.

This manual whitelisting capability is not strictly necessary, but for several reasons, a minimum implementation pretty much requires at least manual whitelisting based on IP address for things like localhost, or primary/backup MX hosts for the domains being handled. Since those relays are presumably smart enough to retry, and should never be blocked anyway, there is little point to delaying mail delivery attempts from them.

Likewise, whitelisting recipients (or recipient domains) may be useful in an ISP or similar setting, where particular customers wish to exempt their domains from the possible mail delivery delays that Greylisting may cause.

Whitelisting based on sender address (or sender domain), while easily implemented, is discouraged. The reasons for this are that in most cases, whitelisting the IP addresses of the mail hosts that send for a particular domain is a much better solution because it is much more difficult to forge the IP address than the sending email address. Also, in most cases, domains or emails that would be likely to be whitelisted would also be very easily guessed or discovered, and spammers could take advantage of that to bypass the Greylisting blocks.

Whether these manual whitelisting entries are stored in the database, or are hardcoded into the application does not matter from the standpoint of Greylisting. But of course, an implementation that allows them to be easily updated is preferable.

The specific methodology for a fairly basic Greylisting implementation is as follows:

1. Check if the sending relay (or network) is whitelisted, and if so, pass the mail.
2. Check if the envelope recipient (or domain) is whitelisted, and if so, pass the mail.
3. Check if we have seen this email triplet before.
   1. If we have not seen it, create a record describing it and return a tempfail to the sending MTA.
   2. If we have seen it, and the block is not expired, return a tempfail to the sending MTA.
   3. If we have seen it, and the block has expired, then pass the email.
4. If the delivery attempt should be passed and the delivery is successful:
   1. Increment the passed count on the matching row.
   2. Reset the expiration time of the record to be the standard lifetime past the current time.
5. If the delivery attempt has been temporarily failed:
   1. Increment the failed count on the matching row.
   2. If the sender is the special case of the null sender, do not return a failure after RCPT, instead wait until after the DATA phase.

(Note: For all checks, we ignore records whose lifetime has expired)
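The steps above can be sketched in Python roughly as follows. This is not the paper's Perl filter: the `Greylist` class, its method names, and the in-memory dict are assumptions made for illustration, the durations are the suggested defaults from the configuration section, and the sketch assumes that a delivery which is allowed to pass also succeeds. Null-sender handling is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Suggested defaults from the paper's configuration section.
INITIAL_DELAY = timedelta(hours=1)
PENDING_LIFETIME = timedelta(hours=4)
PASSED_LIFETIME = timedelta(days=36)


@dataclass
class Record:
    block_expires: datetime
    record_expires: datetime
    blocked: int = 0
    passed: int = 0


class Greylist:
    """Illustrative in-memory greylist; a real filter would use a database."""

    def __init__(self, ip_whitelist=(), rcpt_whitelist=()):
        self.ip_whitelist = set(ip_whitelist)
        self.rcpt_whitelist = set(rcpt_whitelist)
        self.records = {}

    def check(self, ip, sender, rcpt, now):
        """Return 'pass' or 'tempfail' for one delivery attempt."""
        if ip in self.ip_whitelist:        # whitelisted relay: pass
            return "pass"
        if rcpt in self.rcpt_whitelist:    # whitelisted recipient: pass
            return "pass"

        key = (ip, sender, rcpt)
        rec = self.records.get(key)
        if rec is not None and rec.record_expires <= now:
            rec = None                     # expired records are ignored

        if rec is None:                    # never seen: create record, tempfail
            self.records[key] = Record(now + INITIAL_DELAY,
                                       now + PENDING_LIFETIME, blocked=1)
            return "tempfail"
        if now < rec.block_expires:        # seen, still blocked: tempfail
            rec.blocked += 1
            return "tempfail"

        # Seen and block expired: pass, count it, and extend the record lifetime.
        rec.passed += 1
        rec.record_expires = now + PASSED_LIFETIME
        return "pass"
```

With these defaults, a retry that arrives between one and four hours after the first attempt is passed, and later mail with the same triplet keeps passing as long as it recurs within 36 days.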

Issues Affecting The Proposed Implementation
There are a few issues that were found to be prevalent enough "in the wild" to make it necessary to slightly modify methods in the basic approach.

One issue is that some MTA software (Exim for example) attempts to limit the problem of forged sender addresses by attempting to verify that the claimed sender of an email is a valid address by doing an SMTP callback before accepting mail. Since it is desired to minimize the traffic when a mail may be rejected temporarily, the best course of action would be to issue a tempfail after the RCPT command. However, in the case of a SMTP callback, doing so at that point may cause our outgoing mail to be delayed unnecessarily.

Luckily, the mailers that do this use a sender address of the null sender "<>" to perform this check. This makes it fairly simple to workaround, since we can make a modification to the handling process so that in the special case of the null sender, we delay returning a temporary failure until after the DATA phase of a mail transaction. Since SMTP callbacks abort their test delivery attempt before getting to the data phase, the SMTP callback will succeed, and the outgoing mail should be accepted with no delay.
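A minimal sketch of that special case, assuming the filter is told the envelope sender as a string: the helper below (a hypothetical name) simply decides at which phase of the SMTP session a temporary failure should be issued.

```python
def tempfail_phase(envelope_sender: str) -> str:
    """Return the SMTP phase ('RCPT' or 'DATA') at which to issue a tempfail.

    The null sender may appear as '<>' or, once the angle brackets are
    stripped by the MTA, as an empty string; both are treated alike here.
    """
    if envelope_sender.strip() in ("", "<>"):
        # Likely an SMTP callback or a bounce: let RCPT succeed and defer
        # the temporary failure until after DATA.
        return "DATA"
    return "RCPT"
```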

Another issue occurs when a large organization uses a pool of outbound mail servers for sending email to a system using Greylisting. If the pool is configured so that the same mailserver (with the same IP) will always retry deliveries for a particular mail, there is no issue.

But if that pool of mail servers happens to be configured in such a way that subsequent delivery attempts for a particular mail may be made from any one of several sending MTAs, then legitimate mail deliveries may take significantly longer than expected. The possible maximum delay is dependent on the number of MTAs in the sending pool, and on whether the distribution of the retry attempts is random or deterministic. In a worst-case scenario, it is even possible that mail may be delayed long enough to cause it to bounce.

Other than adding a manual entry for networks of this type, one proposed method of dealing with this issue is to perform the IP address checks of the sending relay based on the subnet they are at rather than the specific IP. Since most of the sites that do this have most or all of their email servers on the same /24 subnet, this method works well in avoiding this issue without requiring manual intervention, at the expense of making it a little easier for spammers to circumvent the system.
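The subnet-based check could look something like the following, using Python's standard `ipaddress` module. The function name `relay_key` is hypothetical, and the sketch assumes IPv4 addresses with the /24 prefix mentioned above.

```python
import ipaddress


def relay_key(client_ip: str, prefix_len: int = 24) -> str:
    """Map a sending relay's IPv4 address to its enclosing subnet.

    Using the subnet instead of the exact address in the triplet means
    retries from neighbouring servers in the same pool match the same record.
    """
    # strict=False lets us pass a host address and get its network back.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network)
```

Two relays at 192.0.2.17 and 192.0.2.200 would both map to the key 192.0.2.0/24, while a host at 192.0.3.1 would not.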

One other potential issue is with mailing lists that use unique envelope sender addresses for mail sent to an end user, which is useful in order to better track bounces, since the formatting of bounces is not codified, and it is fairly common for mailers to return bounces that are formatted in such a way that it is very difficult, or even impossible, to programmatically discover which address caused the bounce.

This method of handling bounces is called VERP for Variable Envelope Return Paths, and one method of doing this is detailed here. Luckily, most mailing lists do this in a way similar to that described in that document, which is to use the same unique envelope sender for every mail sent to a particular recipient.

However, some mailing lists also try to track bounces to individual mails, rather than just individual recipients, which creates a variation on the VERP method where each email has its own unique envelope sender. Since the automatic whitelisting that is built into Greylisting depends on the envelope addresses for subsequent emails being the same, this will cause each email sent to be delayed, rather than just the first email.

There is a simple workaround, which is to manually whitelist any hosts that deliver this sort of traffic. But luckily, even without manual whitelist entries, the impact is not that significant since mailing lists are usually not that timely in their delivery anyway, and the delay will generally not be very significant for most users.

Basic Configuration Parameters
In the spirit of giving the mail system administrators who choose to implement Greylisting as much choice as possible, there are several options which should be easily modified in order to tune the behavior of the Greylisting method on a per-case basis. Below, we detail these options, and some details to keep in mind if it is deemed necessary to change them from the default suggested values.

As a matter of fact, it may be desirable to vary these settings from installation to installation, since it will help keep the spammers guessing.

Initial delay of a previously unknown triplet: 1 Hour
Lifetime of triplets that have not yet allowed a mail to pass: 4 Hours
Lifetime of auto-whitelisted triplets that have allowed mail to pass: 36 Days

The initial delay of 1 hour was picked for several reasons:

1. An hour is short enough that in most cases, users will not notice the delay.
2. It is long enough to give time for administrators on a possibly compromised or abused mail server to discover the problem and hopefully correct it, before any of the offending email is able to be delivered.
3. It is long enough to provide a good chance that if the sending host is in fact a spammer, they will be listed in other IP-based blacklists that may be used in conjunction with Greylisting, so that even if a spamming relay later attempts a redelivery that would no longer be delayed by Greylisting, it may still be blocked by other methods.
4. It is also long enough that other types of traffic analysis could be designed and implemented such that spamming IP's could be easily identified and blocked by other methods, in such a way that even the first recipients (before a spamming pattern starts to emerge) would still not be bothered by the spam email.

The data collected during testing showed that more than 99% of the mail that was blocked with the tested setting of 1 hour would still have been blocked with a delay setting of only 1 minute. However, it is expected that as spammers become aware of this blocking method, they will change their software to retry failed deliveries. At that point, having a larger initial delay will definitely help, as it gives time for other blocking methods to act. For this reason, it is suggested that at least a one hour delay value be kept as a default, since spammers will start adapting as soon as this method becomes known and starts being used.

It is important to keep this delay smaller than a value where a significant number of MTA's will give up and bounce the message. Luckily, most MTA's have failure timeouts of several days. However, there are some special cases, such as certain financial institutions, that want to learn within a fairly short period of time when a message could not be delivered. Even in these special cases, the timeouts should be at least a few hours.

It is likely that some form of traffic analysis will be developed using the data from a Greylisting database in order to automatically identify the IP addresses of hosts that are attempting to deliver spam.

While this sort of functionality is not currently included in the example implementation, I would be very interested in seeing this come about, since spammer patterns were usually very identifiable after a few minutes, mainly due to many nearly simultaneous delivery attempts to a large number of different recipients from the same IP address or group of IP addresses, from which no (or very little) previous traffic had ever been observed. (If the organizers or maintainers of any of the DNS blacklists are interested in creating an automatic way of using this data to help update their lists, please contact me.)

Unfortunately, pattern analysis requires a fairly high level of traffic to be useful and accurate, so smaller systems will probably not help much unless the pattern analysis is distributed, which is difficult when you can't necessarily trust other potential collaborators.

The 4 hour initial life of records was picked because:

1. Almost all legitimate mail servers have a retry time that is less than this.
2. Having a small lifetime helps limit the number of relevant records that may have to be considered and maintained for very busy sites that may have enormous amounts of mail traffic and hundreds or thousands of queries a second. Small values for this are increasingly important as the spam problem grows, since each unique spam triplet will generate a record.
3. It was desired to keep the time window fairly small to limit when a possible spam might get through because a spammer may try to resend the message to their entire delivery list.

(Note that in the example implementation, this 4 hour limit includes the initial 1 hour delay, which means the effective window when an email will be accepted is 3 hours.)
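
To make the interaction of the three timers concrete, here is a minimal in-memory sketch of the decision logic. It is an illustration under stated assumptions, not the Perl milter itself: the class and field names are invented, and a real deployment would keep the records in a database rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

HOUR = 3600
DAY = 24 * HOUR
INITIAL_DELAY = 1 * HOUR        # delay for a previously unknown triplet
PENDING_LIFETIME = 4 * HOUR     # life of a triplet that has not passed mail yet
WHITELIST_LIFETIME = 36 * DAY   # life of a triplet that has passed mail

# (relay IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]

@dataclass
class Record:
    first_seen: float
    expires: float
    passed: bool = False

class Greylist:
    """Hypothetical in-memory stand-in for the Greylisting database."""

    def __init__(self) -> None:
        self.records: Dict[Triplet, Record] = {}

    def check(self, triplet: Triplet, now: float) -> str:
        rec = self.records.get(triplet)
        if rec is None or now >= rec.expires:
            # Unknown (or expired) triplet: start the clock and defer.
            self.records[triplet] = Record(now, now + PENDING_LIFETIME)
            return "tempfail"
        if not rec.passed and now < rec.first_seen + INITIAL_DELAY:
            # Retry came back too soon: still inside the initial delay.
            return "tempfail"
        # Accept, and refresh the whitelist lifetime on every passed mail.
        rec.passed = True
        rec.expires = now + WHITELIST_LIFETIME
        return "accept"

g = Greylist()
t = ("192.0.2.10", "sender@example.org", "rcpt@example.net")
for when in (0, 30 * 60, 1 * HOUR, 2 * HOUR):
    print(when, g.check(t, when))
```

Because the pending lifetime is counted from the first attempt, a retry is only accepted between one and four hours after that attempt, which is the three hour window mentioned in the note above.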

There is another reason why this delay should be as small as possible. If a spammer discovers and uses a poorly maintained relay host, hopefully it will bog that relay down enough so that it gets very slow. That increases the possibility that the relay will be slowed down enough that it won't be able to process the queue fast enough for the spam to get through within this time window.

The lifetime limit of whitelisted records is updated every time an email is successfully passed, and was chosen to be 36 days to:

1. Help keep the database a manageable size by allowing entries for obsolete senders, recipients or relays to be aged off gracefully.
2. Make sure that records live long enough to avoid delaying subsequent mailings that may only come once a month (i.e. monthly mailing list notifications). Also, to live long enough for monthly mailings that may be sent only on a particular day of the week (for example, the first Monday of the months June and July in 2003 are 35 days apart).
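
The 35 day gap used to justify the 36 day lifetime can be checked with a few lines of date arithmetic. The helper name first_monday is just for illustration.

```python
from datetime import date, timedelta

def first_monday(year: int, month: int) -> date:
    """Return the date of the first Monday in the given month."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the offset to the next Monday
    return first + timedelta(days=(0 - first.weekday()) % 7)

gap = first_monday(2003, 7) - first_monday(2003, 6)
print(first_monday(2003, 6), first_monday(2003, 7), gap.days)
```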

Analysis of Effectiveness
Based on testing with the example implementation, over a testing period of about 6 weeks, we had raw numbers of:

* Unique triplets seen: 346968
* Unique triplets that passed email: 8950
* Effectiveness (based on triplets): 97.4%
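
For readers who want to reproduce the percentage, the figure is simply the share of unique triplets that never passed an email. The variable names below are assumptions; the two counts are the ones listed above.

```python
triplets_seen = 346968
triplets_passed = 8950

# Triplets that never got a mail through are counted as blocked.
triplets_blocked = triplets_seen - triplets_passed
effectiveness = triplets_blocked / triplets_seen

print(triplets_blocked, f"{effectiveness:.1%}")
```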

So we have a better than 97 percent efficiency assuming that all email is spam, but it's actually better than that, since most of the email that got through was not spam. Unfortunately, telling exactly how much better we did is impossible without individually inspecting each email, which of course we did not do.

Now let's look at our inefficiency:

* Total emails passed: 85745
* Total deliveries deferred where email was eventually passed: 33586
* Percentage of emails delayed: 39.2%

Unfortunately, this is a pretty poor number. But let's correct it a bit. Almost all of these delayed emails were mailing list traffic which used a unique id for the sender address (see above note regarding VERP). So if we disregard all triplets that passed only one email, we should exclude that type of traffic, and we get a new set of numbers:

* Total emails passed: 85745
* Total deliveries deferred where more than one email was eventually passed: 3512
* Percentage of emails delayed (adjusted): 4.1%

This puts things in a much more favorable light, and merely disregards delays for emails that are generally not timely anyway.
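
Both delay percentages follow directly from the counts above; this short calculation is only a check of that arithmetic, with variable names chosen for illustration.

```python
emails_passed = 85745
deferred_all = 33586        # every deferred delivery that eventually passed
deferred_adjusted = 3512    # only triplets that passed more than one email

raw_rate = deferred_all / emails_passed
adjusted_rate = deferred_adjusted / emails_passed

print(f"{raw_rate:.1%} unadjusted, {adjusted_rate:.1%} adjusted")
```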

Now let's see what effect greylisting would have on network bandwidth, based on some general averages.

* Average size of spam emails: 5000 bytes
* Average SMTP delivery attempt overhead: 500 bytes

These numbers are based on spam collected via various methods before the testing period. We picked these as nice round numbers that are pretty closely in line with analysis of previously seen spam. As for the SMTP overhead, in most cases it was less than 500 bytes, but we decided to err on the conservative side.

From this, it follows that for every spam blocked using Greylisting, we save enough bandwidth to "pay" for 10 deferred delivery attempts. If we total that up to give a real-world number (using the unadjusted numbers to give a worst case picture):

338018 (# spams) x 5000 bytes = 1.69 Gbytes of bandwidth saved
33586 (# blocks) x 500 bytes = 16.7 Mbytes of bandwidth wasted

This gives us a net gain of over 1.67 Gbytes of traffic that was saved by implementing Greylisting in our tests. And that's just on a fairly small site.
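
The bandwidth estimate can be reproduced as follows. The spam count is assumed here to be the number of unique triplets that never passed mail (346968 minus 8950), which matches the 338018 used above; the variable names are illustrative.

```python
SPAM_SIZE = 5000        # assumed average spam size in bytes
ATTEMPT_OVERHEAD = 500  # assumed SMTP overhead per deferred attempt in bytes

spams_blocked = 346968 - 8950   # unique triplets that never passed mail
deferrals = 33586               # deferred deliveries that later passed

saved = spams_blocked * SPAM_SIZE
wasted = deferrals * ATTEMPT_OVERHEAD
net = saved - wasted

print(f"saved  {saved / 1e9:.2f} GB")
print(f"wasted {wasted / 1e6:.2f} MB")
print(f"net    {net / 1e9:.2f} GB")
```

The wasted figure works out to 16,793,000 bytes, which the text above truncates to 16.7 Mbytes.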

Suggestions for more effective protection of email domains
Greylisting will not be nearly as effective against spam unless ALL of the MX hosts for a particular domain use mail software that incorporates it.

A fair number of spamming software packages are already smart enough to retry delivery to other MX hosts for a domain if delivery through one MX fails. Since presumably all MX hosts will be whitelisted for each other (what is the point to delaying acceptance of email from a host that you know is a real MTA that will retry?) if the spammers can deliver to one of the MX's without a delay, then you have no more protection than you did before.

In addition, Greylisting, while already having a fairly minimal negative impact, can be made less intrusive if all of the MX hosts use a common database for tracking delivery attempts. To illustrate this, let's take an example where we have several hosts listed as mail exchangers for a domain, with separate Greylisting databases.

A legitimate sending relay with a retry time of an hour attempts to deliver to one of the listed MX hosts. This host has never seen this triplet before, and so it generates a record in its own Greylisting database for the triplet, and refuses to accept the mail. An hour passes, and the sending MTA knows that the last attempt to deliver failed, so it decides not to retry delivery to the same MX host, and so it picks a different one and tries to deliver to it. This new MX host it picked is using a separate database, and it does not know about the past attempt, and since it has not seen the triplet, it generates a new record in its own database for it, and refuses delivery again.

From this example, it can be seen fairly easily that there is the possibility that the delay in delivery of a legitimate piece of mail may get significantly longer than expected if there are enough MX hosts in the mix, even to the point that the sending server may give up and bounce the mail.

To avoid this possible problem, it is STRONGLY suggested that when there is a case of multiple MX hosts for a domain, they should all use a common database for tracking the mail triplets. There may be cases when the MX hosts are too widely separated (network-wise) to be able to do this efficiently and robustly, but even in those cases it is possible that Greylisting will still be useful enough that this example worst-case scenario can be tolerated or worked around to minimize the impact.
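
The effect of separate versus shared databases can be seen in a toy simulation. This is a simplified model under assumptions that are not in the text: the sender retries exactly once an hour, rotates through the MX hosts in order, and each database stores only the time a triplet was first seen.

```python
INITIAL_DELAY = 3600  # one hour, as in the suggested defaults

def attempts_until_accepted(mx_databases, retry_interval=3600, max_attempts=20):
    """Count delivery attempts for a sender that rotates through MX hosts.

    mx_databases holds one dict per MX host. Passing the same dict
    several times models MX hosts that share a common database.
    Returns None if the mail is never accepted within max_attempts.
    """
    triplet = ("192.0.2.10", "sender@example.org", "rcpt@example.net")
    for attempt in range(max_attempts):
        now = attempt * retry_interval
        db = mx_databases[attempt % len(mx_databases)]
        # Record the first sighting, or fetch the earlier one.
        first_seen = db.setdefault(triplet, now)
        if now - first_seen >= INITIAL_DELAY:
            return attempt + 1
    return None

shared = {}
print("separate databases:", attempts_until_accepted([{}, {}, {}]))
print("shared database:   ", attempts_until_accepted([shared, shared, shared]))
```

With three independent databases the sender in this model has to come all the way back around to the first MX before anything is accepted, while a shared database accepts the second attempt regardless of which host receives it.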

Common spammer attack methods
This section details a few of the most prevalent spammer attack methods that were observed during the testing period, and how the Greylisting system deals with them.

Method 1: The non-primary MX attack
A significant number of spam emails specifically target non-primary MX hosts for domains, for the simple reason that backup MX servers will usually accept and relay all of the spam to the primary MX host without checking it, which reduces the load on the spammer's system, requires little or no additional processing for mails that are rejected, and usually results in faster delivery transactions because the receiving system has to do less work (in the short term while the attack is occurring).

Greylisting handles this attack very well, since the whole point of the attack is to minimize bounces and delivery delays.

Method 2: The spam troll/Dictionary attack
Many spammers are now resorting to "trolling", that is, sending spams to common usernames (tom@, harry@) at domains (also known as a dictionary attack), or sending to generated usernames made from real names harvested from other sources. They usually seem to be operating from a dictionary of common user names, but the "generated" usernames tactic may be getting more common.

The spammers probably use this method in order to reach people who have either taken steps to try to keep their email address from being harvestable from the web, or who are fairly novice users that may not have the resources or inclination to create their own web pages. Probably the latter case, since novice users are probably more likely to purchase something that has been advertised through spam.

This type of attack is very often combined with the non-primary MX attack, since most of these emails will result in bounces on domains that don't have a fairly large user population. Consequently, the spammers target the backup MX hosts. That way, they don't have to handle all the bounces and failures that these messages generate.

Greylisting handles these very well, since they almost always come from random short-lived dynamic IP addresses. And because most of these emails will ultimately generate bounces, it is costly for spammers to attempt redelivery of this type of attack. Also, since this attack is so distinctive (A high number of bounces generated in a short period of time from a particular IP address or set of addresses), it should be very easy to recognize and add to other blacklisting methods if given enough time to do so, which Greylisting provides.

Method 3: The organized distributed attack
Many spammer attacks seem to come in a pattern that looks very much like a moderated DDOS (Distributed Denial Of Service). Let's call this type of spamming an "Organized Distributed Spammer Attack" (ODSA).

On the systems where spammer methods were evaluated, it was observed to be fairly common that there were spam delivery attempts that happened in a fairly short window of time, where the SMTP connections were originating from many different and seemingly unrelated IP addresses. Yet all of the envelope sender addresses were the same or similar, and the envelope recipient addresses were fairly sequential.

Obviously, Greylisting (as defined here) currently handles these attacks extremely well. However, if (when) the spammers adapt and learn to retry the delivery attempts, it may not be as effective by itself.

That being said, it is quite possible to adapt the Greylisting method to help thwart the described workaround. For example, at the cost of a little additional processing, it should be fairly simple to look at delivery attempts that have happened in a fairly recent time period, and after the first few attempts have been seen, submit all of the relays exhibiting this behavior to various blacklists as probable spam sites.
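
One possible shape for that additional processing is sketched below. It flags relays that attempt delivery to many distinct recipients within a short window. The window length, the recipient threshold and the function name are all assumptions picked for illustration, and a real system would also want to consider whether the relay has any prior history.

```python
from collections import defaultdict

def suspicious_relays(attempts, window=300, min_recipients=20):
    """Return relay IPs that hit many distinct recipients in a short time.

    attempts is an iterable of (timestamp, relay_ip, recipient) tuples,
    for example rows read from the Greylisting database.
    """
    by_ip = defaultdict(list)
    for ts, ip, rcpt in attempts:
        by_ip[ip].append((ts, rcpt))

    flagged = set()
    for ip, events in by_ip.items():
        events.sort()
        start = 0
        for end in range(len(events)):
            # Slide the window start forward until it spans at most `window` seconds.
            while events[end][0] - events[start][0] > window:
                start += 1
            recipients = {rcpt for _, rcpt in events[start:end + 1]}
            if len(recipients) >= min_recipients:
                flagged.add(ip)
                break
    return flagged

burst = [(i, "203.0.113.5", f"user{i}@example.net") for i in range(25)]
slow = [(i * 1000, "198.51.100.7", f"user{i}@example.net") for i in range(25)]
print(suspicious_relays(burst + slow))
```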

Method 4: The web proxy attack
A significant portion of spam seems to come from relays that appear to be CacheFlow Server or other types of proxies. These can usually be identified by returning "CacheFlowServer" to an ident probe.

Greylisting will block these particular attacks completely, since those servers are not "real" MTA's, and will never retry.

Possible methods of spammer adaptation
Greylisting as proposed is fairly immune to possible routes of adaptation by spammers to get around the blocking. The possible methods of adaptation may make Greylisting by itself less effective, but the ways of getting around it will only make other spamblocking methods more effective.

The normal spammer behavior is to change IP's when normal IP blacklists have listed their current IP. Unfortunately for the spammers, changing their IP does not help with our delaying method, as every mail (and its delay) is tied to the IP address of the sending relay. If the IP address changes, it effectively "resets" the timer on the delay, even if the envelope sender and recipient addresses stay exactly the same.

The other adaptation that is expected will result in the current versions of client spam software becoming obsolete, since most of those spamming applications are not intelligent enough to retry a delivery after getting any type of error. Spammers will be required to either use more intelligent software that retries, or to relay through smart relays.

We may see spammers gravitate toward using open third party relays, but most of them are already locked down or are quickly becoming so. Or, they may set up their own relays. In either case, it does nothing to negate the likelihood that those relays are or will quickly become listed in blacklists, thereby reducing their effectiveness for sending spam.

If spammers set up their own relays, the fact that email transmissions are delayed and that they may each take several attempts to deliver, only increases the storage and bandwidth requirements on the spammer's side, which also raises the costs to the spammer. And if we can make it less profitable, then we are well on the way to solving the spam problem.

Implementation Caveats
The delaying tactic that is the core of Greylisting may cause undesired delays if the host it is running on allows clients that will be using regularly changing IP's to relay mail through it. For example, if clients on non-local networks are allowed to relay through the server after doing a POP or IMAP auth, this implementation does not handle allowing these clients to deliver their mail for forwarding without incurring a probably undesired delay.

Workarounds for this issue exist, but are not implemented in the example code. Essentially all that is necessary to allow this without incurring a delay penalty is to simply insert a short-lived record into the Greylisting database at the same time that authorized relaying is enabled, which allows that originating IP address the ability to send mail for some small but sufficient amount of time.
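
A minimal sketch of that workaround might look like the following. It is not part of the example code; the function names, the in-memory dictionary and the 30 minute grace period are all assumptions. The first function would be called at the point where the POP or IMAP login enables relaying.

```python
AUTH_GRACE = 30 * 60  # hypothetical grace period in seconds

def note_authenticated_client(relay_allow, ip, now, grace=AUTH_GRACE):
    """Record that this IP just authenticated and may relay for a while."""
    relay_allow[ip] = now + grace

def may_skip_greylisting(relay_allow, ip, now):
    """Return True while the short-lived record for this IP is still valid."""
    expires = relay_allow.get(ip)
    if expires is None:
        return False
    if now >= expires:
        # Clean up the stale record so the table does not grow forever.
        del relay_allow[ip]
        return False
    return True

allow = {}
note_authenticated_client(allow, "198.51.100.23", now=0)
print(may_skip_greylisting(allow, "198.51.100.23", now=600))
print(may_skip_greylisting(allow, "198.51.100.23", now=3600))
```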

Reception of mails from legitimate hosts that either do not pay attention to the temporary failure nature of the rejections, or never attempt any retries will be adversely affected by this system. Hopefully, any mailers that have these problems will be quickly fixed once Greylisting has been implemented at a significant number of sites.

Unfortunately, a few isolated systems with these issues have been discovered during testing. The affected systems either do a poor job of following the SMTP spec, or are outright violating it. Since SMTP is by nature an unreliable transport method, systems that do not retry deliveries are poorly advised and need to be fixed.

An SMTP session log generated by one specific example of a non-compliant MTA follows:

-> HELO somedomain.com
<- 250 Hello
-> MAIL FROM: <sender@somedomain.com>
<- 250 2.1.0 Sender ok
-> RCPT TO: <recipient@otherdomain.com>
<- 451 4.7.1 Please try again later
-> DATA
<- 551 No valid recipients

From this, it is fairly obvious that the sending MTA did not check the status from the RCPT command, and continued on to issue DATA, which caused a permanent failure code to be issued, which is not a valid step when no recipient addresses have been accepted. In the case of this particular mailer, it did pay attention to the later 551 error code, which is considered a "permanent" failure code. This caused the message to be bounced back to the sender. But that is incorrect behavior because it failed to observe the earlier "temporary" failure and abort the transaction at that point.
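
What a well-behaved sender should do after the RCPT replies can be sketched as follows. The first digit of an SMTP reply code gives its class (2xx success, 4xx temporary failure, 5xx permanent failure). The function names are illustrative, and the decision is simplified to a single yes or no for the whole transaction.

```python
def reply_class(code: int) -> str:
    """Classify an SMTP reply code by its first digit."""
    classes = {2: "success", 3: "intermediate", 4: "temporary", 5: "permanent"}
    return classes.get(code // 100, "unknown")

def next_step_after_rcpt(rcpt_codes):
    """Decide what a sender should do once all RCPT replies are in.

    Simplification: a real MTA tracks each recipient separately and
    requeues the deferred ones even if others were accepted.
    """
    classes = [reply_class(code) for code in rcpt_codes]
    if "success" in classes:
        return "send DATA"
    if "temporary" in classes:
        # No recipient was accepted, but at least one may succeed later.
        return "abort and requeue"
    return "bounce"

# The session above: the only RCPT got a 451, so DATA should never be sent.
print(next_step_after_rcpt([451]))
```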

An Example Implementation
The provided example implementation (available here) is a perl based milter for Sendmail, using version 0.18 of the Sendmail::Milter interface (also available from CPAN) and has been tested with Sendmail 8.12.9, though it should work with all versions of sendmail after 8.12.5. Sendmail::Milter requires a threaded perl installation and was tested with perl 5.8.0 (available from perl.org or from CPAN).

Also available are database definitions used for this implementation, and a sample configuration file. Since the implementation is in perl, it is easily modifiable. Not available on CPAN (yet...).

The database used was Mysql 3.23.54, though it should work with any later version, and most likely will work with earlier versions as well. In addition, the test systems were also using amavisd-new with the amavisd-new-milter interface, which was configured to do additional spamblocking with the help of Spamassassin 2.53.

In the interests of keeping the example implementation simple and easy to understand, some features that could easily be optimized have been left in their unoptimized state. Even so, during testing under heavy spam loads, the added time for the checks was unnoticeable in most cases, and in the remaining cases, the cause was due to network delays accessing the database (which was remotely hosted).

One detail of the implementation will probably strike horror in the hearts of diehard "structured" programmers. In several places, goto is used. Because of the way that the milter interface works, this seemed more straightforward than other methods.

Other details on the example implementation
Successful mails that have an envelope sender of the null sender are considered a special case where we will expire the record immediately in order to avoid whitelisting it, once we allow the mail to go through. Mails from the null sender are (according to RFC 821) only to be used for special administrative mails like bounces. Consequently, they are almost never used for more than one legitimate email. For that reason, there is no need to maintain them any longer once an email has been passed.

Unfortunately, many spammers are misusing this sender address because it generally won't generate a bounce from the recipient server (there's no point in generating a bounce message for a mail that is already a bounce). Expiring these records immediately helps limit the possibility that spammers using this sender address incorrectly can send multiple spams to the same recipient in a small time frame.
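
The null sender special case amounts to choosing a zero lifetime for the record once the mail has passed. The sketch below assumes the envelope sender arrives either as an empty string or as "<>"; the function name and that representation are assumptions, not details of the example implementation.

```python
WHITELIST_LIFETIME = 36 * 24 * 3600  # 36 days in seconds

def lifetime_after_pass(envelope_sender: str) -> int:
    """Return how long a record should live once its mail has been passed."""
    if envelope_sender.strip() in ("", "<>"):
        # Bounces are almost never repeated, so do not whitelist them.
        return 0
    return WHITELIST_LIFETIME

print(lifetime_after_pass("<>"), lifetime_after_pass("list@example.org"))
```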

In addition, there are several other small features incorporated into the example implementation that are not part of the Greylisting system itself, but are attempts at enhancing or refining the general purpose of spam blocking.

The database layout used is not normalized. This was a conscious choice so that people who may not be that familiar with database design could more easily understand it. However, reworking the database implementation to normalize it should be fairly trivial.

One thing that is not incorporated is any kind of database maintenance. There is no provided method of inserting manual whitelisting entries other than the example sql statements in the above dbdef.sql file. I expect that eventually a nice web cgi for maintaining the database will be written, but haven't had time to create one yet. Or maybe someone will create one and share it.

Links to Example Source and Information
To contact me about this project, please send email to eharris@puremagic.com

All of the example implementation files and information should be here. If there's something missing, let me know.

More to come...

Credits
If you have a Paypal account and found this idea, project or code useful, please make a donation so that we can have more time for development. Plus, it helps buy beer!
Thanks to the following people for providing valuable feedback and trying to shoot my theories full of holes:

* Brad Roberts
* Brian Michalk
* Corey Huinker
* Bob Apthorpe

Thanks to Paul Graham, whose paper on Bayesian filtering caused a revolution in the spam fighting world, and was also a bit of inspiration that helped to get me thinking about other ways to solve the problem.

Thanks to Charles Ying for developing Sendmail::Milter which made this project a lot easier to test, a whole slew of other developers for producing Sendmail, Mysql, Perl, Amavis, and Spamassassin, and all the administrators and people who help maintain various blacklists and other anti-spam software and databases that have helped in keeping email from becoming useless.

Hosting by Insomni Web Development Solutions

Re:In case of /.'ing (1)

deadsaijinx* (637410) | more than 11 years ago | (#6256503)

written with a type writer or something?

Tempfailing is not new and unique (5, Informative)

HiKarma (531392) | more than 11 years ago | (#6256198)

This idea isn't so new or unique. It's been discussed a fair bit on the ASRG [ietf.org] mailing list under the name "tempfailing".

First I heard of it was from Landon Noll and Mel Pleasant. It is noted in brief as one of the techniques in this plan to end spam [templetons.com] (though their plan, which did include the triplets, is not laid out in full there.)

It is a worthwhile technique for a little while, and if spammers were rational, would be worthwhile for some time to come. But spammers are not rational, and already this technique is not as useful as would be hoped.

Do a Google Search for Tempfailing [google.com] especially in ASRG to see statistics etc.

we are the robots (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#6256199)

twelfth post!

I am not sure what the spam filter is (1)

notque (636838) | more than 11 years ago | (#6256206)

Instead of filtering out email completely, we just add [spam] to the beginning of anything that is potentially spam, have it forwarded to a folder, and go through it once a week. In 3 years of using it, I've only had 1 message that was accidentally called spam. And I didn't care if I received it or not anyway.

Re:I am not sure what the spam filter is (1)

selfabuse (681350) | more than 11 years ago | (#6256455)

sounds like SpamAssassin [spamassassin.org] I work at an ISP, and we have it filtering incoming mail for several thousand people, and haven't hit any kind of problem that wasn't very easily fixable

Re:I am not sure what the spam filter is (1)

notque (636838) | more than 11 years ago | (#6256515)

Might be, but you are right. Any problem is easily fixable.

I don't understand what everyone's problem is regarding spam. It is a nonissue for me.

Maybe SpamAssassin is just so good that I don't notice how annoyed I might be otherwise.

Sarxpam (0, Troll)

Anonymous Coward | more than 11 years ago | (#6256212)

What about sarxpam? Is there any solution to the sarxpam (unsolicited and unwanted sexually-oriented E-mail) epidemic? I'm always receiving offers to enlarge certain parts of my anatomy, watch young teens perform disgusting acts, etc.etc.

Short of changing my email address, is there any way I can stop them?

Works for anything (0)

phorm (591458) | more than 11 years ago | (#6256281)

It should work for most methods of spam. If you're Joe average using Outlook (gasp), then you can even filter with it.

Tools->Message Rules->Mail
Where the Subject line contains specific words (add words, "enlarge", "penis") - That takes care of those
Add another "horse", "cum" (or something else common) byebye.

Make the default action to move such messages to "trash" or a "spam" folder.
You can also filter by message body...

Easy way to stop spam... (3, Informative)

Anonymous Coward | more than 11 years ago | (#6256216)

Just encode your e-mail address on web pages & don't sign up to any dubious mailing lists.

I haven't received 1 single spam in recent months from doing this!

Re:Easy way to stop spam... (0)

Anonymous Coward | more than 11 years ago | (#6256337)

I haven't received any spam in 3 years (except for ones that I apparently signed up for during my heavy drug using days) (ie. Freelotto).

Don't publicly display it anywhere and you are safe.

Re:Easy way to stop spam... (2, Informative)

seangw (454819) | more than 11 years ago | (#6256341)

It isn't always possible to never publish your email address.

You can, however, establish classes of emails. Most people don't like this however, because you have to check multiple accounts, and it really doesn't stop the spam.

In order to sign up to certain services / sites you need to provide a valid email address.

While that email address can be a secondary email, if anything important is going to come in on the email (such as domain information via network solutions) you will still want to use your real email address.

It's a very difficult issue.

Easy for end-users, sure. (5, Insightful)

Medievalist (16032) | more than 11 years ago | (#6256402)

Just encode your e-mail address on web pages & don't sign up to any dubious mailing lists.
Many of us must maintain contact addresses in the global whois database - so that people can contact us when something is broken.

Look at it this way: you can stop crank calls by unlisting your phone numbers. But you can't unlist the hospital, the ambulance service, the fire department, etc.

We're not all end-users. Some of us are the plumbers.

clever hack for WHOIS contact addresses (5, Interesting)

phr1 (211689) | more than 11 years ago | (#6256549)

The registrar I use (jumpdomain.com) has a clever hack for despamming WHOIS contact email. Basically they change your published contact address once a week. The published address is automatically generated, looks like gibberish, and forwards to your real address. If someone wants to contact you by looking up your address by WHOIS and writing to you, it works fine. But if they add the address to a mailing list, it stops working in a week. That has eliminated almost all my WHOIS spam. Good scheme.

Re:Easy for end-users, sure. (0)

Anonymous Coward | more than 11 years ago | (#6256564)

We're not all end-users. Some of us are the plumbers.


And therefore have to deal with the crap?

I'm just wondering... (-1, Offtopic)

Anonymous Coward | more than 11 years ago | (#6256227)

...is anyone else looking forward to watching Jennifer Connolly's breasts on the big screen this weekend?

ive done better (0, Funny)

Anonymous Coward | more than 11 years ago | (#6256231)

ive invented pinklisting. i now just get all the good gay spam.

1 false positive is not acceptable. (3, Insightful)

Pop n' Fresh (411094) | more than 11 years ago | (#6256234)

This isn't very reassuring:

"it practically eliminates the main problem of other solutions: the false-positive."

What does 'practically eliminates' mean? If it gives false positives at all, it is just as useless as all those 'other solutions'.

Re:1 false positive is not acceptable. (5, Interesting)

pclminion (145572) | more than 11 years ago | (#6256426)

Wrong. 1 false positive can be acceptable, and in fact is probably better than how things are now.

At USENIX '03 there was a paper presented on artificial intelligence techniques for spam detection. I can't provide a link since only USENIX members can download the paper (at this point, at least). I was a coauthor of that paper.

One of the things we've discovered in our research is that some classes of filters (most notably, the one I have been developing along with a few other individuals) are actually more effective at correctly classifying email than humans are. That is to say, you can train the learning algorithm on mostly-correctly-classified data, then re-run it over the training data, and almost miraculously, it discovers all kinds of email in the training set that was incorrectly classified.

I.e., this filter has discovered mail that I myself incorrectly thought was spam. It's scary, because there's a lot of it.

To assume that a human will always be 100% accurate at classifying their own email isn't just arrogant, it's plain wrong. Newer filters that will be introduced in the near future might possibly be more accurate than you, a frail human, could ever be.

Re:1 false positive is not acceptable. (1)

JoelClark (150479) | more than 11 years ago | (#6256434)

The only reason it wouldn't allow a legitimate e-mail through is because the MTA sending it doesn't conform to the SMTP spec. That's the "practically".

If you think there is a single perfect solution, you're foolish at best.

Time critical (5, Insightful)

Synithium (515777) | more than 11 years ago | (#6256235)

Time critical mailing will go out the window. I can see how this might make any corporate user irate. The same thing goes for challenge-response, the time delay in the business world is unacceptable.

This would be great for personal mail, but that's about it. ISPs would have the same problems with it because their business-class users most likely use the same servers as their consumer-class users.

Re:Time critical (4, Informative)

eGabriel (5707) | more than 11 years ago | (#6256319)

This isn't true, actually. Once one mail gets through, the system lets in subsequent mails from that sender. So there is only the initial delay, after that CEO Joe can use his email as a fat instant messenger per usual.

Re:Time critical (0)

notque (636838) | more than 11 years ago | (#6256365)

Time critical mailing will go out the window.

Time critical mailing will NEVER go out the window.

This spam solution will never work because of it. If I don't respond to an email within 10 minutes, I get a call asking if I received it.

Has anyone suggested tracking down and hunting spammers for sport?

Re:Time critical (1)

rossjudson (97786) | more than 11 years ago | (#6256492)

Great. Then you can set up YOUR server to let everything right in. It's your option to get plenty of spam! Maybe the rest of us will use a delay, though.

Re:Time critical (1)

McDutchie (151611) | more than 11 years ago | (#6256375)

Time critical mailing will go out the window.
That would be mostly fixed by only imposing the delay on mail received from networks listed on blocklists such as the SBL [spamhaus.org] or SPEWS [spews.org]. Blocklists are just databases of IP ranges, so they can be used for non-blocking purposes. Hopefully most of your business contacts use decent ISPs that don't harbor spammers (and if not, the delay would be a nice incentive for them to switch to a decent ISP, and a friendlier one than outright blocking).
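
A minimal sketch of that idea in Python, assuming the blocklist is available as a plain list of IP ranges. The ranges below are reserved documentation networks standing in for real blocklist data, and `should_delay` is a hypothetical helper name, not part of any existing filter:

```python
import ipaddress

# Hypothetical blocklist entries (reserved documentation ranges). A real list
# such as the SBL would be loaded from its publisher, not hard-coded.
BLOCKLISTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def should_delay(client_ip: str) -> bool:
    """Return True if mail from client_ip should get the greylisting delay."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKLISTED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "203.0.113.9"):
        action = "delay" if should_delay(ip) else "accept immediately"
        print(ip, "->", action)
```

Mail from everyone else is accepted without any delay, so only senders on listed networks ever see the temporary failure.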

Re:Time critical (1)

ketamine-bp (586203) | more than 11 years ago | (#6256470)

(and if not, the delay would be a nice incentive for them to switch to a decent ISP that is friendlier than outright blocking)

True, if not for the fact that the delay would be a nice reason for the company to lay off the administrator implementing the rules in favor of an even cheaper administrator.

Re:Time critical (2, Insightful)

SuiteSisterMary (123932) | more than 11 years ago | (#6256428)

Besides, if you're using SMTP for time-critical things, you have a problem, as SMTP is NOT a guaranteed delivery system.

The BEST way to fight spam? (-1)

Anonymous Coward | more than 11 years ago | (#6256242)

Stop using email! Who needs it!

spam.....hrmmm (5, Insightful)

chef_raekwon (411401) | more than 11 years ago | (#6256269)

with all of these solutions to spam..and all of the spam now flooding mail servers...

isn't it time to change the specification (RFC) and possibly the manner in which our current system works? i haven't come up with anything yet, but surely there must be some sort of handshaking/secure type connection that could be used - - some sort of postage (free) that is encrypted into the mail, that states that it is genuine....kind of like the hologram on those windows cds...

i dunno. file this story under redundant.

Re:spam.....hrmmm (1)

DaemonGem (557674) | more than 11 years ago | (#6256369)

Tell you what, we'll make it +4 Redundant.

-Dae

How about Habeas' haiku method? (3, Interesting)

siskbc (598067) | more than 11 years ago | (#6256372)

The best idea I've seen in YEARS was to have people start using a specific, original poem as their signatures. Then, the author granted license to anyone who WASN'T sending spam. Therefore, they could sue any spammer for copyright infringement if they used it, and you could train your mail filter to look for the signature. Once spamassassin took it up, it pretty much snowballed. See story here [wired.com]

Re:spam.....hrmmm (1)

cmburns69 (169686) | more than 11 years ago | (#6256417)

The problem with any postage is that it can be forged. If you have some sort of public/private key, though, then only the people you give your key to would be able to email you... Now that I've started thinking, that might be a really good way to allow authorized email...

It wouldn't be an end-all solution. I envision multiple strategies working in tandem, such as the above public/private key, a registered email sender verification service (like Verisign.. *shudder*), and even completely unsecure (for those people who want it).

Basically, I don't think there is any single strategy that will stop spam. But judicious use of technology could help a great deal!

I'm not sure about this... (3, Insightful)

BiteMeFanboy (680905) | more than 11 years ago | (#6256272)

These applications appear to adopt the "fire-and-forget" methodology

I thought it was generally understood that most spam was sent by abusing open relays, thus hiding its origin. This could be wrong. However if it's not, those figures aren't applicable. Nor is spam going to be diverted, since an open relay is generally running a regular MTA and will attempt a retry. For instance, if qmail were running on an open relay and was abused by a spammer, it would try again and again with an increasing delay (calculated logarithmically if memory serves) between attempts. So the mail will still get through.

Consider further that if a spammer hits an open relay and hammers your mailserver from it, and all of the "triplets" are new, you're increasing your traffic, because all of that mail will be attempted again.

KERNEL 2.6 RELEASED (-1)

Anonymous Coward | more than 11 years ago | (#6256279)

Does anyone know where i can get some free info on cheap viagra or refinancing my house?

Bayesian Filtering (3, Interesting)

Dr Rick (588459) | more than 11 years ago | (#6256297)

I'm finding that use of the Outclass interface to POPfile is surprisingly effective at dealing with my spam problem (and I get a lot of it) - since training POPfile I haven't had a single spam message get into my inbox, and no false positives. Of course I could just be very, very lucky and with this post the email gods will punish me...

How does the effectiveness of Greylisting compare with what others are seeing with existing techniques (such as Bayesian filtering)? Is it a false positives problem, such as digests and opt-in mailing lists getting incorrectly tagged as spam?

Re:Bayesian Filtering (1, Informative)

Anonymous Coward | more than 11 years ago | (#6256431)

Been using the Bayesian filtering in ASSP
http://sourceforge.net/projects/assp/
With a week of "training" it I now have most excellent results.

As of Fri Jun 20 13:56:18 2003 the mail logfile shows:
4402 messages, 2850 were spam (64.7%) in 24 days
for 183.4 messages per day or 118.8 spams per day
431 additions to / verifications of the whitelist (18.0 per day)
2541 were judged spam by the bayesian filter (89.2% of spam)
279 were to spam addresses (9.8% of spam)
30 were rejected for executable attachments (1% of spam)
were sent from local clients (0.0% of nonspam)
483 were from whitelisted addresses (31.1% of nonspam)
1069 were ok after a bayesian check (68.9% of nonspam)
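
For anyone who wants to check those figures, here is a quick sketch that recomputes the rates from the raw counts in the log. The variable names are mine, not ASSP's. The nonspam total is taken as all messages minus spam, which also matches the sum of the whitelisted and Bayesian-checked counts:

```python
# Counts copied from the log above.
total, spam, days = 4402, 2850, 24
whitelist_updates = 431
by_bayes, to_spam_addresses, executable = 2541, 279, 30
from_whitelist, ok_after_bayes = 483, 1069

nonspam = total - spam


def pct(part: int, whole: int) -> float:
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


report = {
    "spam % of all mail": pct(spam, total),
    "messages per day": round(total / days, 1),
    "spams per day": round(spam / days, 1),
    "whitelist updates per day": round(whitelist_updates / days, 1),
    "bayesian % of spam": pct(by_bayes, spam),
    "spam-address % of spam": pct(to_spam_addresses, spam),
    "executable % of spam": pct(executable, spam),
    "whitelisted % of nonspam": pct(from_whitelist, nonspam),
    "bayesian-ok % of nonspam": pct(ok_after_bayes, nonspam),
}

if __name__ == "__main__":
    for label, value in report.items():
        print(f"{label}: {value}")
```

The three spam categories add up to the 2850 spam total, and the recomputed rates agree with the log to the rounding it shows.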

Re:Bayesian Filtering (1)

seanmeister (156224) | more than 11 years ago | (#6256450)

Ever since I started using Bayesian filtering (via Mozilla Mail [mozilla.org] and SpamBayes [sourceforge.net] ), I haven't even cared how any other techniques compare. It's that good!

Re:Bayesian Filtering (1)

Casca (4032) | more than 11 years ago | (#6256525)

Sounds pretty good. Mind posting your email address here and reporting back next week to let us know how it is going?

I have my own algorithm (1, Insightful)

crovira (10242) | more than 11 years ago | (#6256327)

I parse the content before I read it (isn't php great? :-)

Any email with HTML in it, any email with .exe attachments, any email with the words viagra or penis (or some other words in my list, like "second mortgage" when I don't own a home,) in it gets purged as soon as I pull it off the server.

It never gets to my mail program.

I could also filter on subject lines containing any word which isn't in the dictionary but since some of my friends don't spell too well...
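
The rules above could look roughly like this. It is a sketch in Python rather than the poster's PHP, which isn't shown, so the word list, the `should_purge` name and the matching details are all assumptions:

```python
from email import message_from_string

# Hypothetical word list; the poster's actual list is not given.
BLOCKED_WORDS = ("viagra", "second mortgage")


def should_purge(raw_message: str) -> bool:
    """Return True if the message trips any of the three rules."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # Rule 1: any HTML part.
        if part.get_content_type() == "text/html":
            return True
        # Rule 2: any attachment whose filename ends in .exe.
        filename = part.get_filename() or ""
        if filename.lower().endswith(".exe"):
            return True
        # Rule 3: any blocked word or phrase in a text part.
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            text = payload.decode("utf-8", errors="replace").lower()
            if any(word in text for word in BLOCKED_WORDS):
                return True
    return False


if __name__ == "__main__":
    print(should_purge("Subject: hi\n\nCheap Viagra here"))
    print(should_purge("Subject: lunch\n\nSee you at noon"))
```

Plain substring matching like this is blunt: a legitimate message that merely mentions one of the listed words gets purged too.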

Re:I have my own algorithm (2, Insightful)

dprovine (140134) | more than 11 years ago | (#6256460)

But what happens when you try to have an email
discussion about stopping spam, and someone in
the discussion says "Well, I filter out any
message with the words viagra or penis..."?

Does that get flagged as spam and discarded too?

My spam filtering (1, Interesting)

Drummer_Dan (648348) | more than 11 years ago | (#6256355)

I have one filter that blocks at least 90% of my spam. If a message contains the word "offer" it's toast. Works for me anyways.

Re:My spam filtering (2, Funny)

McDutchie (151611) | more than 11 years ago | (#6256443)

If a message contains the word "offer" it's toast.
Remind me never to offer you any friendly help with anything. :)

Copy of spam logged? (2, Insightful)

spuke4000 (587845) | more than 11 years ago | (#6256386)

Question about this system: if it sends a temporarily unavailable message or whatever it does, does it log the original message? Where I'm going with this is what happens if a legitimate message is blocked but never resent? Most anti-spam software allows you to view the spam folder, or something equivalent, to check for false positives. How are false positives handled here?

spammesilly@gt.rr.com (1)

pair-a-noyd (594371) | more than 11 years ago | (#6256389)

I *WANT* spam.
Not shit, mailto:spammesilly@gt.rr.com [mailto]
Send it to me baby!

I'm teaching my PC how to deal with it and I've been on a mission to sign up for every bullshit mailing list I can find, all the typical trash that pesters people to death.

Once I get this worked out I'll install it on my dad's PC and all my friends too. Then they can say bye-bye to spam.

I told my friend, "Your shit works because my shit is always broke!".. In other words, I am the guinea pig for everyone else. I test it then they get the working version..

So, mailto:spammesilly@gt.rr.com [mailto]

Re:spammesilly@gt.rr.com (0)

Anonymous Coward | more than 11 years ago | (#6256538)

This is the right place... (1)

Dave21212 (256924) | more than 11 years ago | (#6256543)


This is the right place to test your anti-spam tool (is it a graylist?)

If it's "bullshit...all the typical trash that pesters people to death." that you are looking for, you found it !~

Smile, it's Friday EST.

Re:spammesilly@gt.rr.com (0)

Anonymous Coward | more than 11 years ago | (#6256562)

I'm sure roadrunner will be incredibly pleased with your 'experiment.'

RFC 3514 (4, Funny)

pizen (178182) | more than 11 years ago | (#6256393)

How about in the spirit of RFC 3514 (the evil bit) we create a spam header in email. Spam will set this header so we can easily filter it out.

Filtering out spam (1)

Ann Coulter (614889) | more than 11 years ago | (#6256394)

Why don't people just ask their friends to encrypt incoming email? That is one of the simplest ways to eliminate spam, and it works too.

Re:Filtering out spam (1)

bobtheheadless (467304) | more than 11 years ago | (#6256563)

Until somebody emails you who doesn't know the deal, or if you have friends who don't even know what encryption is...

meh

Published a paper? (4, Informative)

Call Me Black Cloud (616282) | more than 11 years ago | (#6256400)

Where? To me, publishing a paper means your writing appeared in some peer-reviewed journal (where the "peers" are acknowledged as domain experts). What you did was put up a web page. With a donation link at the bottom.

For others looking for a solution, try POPFile [sourceforge.net] . Open source, cross platform, gives me 96% accuracy.

One more thing: "practically eliminates" is not the same as "eliminates".

Re:Published a paper? (1)

Effugas (2378) | more than 11 years ago | (#6256476)

Yeah. Welcome to the web. We do things a little differently around here. 'round these parts, source code release isn't novel.

--Dan

"Practically eliminates" (1)

homemademissiles (629240) | more than 11 years ago | (#6256479)

Maybe it's like condoms.

99% effective, 1% totally ineffective....

i managed (0)

Anonymous Coward | more than 11 years ago | (#6256432)

to stop all spam by blocking the dollar sign $ (my country doesnt use them)
i havent had any spam for months as all the spam i have ever got is from usa based spammers, works a treat

Waiting for Article Title (4, Funny)

notque (636838) | more than 11 years ago | (#6256454)

The Next Step in Fighting Spam: Death Penalty

In many countries SPAM is illegal and... (1)

xutopia (469129) | more than 11 years ago | (#6256466)

isn't a problem. The people I hear most often complain about SPAM are North American and UK people where SPAM isn't illegal.

Re:In many countries SPAM is illegal and... (1)

Bull999999 (652264) | more than 11 years ago | (#6256541)

Maybe North American and UK people would complain less if those "many countries" also outlawed open relays.

Wouldn't it be nice if (2, Informative)

abe_is_fun (320753) | more than 11 years ago | (#6256482)

Spammers are notoriously resilient. Within a few days/weeks/nanoseconds the spammers would realize they need to retry after a delay, and they would stop with the fire-and-forget mentality.

I wish your plan would work but I just don't think it will.

Plus the spammers can get their viagra at wholesale cost!

Delaying email by one hour! (5, Insightful)

pjrc (134994) | more than 11 years ago | (#6256484)

From the linked paper:

An hour is short enough that in most cases, users will not notice the delay.

I'm wondering how I'm going to explain that to a new customer over the phone who says "I'll just email that file right now so we can go over it together".

something important seems to be missing from your (-1)

Anonymous Coward | more than 11 years ago | (#6256517)

[ezboard.com]
Trolltalk v 2
> General Discussion
> How to post redirects from sourceforge

Next Topic >>
Author Comment
rkz
Member
Posts: 3
(6/20/03 9:30 am)
Reply How to post redirects from sourceforge http://sourceforge.net/project/admin/?url=http://g oatse.cx/#hatid=354421&group_id=4421

everything after # is just for padding and doesnt matter. have fun

One good point about this proposal (5, Insightful)

Anonymous Coward | more than 11 years ago | (#6256539)

It deals with spam at the server level. All the wonderful user-level solutions don't do jack to stop spam from being sent. Look at the numbers the spammers show for return rate, and look at how fast spam programs can go, and you'll see that the only solutions that will work are those that make it expensive to send spam. Anything else will just make the spammers send more spam to try and get the hit rate they need.

The lesson to learn is (0)

Anonymous Coward | more than 11 years ago | (#6256546)



1) Steal regular expressions from SpamAssassin
2) write perl to use a MySQL db
3) put a 1 hour timer on incoming mail (forget those important mails)
4) put up a webpage and tell slashdot
5) insert "donate" paypal button
6) Profit !!!

mostly worthless, sorry (1)

autopr0n (534291) | more than 11 years ago | (#6256551)

If we have never seen this triplet before, then refuse this delivery and any others that may come within a certain period of time with a temporary failure.

Since SMTP is considered an unreliable transport, the possibility of temporary failures is built into the core spec (see RFC 821). As such, any well behaved message transfer agent (MTA) should attempt retries if given an appropriate temporary failure code for a delivery attempt (see below for discussion of issues concerning non-conforming MTA's).

During the initial testing of Greylisting, it was observed that the vast majority of spam appears to be sent from applications designed specifically for spamming. These applications appear to adopt the "fire-and-forget" methodology. That is, they attempt to send the spam to one or several MX hosts for a domain, but then never attempt a true retry as a real MTA would. From our testing, this means that currently, based on a fairly conservative interpretation of testing data, we see effectiveness of over 95%, and that is with no legitimate mail ever being permanently blocked.


Any attempt to build technology to work around spammers' current techniques is pretty much a waste of time. They'll just adapt. A well written SPAM program can fetch $10,000 per license. It wouldn't be that hard for them to act like a legit MTA to get around this problem, if the patch were widespread.

Sure, it prevents 95% of spam now, but if it becomes widespread it won't.

That said, it will make spammers' lives much more difficult, and requires them to identify themselves. So this could be helpful, in concert with other tools. One option would be to use this along with sender-verification, so that people who run legitimate MTAs don't need to worry about getting a verification message (for now).
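
For reference, the triplet rule quoted above can be sketched in a few lines of Python. This is not the paper's Perl/sendmail filter. It assumes the triplet is (sending host IP, envelope sender, envelope recipient), uses a one-hour delay as a placeholder default, and picks 451 as the temporary failure code. The class and method names are hypothetical, and record expiry is left out:

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, delay_seconds=3600, clock=time.time):
        self.delay = delay_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}        # triplet -> time of first delivery attempt

    def check(self, client_ip, mail_from, rcpt_to):
        """Return 451 (temporary failure, retry later) or 250 (accept)."""
        triplet = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        # Record the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.delay:
            return 451
        return 250


if __name__ == "__main__":
    fake_now = [0.0]
    greylist = Greylist(delay_seconds=3600, clock=lambda: fake_now[0])
    args = ("192.0.2.1", "alice@example.com", "bob@example.org")
    print(greylist.check(*args))   # first attempt is refused
    fake_now[0] = 4000.0
    print(greylist.check(*args))   # retry after the delay is accepted
```

A sender that retries like a real MTA gets through on a later attempt, and a fire-and-forget sender never does. That is also why a spam program that learns to retry would defeat this check on its own.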

The real solution (4, Funny)

mrseigen (518390) | more than 11 years ago | (#6256554)

We should grab some of the guys who get 1000+ spams per day, point them to the physical location of the spammers, and then step back. I can guarantee you that vigilante justice is entirely appropriate here, considering we want the gov to step back from the 'net instead of entering new "secret arrests of spammers"(?) laws.