
Asynchronous Programming for Spam Elimination

timothy posted more than 7 years ago | from the you-do-this-while-they-do-that dept.


ttul writes "Stas Bekman (formerly the maintainer of mod_perl) has been quietly building an asynchronous programming framework to build high performance network applications in Perl. His recent Perl.com article describes how he has used the Event::Lib module (that lives on top of the popular libevent library) to write a traffic-shaping email proxy to get rid of spam. Asynchronous programming is challenging at the best of times. Read on to find out how to do it the easy way in Perl."
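The article's Perl code isn't quoted in the summary, but the pattern it describes -- one event loop multiplexing many deliberately slowed SMTP connections -- can be sketched as follows (Python's asyncio standing in for Event::Lib/libevent; the banner text and delay values are illustrative, not from the article):

```python
import asyncio

# Illustrative stand-in for the Event::Lib-based proxy the article describes:
# a single event loop multiplexes every SMTP connection, and a suspicious
# sender gets its greeting trickled out instead of tying up a whole process.
async def handle_client(reader, writer, suspicious=True, delay=0.01):
    banner = b"220 mx.example.com ESMTP\r\n"   # hypothetical banner
    if suspicious:
        for byte in banner:
            writer.write(bytes([byte]))        # one byte at a time...
            await writer.drain()
            await asyncio.sleep(delay)         # ...the loop serves others meanwhile
    else:
        writer.write(banner)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main(host="127.0.0.1", port=2525):
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()

# asyncio.run(main()) would start the listener.
```

Because the delay is spent in an `await` rather than in a blocked process, one loop can tarpit thousands of connections at once.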

63 comments

Not ultimately a solution (5, Insightful)

frenetic3 (166950) | more than 7 years ago | (#16417723)

so they wrote an asynchronous proxy that slows down connections. cool trick, but not any kind of scalable solution.

the core assumption, and the only thing that makes this work, is that botnet spam software will _always_ just give up after 30 seconds; if this throttling technique ever became commonplace, spammers would just write their own asynchronous mailer -- it's not THAT hard. windows has the same kind of async networking support (either through the winsock API and/or IO completion ports, or what have you) and i'm sure the spam/botnet software authors have no qualms about holding open a couple thousand sockets on the rooted windows machine (times a few hundred thousand machines.) furthermore, i bet there are some shitty legitimate MTAs that would just give up too, causing actual mail to get discarded :)

(that, and they shoulda used twisted [twistedmatrix.com] or something :) -- using a pool of apache/mod_perl instances to handle connections is grossly inefficient.)

ok, ok, maybe this sounds overly critical. it's a clever, thinking-out-of-the-box idea, but certainly not the panacea we're looking for to stop spam.

-fren

Second post! (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#16417801)

Where have all the trolls gone? :(

Re:Second post! (-1, Offtopic)

Anonymous Coward | more than 7 years ago | (#16417889)

we are bored.

Even easier.... (1)

woolio (927141) | more than 7 years ago | (#16418297)

Forget async io (completion stuff) in Windows...

They can just make the SPAM program multithreaded and start a new thread for each new connection (each using *synchronous* IO).

There's no interprocess communication involved, so it should be trivial.

Re:Even easier.... (0)

Anonymous Coward | more than 7 years ago | (#16419541)

and if you RTFA, they specifically point out that this goes to hell if you're dealing with more than a trivial number of connections, especially since their approach is based on forcing a delay for the connection...

Context switching between a few hundred processes, or a few thousand (for a thousand connections), starts to break down pretty quickly...

Re:Not ultimately a solution (3, Insightful)

A beautiful mind (821714) | more than 7 years ago | (#16419149)

It's an arms race. Graylisting, higher MX spam traps, etc.

They all rely on the "we only have to be better than the neighbour's mailserver" principle. These things work until everyone starts doing them, and then new methods get invented to combat spam. Not that surprising, but dismissing this approach outright is basically silly. There is NO good way to eliminate spam, because stupid people exist. So people hack around the problem.

Re:Not ultimately a solution (1)

Halo1 (136547) | more than 7 years ago | (#16420579)

They all rely on the "we only have to be better than the neighbour's mailserver" principle.
Greylisting in combination with blacklisting also has another advantage: by the time the message is no longer greylisted, there is a higher chance that the spamming server is already blacklisted.
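The greylist/blacklist interplay described above rests on a simple decision rule. A minimal sketch in Python (assuming an in-memory store and a 5-minute retry delay; real implementations key on the client-IP/sender/recipient triplet, persist state, and expire old entries):

```python
import time

# Minimal greylisting sketch: temporarily reject the first delivery attempt
# for an unknown (client_ip, sender, recipient) triplet, and accept retries
# that arrive after a minimum delay. Fire-and-forget spamware never retries;
# a compliant MTA does -- and by then the sender may also be blacklisted.
class Greylist:
    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds the sender must wait
        self.first_seen = {}         # triplet -> timestamp of first attempt

    def check(self, client_ip, sender, recipient, now=None):
        now = time.time() if now is None else now
        triplet = (client_ip, sender, recipient)
        if triplet not in self.first_seen:
            self.first_seen[triplet] = now
            return "451 4.7.1 Greylisted, please retry later"
        if now - self.first_seen[triplet] >= self.min_delay:
            return "250 OK"
        return "451 4.7.1 Greylisted, please retry later"
```

The first attempt gets a 4xx temporary failure; a retry after the delay is accepted.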

Yes, it does in fact work (5, Informative)

ttul (193303) | more than 7 years ago | (#16419857)

[full disclosure and shillery alert: I work with Stas at MailChannels]

You make some very good points -- and these are all concerns we had when we set out to build this software.
Fortunately for the world, these concerns have turned out to be unwarranted. Furthermore, our experience in actually deploying this technology has been far more breathtaking than we had imagined -- both in terms of spam mitigation and improvements in scalability.

> the core assumption, and the only thing that makes this work, is that botnet spam software will _always_ just
> give up after 30 seconds;

I have a theory that spammers will always be impatient. I believe this theory for several reasons:

1. Spam campaigns are now recognized by anti-spam companies in minutes or hours. New campaigns therefore have a very short life expectancy and have to be completed as fast as possible. If mail can't get delivered fast, it's time to move on to a new domain to get it moving again. With collaborative filters like Cloudmark recognizing campaigns in less than 60 seconds, spammers obviously have to move traffic fast.

2. Botnets are not unlimited in their size or bandwidth capacity. Typical botnets these days are between 1,000 and 10,000 hosts. Any larger and the command-and-control channels are very quickly noticed and shut down by service providers. Botnets cost money too -- $250/hour for a 10K botnet is typical.

3. Spammers' raison d'être is to send lots of mail and hope that a small percentage of recipients buy something. The only way to make the business profitable is to send huge amounts of mail. If all zombie traffic in the world were magically slowed down, spamming would no longer be profitable and spammers would tend to focus more on things like highly targeted phishing instead. Not surprisingly, we're already starting to see this.

4. Because #3 isn't going to happen any time soon, and in light of the technical constraints (1 and 2), spammers have no choice but to abort their connections within a very short time frame. It's just the nature of the economic beast. Hanging on is just for posterity. It doesn't make economic sense.

5. It works. And it's very very scalable. By slowing down traffic and multiplexing what remains, mail server load drops by 90%. In big installations, that means no more being paged in the middle of the night because your cluster of 4-way Xeons with 8GB of RAM is borked by a distributed spam burst.

Oh -- and of course you can't just slow everything down. It's important to be very selective so as not to delay everything.
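That selectivity amounts to mapping a sender-reputation score to a per-connection delay. A hypothetical sketch (the score source, thresholds, and delay values are invented for illustration; the comment doesn't specify them):

```python
# Hypothetical sketch of "be very selective": pick a per-write delay from a
# sender-reputation score (DNSBL hits, fingerprint, history -- all assumed)
# rather than slowing everyone down.
def choose_delay(reputation_score):
    """Map a 0.0 (known bad) .. 1.0 (known good) score to a per-write delay in seconds."""
    if reputation_score >= 0.8:
        return 0.0    # trusted senders are not delayed at all
    if reputation_score >= 0.4:
        return 1.0    # unknown senders get a mild slowdown
    return 30.0       # likely zombies are trickled to near-zero throughput
```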

> if this throttling technique ever became commonplace, spammers would just write their
> own asynchronous mailer -- it's not THAT hard...

Actually, it is that hard. Even Stas got a headache working on this project.

But even if it was easy, it would be pointless for a spammer to launch more than one connection per zombie. If a sender is marked as suspicious, the sender's concurrency is severely limited. One connection per zombie, at 5 bytes per second -- that's just not economic.
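A back-of-envelope check shows why 5 bytes per second kills the economics of a single connection per zombie (the 5 KB message size is an assumed figure, not from the comment):

```python
# At the 5 bytes/second mentioned above, one throttled connection per zombie
# moves almost no mail. Assumed message size: a small 5 KB spam.
def seconds_to_send(message_bytes, bytes_per_second=5):
    return message_bytes / bytes_per_second

minutes_per_message = seconds_to_send(5 * 1024) / 60  # roughly 17 minutes per message
```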

> furthermore, i bet there are some shitty legitimate MTAs that would just give up too, causing actual
> mail to get discarded :)

Let's just say the gap between the patience of spammers and the patience of legitimate MTAs is very large indeed. And by carefully fingerprinting and assessing sender reputation, this problem can be minimized to the point where it is a far smaller problem than content filter false positives.

I also want to point out that this technology does not make email suck by slowing it down. It in fact speeds up delivery of legitimate mail in most cases because the load is so reduced on the rest of the infrastructure.

Just talk to our customers. One of them was running four 4-way Xeon boxes with 8GB of RAM each -- all this to service the spam filtering needs of just 10,000 end users. He told us he hadn't slept a full night in months because of load-based outages. Since installing the software Stas built, the only alert he's received is a notification that the load level dropped below the panic threshold!

Re:Yes, it does in fact work (3, Informative)

caseih (160668) | more than 7 years ago | (#16425175)

Unfortunately I've seen a marked decrease in the effectiveness of grey-listing lately, which is similar in intent to your ideas. What I'm finding is that a lot of spam is now coming from RFC-compliant mail servers. Stock spams in particular always come through after faithfully waiting out the greylist timeout. So some spammers are evidently able to wait, even up to 45 minutes, to send their spam to me. So, despite your arguments, spammers will find a way to spam economically while tolerating delays, holding connections open, etc.

Re:Yes, it does in fact work (1)

SEAL (88488) | more than 7 years ago | (#16429685)

I've seen a marked decrease in the effectiveness of grey-listing lately

Agreed. My ISP *finally* added greylisting this year. This is the account I use on my domain registrations, so the email address shows up in whois. It therefore gets an insane amount of spam. After testing out the greylisting for a couple of weeks, I saw no perceptible difference in the amount of spam I was receiving.

When you greylist, you're basically using SMTP rules to tell the sender "try again later". As this became more common, spammers simply started handling it with a proper response.

So the idea in this article - slowing down their connection to a trickle - is actually easier for them to handle because they don't need to implement a full MTA. The only advantage to this solution is if enough people were throttling connections, a spammer's bots might very well crash under the load of their own outgoing connection attempts.

Re:Yes, it does in fact work (1)

Paradise Pete (33184) | more than 7 years ago | (#16529359)

Stock spams in particular always come through after faithfully waiting out the greylist timeout.

That penny stock spam is the most successful I've seen. More than half of it gets past Gmail's filters and into my inbox, and then more than half of that gets past my own filters. It's just about the only spam that makes it through, but I get several of those a week. (I also checked the stocks, and not a single one has risen significantly, despite the spam's assurances ;-)

Re:Yes, it does in fact work (1)

hawg2k (628081) | more than 7 years ago | (#16583196)

I was just about to post the exact same thing. Mod the parent up.

I don't even use greylisting anymore because it gets in the way of troubleshooting mail problems, and has a negligible effect on spam these days.

Re:Not ultimately a solution (0)

Anonymous Coward | more than 7 years ago | (#16421199)

> so they wrote an asynchronous proxy that slows down connections. cool trick

Not really, I suspect that most of the magic is courtesy of libevent.

/ducks

Re:Not ultimately a solution (1)

TheRaven64 (641858) | more than 7 years ago | (#16424791)

So how does this compare to OpenBSD's spamd, which does tar-pitting (and things like setting the TCP window size to 1 so you can really slow things down), but is designed for very low resource usage? This presentation [openbsd.org] by the spamd guys last year should, I think, address some of your questions about the long-term effectiveness of greylisting. In summary; spammers adapt, but so does spamd.
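spamd's stuttering can be sketched in a few lines (Python for illustration; the reply text is invented, and the TCP window-size trick mentioned above needs kernel support and isn't modelled):

```python
import time

# Sketch of spamd-style stuttering: emit an SMTP reply one character per
# interval, pinning a blocking sender for the whole duration while costing
# the tarpit almost nothing.
def stutter(line, write, interval=1.0, sleep=time.sleep):
    for ch in line:
        write(ch)        # one character out...
        sleep(interval)  # ...then wait before the next

sent = []
stutter("451 Temporary failure\r\n", sent.append, interval=0.0)
reply = "".join(sent)  # the full reply, delivered one byte at a time
```

With the default one-second interval, even this short reply holds a synchronous sender for over twenty seconds.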

Re:Not ultimately a solution (1)

ttul (193303) | more than 7 years ago | (#16460389)

[shillery notice: I am CEO at MailChannels [mailchannels.com] ]

spamd gave us our initial inspiration. I talked with Bob Beck at the Cansecwest security conference [cansecwest.com] after he presented on spamd and was -- to put it mildly -- blown away.

It's important to understand that spamd does not actually deliver mail. It just responds r e a l l y s l o w l y and then returns a 400-series code to force the sender to try again. After the first time, a packet filter rule is added that redirects that sender to a real MTA, which receives the message.

So in essence spamd is (primarily) used as a grey-listing system.

Traffic Control [mailchannels.com] actually delivers the mail in addition to efficiently slowing down connections from _certain_ senders.
In that way it's a lot more sophisticated and less prone to deliverability problems. Deliverability is a major concern for corporate customers -- even though spam is also a big deal.

oblig. checklist :) (5, Funny)

frenetic3 (166950) | more than 7 years ago | (#16417779)

Your post advocates a

(X) technical ( ) legislative ( ) market-based ( ) vigilante

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
(X) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
(X) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
( ) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
(X) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
(X) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
( ) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
( ) Bandwidth costs that are unaffected by client filtering
( ) Outlook

and the following philosophical objections may also apply:

( ) Ideas similar to yours are easy to come up with, yet none have ever
been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
(X) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
(X) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, asshole! I'm going to find out where you live and burn your
house down!

Re:oblig. checklist :) (0)

GeorgeS069 (956679) | more than 7 years ago | (#16418187)

ROFLMAO. That was awesome. Best laugh I've had reading /. in a long time.

Re:oblig. checklist :) (0)

Anonymous Coward | more than 7 years ago | (#16418445)

Dude, they post that checklist for every spam-solution related article on Slashdot. That's why it says "oblig" in the topic. It means "obligatory."

Re:oblig. checklist :) (0)

FreeIX (1011833) | more than 7 years ago | (#16418913)

Indeed, so the GP's "a long time" is a relatively short period of time. But I have seen this checklist many times, and it's still extremely funny.

Please provide details (1)

oneiros27 (46144) | more than 7 years ago | (#16422227)

I've seen the checklist used many, many times, and it's typically funny. But I'm not sure you've selected the correct values in this instance. Please provide details of why you selected the following:
(X) Mailing lists and other legitimate email uses would be affected
(X) Requires immediate total cooperation from everybody at once

Specifically, your plan fails to account for

(X) Huge existing software investment in SMTP
(X) Armies of worm riddled broadband-connected Windows boxes

and the following philosophical objections may also apply:

(X) Countermeasures should not involve sabotage of public networks

Specifically, the article did take into account botnets, and they're just forcing good SMTP compliance. It shouldn't affect well-designed mailing list software, nor would it sabotage public networks.

So yes, there is the whole issue of the arms race -- people will just correct their botnets to handle this quirk. But your other categorizations are grossly unfair.

Re:oblig. checklist :) (1)

gurps_npc (621217) | more than 7 years ago | (#16424301)

Your answers were bullcrap. Here are my counters.

(X) Mailing lists and other legitimate email uses would be affected

And your point is? If I have to give up mailing lists, or (far more likely) force mailing lists to change so that they are NOT so similar to spam that they get caught by anti-spam measures, that is not a real issue. We do NOT owe mailing lists the right to exist; if they can't change to deal with the reality of a spam-free world, tough luck. The effect on other legitimate email uses would be minimal, if any.

(X) It will stop spam for two weeks and then we'll be stuck with it

No. The technique listed here increases the actual COSTS to the spammers. It forces them to use more computers to get the same throughput. Every time we double their costs, we get a permanent reduction in the amount of spam. It may not by itself kill spam, but it will have a negative effect.

(X) Requires immediate total cooperation from everybody at once

Nope. This is just false. In fact, people that do it by themselves have a GREATER effect than if everyone does it.

(X) Huge existing software investment in SMTP

This is a relatively simple add-on. We keep the existing software, we just upgrade it.

(X) Armies of worm riddled broadband-connected Windows boxes

Not relevant.

(X) Eternal arms race involved in all filtering approaches

Again, so what? Here you simply say "they will try to counter us, so we should not even try to counter them." Very foolish argument.

(X) Countermeasures should not involve sabotage of public networks

A slowdown of 30 seconds per legitimate email is not sabotage. I don't expect my email to be read within one hour.

(X) Sorry dude, but I don't think it would work.

Sorry dude, but your reasons suck.

TLDR (-1, Redundant)

Anonymous Coward | more than 7 years ago | (#16417849)

TLDR. Too long didn't read.

As far as I can tell... (1)

KillerCow (213458) | more than 7 years ago | (#16418137)

Asynchronous Programming = programming with futures

must we rename everything every time that someone "discovers" it?

AJaX (2, Informative)

tepples (727027) | more than 7 years ago | (#16418255)

Asynchronous Programming = programming with futures

Except "asynchronous programming" is already a well-known term among many web developers:

Asynchronous Programming with
JavaScript, HTML DOM,
and
XMLHttpRequest

Re:As far as I can tell... (1)

RAMMS+EIN (578166) | more than 7 years ago | (#16420707)

``must we rename everything every time that someone "discovers" it?''

Yes, because, that way, you get publicity. If you just quietly sat and implemented it, it would be every bit as great, but nobody would hear about it.

As if PERL wasn't hard enough to read... (3, Funny)

0kComputer (872064) | more than 7 years ago | (#16418199)

This guy goes and makes it multithreaded... Great just what we need.

Re:As if PERL wasn't hard enough to read... (1)

VGPowerlord (621254) | more than 7 years ago | (#16419601)

It's easy to do threads [cpan.org] in perl 5.8.

Re:As if PERL wasn't hard enough to read... (2, Informative)

ttul (193303) | more than 7 years ago | (#16419961)

[full disclosure: I work with Stas at MailChannels]

We looked at using the new Perl threads, but Perl 5.8 threads suffer from a few severe limitations.

1. When you create a new thread, a complete copy of the interpreter is made. The new thread uses this new interpreter instance and cannot communicate with the original thread except via the threads::shared module or some traditional IPC mechanism. In short, they're no better than forking a new process, and in many ways they are far worse.

2. Perl threads are still quite unstable.

Yes, we could have used Python. Or Ruby. Both these languages have better threading support by leaps and bounds. Additionally, they have great asynchronous libraries like Twisted. Why'd we use Perl? Well, I suppose it's in our blood. Between Stas and the rest of the dev team, we have a good cross-section of Perl talent.

Re:As if PERL wasn't hard enough to read... (2, Interesting)

Ed Avis (5917) | more than 7 years ago | (#16421075)

Did you consider some event-driven thing using POE [perl.org] ?

Re:As if PERL wasn't hard enough to read... (1)

ttul (193303) | more than 7 years ago | (#16460457)

Yes, we looked at using POE. We concluded that POE is just far more than we needed for this application.
It would have been too difficult to make POE rock performance-wise in addition to ensuring that POE used an efficient event library like libevent.
And in this kind of application, you need awesome performance. We profiled the app with strace for weeks to get rid of unnecessary system calls.

Re:As if PERL wasn't hard enough to read... (3, Informative)

kimanaw (795600) | more than 7 years ago | (#16423043)

Yes, we could have used Python. Or Ruby. Both these languages have better threading support by leaps and bounds.

Er, how? Because they don't really use threads? Sure, they're fast and lightweight...but since they don't use the underlying OS's threads implementation (i.e., kernel-compatible threads), they're only marginally useful on multi-CPU and/or multicore systems.

2. Perl threads are still quite unstable.

What's your basis for that statement? Have you tested the latest versions of the threads [cpan.org] and threads::shared [cpan.org] modules? Some significant effort has been applied in the past year to improve stability, as well as reduce footprint...you might want to give it a look...

Perhaps if your org can get some funding, you might throw some money at the TPF to get iCOW implemented? That should vastly improve thread startup and reduce footprint. threads::shared remains a bit of a challenge, but that issue can be addressed by some carefully crafted XS (which I'm told Stas is pretty good at ;^).

Re:As if PERL wasn't hard enough to read... (1)

stas_bekman (986420) | more than 7 years ago | (#16460859)

stable perl threads? you must be kidding...

it works for basic light things, but if anything complex is used, like mod_perl, it segfaults all over; and if it doesn't, it takes dozens of seconds to start a new thread on a heavily loaded machine (due to the lack of CoW you mentioned, though even then I doubt it'd be much help, since it would still need to copy a lot of data)

And yes, someone needs to work on fixing those and a TPF grant would be very helpful.

Re:As if PERL wasn't hard enough to read... (1)

losec (642631) | more than 7 years ago | (#16421335)

Actually, threads are something Perl has got right, compared to most other languages.
Perl threads are also very easy to understand.
Simply put, nothing is shared between threads.
If you want to share data between Perl threads you must explicitly say so:
my $foo : shared = 1;

Though if you're stuck with Perl version 5.6.0, don't use threads.

Re:As if PERL wasn't hard enough to read... (0)

Anonymous Coward | more than 7 years ago | (#16515321)

Perl threads are also very easy to understand.
Simply put, nothing is shared between threads.
If you want to share data between Perl threads you must explicitly say so:
my $foo : shared = 1;


Threads *by definition* share data. That's the major difference (and often the only difference) between threads and processes. Processes need to specifically allocate shared memory for the stuff they want to share, threads just share everything.
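The point is easy to demonstrate (Python for illustration, since the thread/process distinction is the same there): threads mutate the same object with no setup at all, whereas processes would each get their own copy unless shared memory were arranged explicitly.

```python
import threading

# Threads *by definition* share the address space: every worker below
# appends to the very same list object, with no IPC or shared-memory
# setup. A separate process, by contrast, would operate on its own copy
# unless something like mmap or multiprocessing.Array were used.
results = []

def work():
    for _ in range(1000):
        results.append(1)  # list.append is atomic in CPython, so no lock needed

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# len(results) is now 4000: all four threads mutated the one shared list.
```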

"... the easy way in Perl." (1)

sakusha (441986) | more than 7 years ago | (#16418827)

Isn't that an oxymoron?

Re:"... the easy way in Perl." (1)

FreeIX (1011833) | more than 7 years ago | (#16420069)

Nah, Perl is very easy to do things in...the first time. Unfortunately what is not so easy is understanding what you did six months ago.

Re:"... the easy way in Perl." (1)

chromatic (9471) | more than 7 years ago | (#16420179)

Consider this an opportunity to learn how to write maintainable code.

Re:"... the easy way in Perl." (1)

FreeIX (1011833) | more than 7 years ago | (#16420673)

Assuming for the sake of argument that I don't know how, perhaps I'd rather use a language that doesn't by its very motto make it difficult to learn how.

Re:"... the easy way in Perl." (1)

chromatic (9471) | more than 7 years ago | (#16426735)

There's more than one way to do it, in Perl, so choose the most maintainable. Problem solved.

Before you counter "But I have to maintain code written by monkeys, and it's hard to read," consider not hiring monkeys to write code you care about. Not even Haskell or Java or Ruby prevents monkeys from writing bad code. The problem is, they're monkeys, not that they're using the wrong language.

Re:"... the easy way in Perl." (1)

hondo77 (324058) | more than 7 years ago | (#16424751)

Unfortunately what is not so easy is understanding what you did six months ago.

If a programmer cannot go back into code he wrote six months ago and figure out what is going on, the blame rests with the programmer. The language is irrelevant.

Re:"... the easy way in Perl." (1, Insightful)

Anonymous Coward | more than 7 years ago | (#16426515)

He didn't say it couldn't be done, he said it was not easy with Perl. The language is entirely relevant to this assertion.

Clever, but... (2, Interesting)

deepb (981634) | more than 7 years ago | (#16419249)

The article is correct - mail servers do not mind waiting a few minutes/hours/days to deliver their mail. Unfortunately, end-users do mind. The inherent delays for just about every message would be particularly painful for business email users, but even residential ISP customers are constantly opening tickets when they observe a delay (I work closely with several large ISPs, which is how I know).

Delays aside, I just can't buy into network-layer rate limiting when it comes to email. The metric for anti-spam success is measured in "messages" (or more accurately, "recipients"). Nobody ever calls their local email admin to say, "hey, I've received 1.3 megabytes of spam this week, what gives?"; instead, the problem is always quantified by the number of individual messages the end user had to look at and consider before deciding what to do.

Because of this, rate-limiting should be done per-recipient. That way, there's no question what a particular sender is going to get through. Once they pass the limit you've specified for their class of IP (known mail server, dynamic IP, etc) during whatever timeframe, they receive an SMTP 4xx error until that timeframe is up. That still slows them down, but you can't get around it with smaller messages, etc.
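The per-recipient, SMTP-layer scheme proposed above can be sketched as a sliding-window counter (Python for illustration; the limit, window, and reply texts are invented, and a real MTA would key on the sender's IP class as described):

```python
import time

# Sketch of per-recipient rate limiting: count accepted RCPT commands per
# sender within a time window, and answer 4xx once the sender's class limit
# is exceeded -- smaller messages buy the spammer nothing.
class RecipientLimiter:
    def __init__(self, limit, window=3600):
        self.limit = limit    # recipients allowed per window
        self.window = window  # window length in seconds
        self.seen = {}        # sender -> timestamps of accepted RCPTs

    def on_rcpt(self, sender, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the window.
        times = [t for t in self.seen.get(sender, []) if now - t < self.window]
        if len(times) >= self.limit:
            self.seen[sender] = times
            return "452 4.5.3 Too many recipients, try again later"
        times.append(now)
        self.seen[sender] = times
        return "250 OK"
```

Once the window expires, the same sender is accepted again, so legitimate bursts are delayed rather than lost.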

Re:Clever, but... (1)

ttul (193303) | more than 7 years ago | (#16419901)

[full disclosure: I work with Stas at MailChannels]

> The inherent delays for just about every message would be particularly painful for business email users, but
> even residential ISP customers are constantly opening tickets when they observe a delay (I work closely
> with several large ISPs, which is how I know).

That would be a problem if every single message was slowed down, but it's not. The system uses sender reputation and behaviour to ensure that only malicious senders are slowed down. Our customers have found more often that end-users notice better deliverability when this technology is in place -- because the load on the spam filters is so much reduced that queues don't back up causing really bad delays.

One way or another, you have to delay some of the traffic. You either do it up front and selectively -- applying the pain to the bad senders -- or you do it after the messages are queued, which hurts the recipients.

Re:Clever, but... (1)

deepb (981634) | more than 7 years ago | (#16420677)

That would be a problem if every single message was slowed down, but it's not. The system uses sender reputation and behaviour to ensure that only malicious senders are slowed down.
I don't recall any mention of that in the article, but I guess it may have been a bit outside the scope. Either way, I didn't realize that - makes sense.

One way or another, you have to delay some of the traffic. You either do it up front and selectively -- applying the pain to the bad senders -- or you do it after the messages are queued, which hurts the recipients.
I'm certainly not suggesting that any rate limiting take place after a message is accepted -- I'm just stating my preference for SMTP-layer rate limiting, as opposed to network-layer rate limiting. With SMTP-layer rate limiting, the pain is still applied to bad senders in the form of "4xx You hit your limit" SMTP responses, but there's no possibility for size-related workarounds (e.g., completely blank messages, an odd annoyance for quite some time, would still come through pretty quickly). There's also no need to allow for a huge number of concurrent connections (but I suppose there's no downside to doing it anyway).

Don't get me wrong, I'm not saying your approach won't work -- we're talking about fundamentally different ways of achieving the same goal (rate limiting). I simply prefer the SMTP-layer approach, because it seems to me like the place to do that sort of thing.

let me know how it works out (0)

Anonymous Coward | more than 7 years ago | (#16419749)

Got on Vans but they look like sneakers!!!

Perl fetish? (0)

Anonymous Coward | more than 7 years ago | (#16421321)

Reading TFA, I couldn't help but notice that everything looks like a nail to these guys. I know this is the mod_perl author, but did it ever occur to him that Perl and Apache (APR) are the wrong tools for this job? Doing this in (e.g.) Erlang would have negated any threading or concurrency issues, but I digress; doing it in Perl may be cool in a geeky masochistic kind of way that I cannot relate to.

<sarcasm>
Now I must return to writing my device driver, I've never really done any kernel hacking before and I'm a web designer so I'm doing it all in javascript.
</sarcasm>

Freq. distribution of mail transfer agents (1)

SgtChaireBourne (457691) | more than 7 years ago | (#16422355)

Most if not all mail transfer agents no longer operate as open relays by default -- open relays used to be the main contributor to spam. People blamed the complexity of Sendmail for that and other problems, so many distros moved to other mail transfer agents as their default. A few years ago Sendmail still accounted for about 65% of mail servers.

What is Sendmail's market share now, and what is the frequency of others like Exim, qmail, and Postfix?

Re:Freq. distribution of mail transfer agents (1)

ttul (193303) | more than 7 years ago | (#16460597)

I can actually comment on that. We've surveyed 400,000 mail servers at organizations around the world and have found that Sendmail still holds on to 13% of the market.

Re:Freq. distribution of mail transfer agents (1)

foxylad (950520) | more than 7 years ago | (#16497831)

Care to comment on the other mail servers? Sendmail at only 13% is a big surprise. Hopefully this statistic will help persuade our dinosaur sysadmin that we should switch to Postfix.

"High performance" , "perl" , sorry? (1)

Viol8 (599362) | more than 7 years ago | (#16424197)

Perl is good for scripting, but 24/7 high-performance apps? Don't make me laugh. Something this CPU- and I/O-intensive should be written in C/C++, or even assembler at a push, not a scripting language. It seems to me this project was written in Perl for the sake of writing it in Perl, not because it confers any advantage over doing it in a lower-level language.

Re:"High performance" , "perl" , sorry? (2, Insightful)

chromatic (9471) | more than 7 years ago | (#16426795)

Remember kids, if your process is IO-bound, you want the fastest possible code ever to make sleeping on those system calls as efficient as possible!

Re:"High performance" , "perl" , sorry? (1)

BrianRoach (614397) | more than 7 years ago | (#16428727)


You must be really amazing to be able to determine that a given application can't possibly be usable when written in language X and would be much better in language Y without any data or firsthand experience using the application.

Sometimes, things work just fine even though they'd be 20ms faster if written in C/C++.

- Roach

Re:"High performance" , "perl" , sorry? (1)

Viol8 (599362) | more than 7 years ago | (#16450267)

If you're dealing with large data dumps, you want something that can process that data fast.

Re:"High performance" , "perl" , sorry? (0)

Anonymous Coward | more than 7 years ago | (#16431543)

> or even assembler

You have never ever written real-world scalable code in your life, have you?

Re:"High performance" , "perl" , sorry? (1)

Viol8 (599362) | more than 7 years ago | (#16450285)

Yeah, you're right. I've only been doing IB trading-system links to major stock exchanges such as the LSE, NYSE, and Euronext for the last 3 years; what would I know.

Go and play with your little Perl toy, pal, and leave the real coding to those of us who have a clue.

Re:"High performance" , "perl" , sorry? (2, Insightful)

angel'o'sphere (80593) | more than 7 years ago | (#16439935)

You seem never to have used Perl? Perl is, in most regards, within 85% of the speed of C/C++, and for the stuff Perl is optimized for it reaches nearly 95% of C++ with a FAR shorter development cycle.

Your comment is among the silliest I have ever seen. What will a mail filter/forwarder do 90% of its time? NOTHING -- it sits blocked, listening on a socket. It really does not matter whether the listening process is written in assembler (granted, that's very portable from SPARC to i386 to PowerPC) and just waits "faster", or is written in Perl ....

angel'o'sphere
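The point above -- that an IO-bound daemon spends most of its time asleep in the kernel, not executing in its host language -- can be demonstrated with a small sketch. (Python here purely for illustration; the port and timeout are arbitrary choices of mine, not anything from the article.)

```python
import select
import socket

# Create a non-blocking listening socket on an ephemeral port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(5)
listener.setblocking(False)

# select() sleeps *in the kernel* until the socket is readable or the
# 100 ms timeout expires. The waiting consumes no interpreter time at
# all, which is why the raw speed of the language hardly matters for
# an IO-bound proxy.
readable, _, _ = select.select([listener], [], [], 0.1)

# No one connected, so select() returns an empty readable list.
print(len(readable))  # -> 0
listener.close()
```

Event frameworks like libevent (and the Event::Lib wrapper the article uses) generalize exactly this pattern: register interest in many sockets, sleep in the kernel, and dispatch callbacks only when there is actual work to do.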

Re:"High performance" , "perl" , sorry? (0)

Anonymous Coward | more than 7 years ago | (#16451149)

C and C++ are portable and MUCH faster than Perl:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gcc&lang2=perl [debian.org]
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=gpp&lang2=perl [debian.org]

And if we're going to go high-level, there's no reason to use Perl either, because there are much more readable and maintainable languages in the same speed class:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=python&lang2=perl [debian.org]

Case in point: Google uses C for its fast stuff and Python for the non-performance-related stuff. I'm sorry, but you can't make a case for Perl to anyone but Perl programmers these days, whereas languages like Ruby and Python have strong cases to be made for them.

Re:"High performance" , "perl" , sorry? (1)

angel'o'sphere (80593) | more than 7 years ago | (#16552172)

I said: for the stuff Perl is optimized for, it is nearly at 95% of C++ with a FAR shorter development cycle. That is file I/O, process management, and text processing.

It makes no sense to pull out a set of benchmarks where some "nerds" wrote Mandelbrot programs and n-body gravity simulations to prove that C is faster. Of course a portable assembler language, using native data structures (arrays!), is faster than Perl; no one doubts that. But my parent was of the opinion that Perl is so slow that it is suicide to use it in the given context. Also, I challenged the portability of assembler, not C ...

If I had to write such a program I would check which language has libraries that solve most of the issues I would face, and I certainly would not use a non-OO language. Java, e.g., has a very good library for working with email ...

angel'o'sphere

 

Greylisting is the answer (1)

o517375 (314601) | more than 7 years ago | (#16458895)

We implemented greylisting. It is the answer. I watch as tens of thousands of emails per day are bounced away into oblivion. At first, ham had to wait a while, but now that the database is built, no one waits anymore. Not only that, server CPU usage is negligible, because SpamAssassin doesn't run on resent mail that has already been marked as ham. Combine this with a few scripts that do some basic purging of spam addresses from the database, and we're good to go. Let's not reinvent the wheel. Why don't we just build greylisting right into the SMTP protocol? And while we're at it, let's build encryption in too -- feeling challenged?
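The greylisting scheme described above -- temp-fail the first delivery attempt for an unseen (sender-IP, from, to) triplet, then accept once a real MTA retries after a delay -- can be sketched as follows. (Python for illustration only; the class name, delay, and reply strings are invented, not from any particular greylisting implementation.)

```python
import time


class Greylist:
    """Minimal greylisting: 4xx the first attempt for each unseen
    (ip, mail_from, rcpt_to) triplet; accept retries after `delay`
    seconds. Fire-and-forget spamware typically never retries."""

    def __init__(self, delay_seconds=300):
        self.delay = delay_seconds
        self.seen = {}  # triplet -> timestamp of first attempt

    def check(self, ip, mail_from, rcpt_to, now=None):
        now = time.time() if now is None else now
        triplet = (ip, mail_from, rcpt_to)
        first = self.seen.get(triplet)
        if first is None:
            # First time we've seen this triplet: record it and defer.
            self.seen[triplet] = now
            return "451 4.7.1 Greylisted, please retry later"
        if now - first < self.delay:
            # Retried too soon; keep deferring.
            return "451 4.7.1 Greylisted, please retry later"
        # A well-behaved MTA retried after the delay: let it through.
        return "250 OK"
```

This matches the poster's observation that ham only waits on the first contact: once the triplet database is warm, known correspondents pass straight through.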