AWS Load Balancer Sends 2 Million Netflix API Reqs To Wrong Customer

Soulskill posted more than 2 years ago | from the close-does-not-in-fact-count dept.

Cloud 58

rsk writes "Amazon Web Services' Elastic Load Balancer is a dynamic load balancer managed by Amazon. Load balancers are regularly swapped around with each other, which can lead to surprising results, like getting millions of requests meant for a different AWS customer. Using ELBs can result in AWS unintentionally introducing a man-in-the-middle (attack) into your application environment. Most AWS users do not realize this can happen and have not secured against it."

TTL value (2)

SharkLaser (2495316) | more than 2 years ago | (#37880310)

It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault. You should stick to standards, and if the TTL says it's 60 seconds, then it is.

Re:TTL value (2)

Florian Weimer (88405) | more than 2 years ago | (#37880382)

Browsers are sometimes forced to disregard TTL values to prevent certain types of attacks which involve quickly changing DNS records.

Re:TTL value (1)

mysidia (191772) | more than 2 years ago | (#37881114)

Browsers are sometimes forced to disregard TTL values to prevent certain types of attacks which involve quickly changing DNS records.

No, they are not "forced" to do so. They have chosen an improper method to "work around" a security issue, one that violates other internet standards and causes issues, because they are not implementing DNS resolution in a valid way.

The TTL in DNS is not an "advisory" value; it is the time after which the old RR from the previous authoritative DNS response must be expunged. A TTL of 0 prohibits caching altogether.

There are other methods of preventing attacks that involve quickly changing DNS records, like, oh, fixing their trust policy, or throwing up an error requiring the user to reload their page.
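
To make the TTL semantics described above concrete, here is a minimal Python sketch of a cache that expunges a record once its TTL has elapsed and refuses to store a record whose TTL is 0. The class name `TtlCache`, its methods, and the injectable clock are all assumptions made for illustration; this is not how any particular browser or resolver is implemented.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny DNS-style cache that drops a record once its TTL has elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable (an assumption for testability).
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: int) -> None:
        # A TTL of 0 means "use for this transaction only, do not cache".
        if ttl <= 0:
            self._entries.pop(name, None)
            return
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: expunge the record rather than serve a stale answer.
            del self._entries[name]
            return None
        return address
```

A caller that gets `None` back is expected to ask the authoritative source again, which is the behaviour a 60 second TTL relies on.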

Re:TTL value (1)

DarkOx (621550) | more than 2 years ago | (#37885524)

The browser makers playing fast and loose with standards outside of HTML sucks! They all suck; try and find a browser that does PASV FTP *correctly*. Either as part of a very misguided security attempt, or on the assumption that FTP servers are behind NAT and can't be configured to send a correct address in the PASV response, they all ignore the address value returned and stupidly use the control socket's remote address instead.

That breaks all but the very most common use case and all the browsers do it. I have seen other examples of DNS FAIL out there as well.

Re:TTL value (1)

Ben Hutchings (4651) | more than 2 years ago | (#37889884)

If browsers don't impose such a minimum, devices with embedded web servers (think printers and home routers) become vulnerable to Cross-Site Request Forgery. They can potentially defend against this by checking the Host header on requests, but since these devices are only manageable through the web there's no good way to establish what the correct value is.
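
The Host header check mentioned above can be sketched in a few lines of Python. This assumes the server does know its own hostnames, which, as the comment notes, an embedded device often cannot, but which a customer sitting behind a load balancer usually can. The function name `host_is_expected` and the example hostnames are hypothetical.

```python
from typing import Iterable


def host_is_expected(host_header: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True only if the Host header names one of our own hostnames."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # Bracketed IPv6 literal such as "[::1]:8080": keep what is inside.
        host = host[1:].split("]", 1)[0]
    elif ":" in host:
        # Drop an optional ":port" suffix.
        host = host.rsplit(":", 1)[0]
    # Ignore a trailing dot so "example.com." matches "example.com".
    host = host.rstrip(".")
    allowed = {h.lower().rstrip(".") for h in allowed_hosts}
    return host in allowed
```

A server would call this before doing any work on a request and reject anything that fails, so a request carrying `Host: api.netflix.com` never reaches application code on a site that only serves `www.example.com`.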

Security is NOT an issue with The Cloud. (2, Funny)

Anonymous Coward | more than 2 years ago | (#37882212)

Wait a minute. I'm a manager, and I've been reading a lot of case studies and watching a lot of webcasts about The Cloud. Based on all of this glorious marketing literature, I, as a manager, have absolutely no reason to doubt the safety of any data put in The Cloud.

The case studies all use words like "secure", "MD5", "RSS feeds" and "encryption" to describe the security of The Cloud. I don't know about you, but that sounds damn secure to me! Some Clouds even use SSL and HTTP. That's rock solid in my book.

And don't forget that you have to use Web Services to access The Cloud. Nothing is more secure than SOA and Web Services, with the exception of perhaps SaaS. But I think that Cloud Services 2.0 will combine the tiers into an MVC-compliant stack that uses SaaS to increase the security and partitioning of the data.

My main concern isn't with the security of The Cloud, but rather with getting my Indian team to learn all about it so we can deploy some first-generation The Cloud applications and Web Services to provide the ultimate platform upon which we can layer our business intelligence and reporting, because there are still a few verticals that we need to leverage before we can move to The Cloud 2.0.

Re:Security is NOT an issue with The Cloud. (1)

Prosthetic_Lips (971097) | more than 2 years ago | (#37882242)

... and here I am without any mod points.

Pretend that I marked you Two Thumbs Way Up!, Mr. PHB.

PS: For those of you without an irony chip installed ... pretend I started my post with </irony>

Re:Security is NOT an issue with The Cloud. (1)

AlienIntelligence (1184493) | more than 2 years ago | (#37882934)

... and here I am without any mod points.

Pretend that I marked you Two Thumbs Way Up!, Mr. PHB.

PS: For those of you without an irony chip installed ... pretend I started my post with </irony>

Pretend you started your post with Irony, off?

-AI

Re:Security is NOT an issue with The Cloud. (1)

Prosthetic_Lips (971097) | more than 2 years ago | (#37883070)

To end the irony of the previous post ...

Re:Security is NOT an issue with The Cloud. (1)

Galestar (1473827) | more than 2 years ago | (#37883462)

me thinks you need to go back to xml school

Re:Security is NOT an issue with The Cloud. (1)

ATMAvatar (648864) | more than 2 years ago | (#37884324)

A single tear rolled down my cheek as I compared this satire against real, starry-eyed reactions of my company's management with "the cloud".

You know, this mythical beast that solves all scalability and maintenance issues while simultaneously having absolutely zero downsides...

Re:Security is NOT an issue with The Cloud. (0)

Anonymous Coward | more than 2 years ago | (#37884408)

This was funny the first time.... but I have seen this posted before.

Re:Security is NOT an issue with The Cloud. (1)

marcello_dl (667940) | more than 2 years ago | (#37887900)

cool story, bro [google.com] - but maybe it was submitted once and some faulty load balancer spread it out.

Re:TTL value (0)

Anonymous Coward | more than 2 years ago | (#37882964)

Excuse me, but what browser implements its own DNS resolver?

Re:TTL value (1)

Narcocide (102829) | more than 2 years ago | (#37883482)

They don't need their own resolver to cause problems. Many popular programs cache DNS responses far longer than is appropriate. Firefox, for one, caches DNS records internally (some versions on some platforms even for HOURS beyond the TTL unless you restart it), and so does Mac OS X itself.

Re:TTL value (0)

Anonymous Coward | more than 2 years ago | (#37884888)

"Browsers are sometimes forced to disregard TTL values"

Browsers don't disregard TTLs, they never see them. gethostbyname only returns an IP. The browsers then implement their own caches (because most system APIs are dumb, and don't maintain a cache) using a default TTL. Squid and various other software do the same thing. It's a problem that should be well known to anyone dealing with DNS-based load balancing, but the vendors tend to stay mum on it. Otherwise they'd greatly reduce the deployment scenarios for DNS load balancing.
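
The effect described above can be illustrated with a short Python sketch: a lookup function that returns only an address (Python's `socket.gethostbyname` has that shape) gives the caller no TTL, so an application-level cache has to pick its own lifetime. The class `FixedTtlResolver`, the one-hour default, and the injectable lookup and clock are assumptions for illustration, not the behaviour of any specific browser.

```python
import time
from typing import Callable, Dict, Tuple


class FixedTtlResolver:
    """Caches lookups for a fixed period because the lookup returns no TTL."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        default_ttl: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # lookup could be socket.gethostbyname, which returns only an address.
        self._lookup = lookup
        self._default_ttl = default_ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            # Served from the local cache, whatever the record's real TTL was.
            return cached[0]
        address = self._lookup(name)
        self._cache[name] = (address, now + self._default_ttl)
        return address
```

With a one-hour default, a record published with a 60 second TTL keeps being used for up to an hour after the address has been handed to someone else, which is the window in which misdirected requests arrive.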

Re:TTL value (1)

Florian Weimer (88405) | more than 2 years ago | (#37885072)

Browsers don't disregard TTLs, they never see them.

Good point. There are APIs that provide TTL information (such as res_query), but Firefox does not seem to use them. Interesting.

Re:TTL value (0)

Anonymous Coward | more than 2 years ago | (#37880428)

Amazon knows better than to trust Microsoft to honor the TTL. It's been broken for more than a decade and a half. Even Netware gets the minimum wrong even though it doesn't default to nearly as bad a value as Microsoft. Also, some ISPs like AOL will not honor TTLs below a certain value. 1800 seconds is the absolute minimum of a TTL you should use if you want to only have Microsoft and AOL screw you over a little.

Re:TTL value (2)

girlintraining (1395911) | more than 2 years ago | (#37880644)

It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault.

"Technically", no. But two people pointing a finger at each other and saying "He did it!" doesn't solve anything, and all the customer gets is the finger.

Re:TTL value (1)

arglebargle_xiv (2212710) | more than 2 years ago | (#37880696)

It looks more like some clients aren't respecting the DNS TTL value, so technically it's not Amazon's fault.

"Technically", no. But two people pointing a finger at each other and saying "He did it!" doesn't solve anything, and all the customer gets is the finger.

Thus Elastic Load Balancer's other name, Erratic Load Balancer.

Re:TTL value (2)

hedwards (940851) | more than 2 years ago | (#37880796)

If the customer's getting the finger, wouldn't that make it more of an Erotic Load Balancer?

Re:TTL value (1)

mysidia (191772) | more than 2 years ago | (#37881134)

Pointing the finger and screaming very loudly would be very good, especially if Amazon does it, as it will help bring attention to broken behavior in DNS and browser software.

I will agree it hurts Amazon, but it helps the community, for large players like Amazon to help bring attention to broken software, so that it can be fixed.

Re:TTL value (4, Interesting)

JWSmythe (446288) | more than 2 years ago | (#37880668)

    From what I've seen, it's frequently the client's DNS servers, not the client itself.

    I've used a short TTL (5m) for quite a while. It's intentional, because I've needed to switch things rather quickly in the past, and it's better for it to "just work", rather than waiting hours for everyone to pick up the change.

    I used to work for a place that had a huge traffic load. Our slow days were still millions of unique visitors. When we took a machine out of DNS (DNS round robin between 15+ machines), we'd see the traffic drop significantly in the first 5 minutes. When AOL finally saw our change, it would drop more. There would still be lingering people for about an hour, and then it would finally be idle.

    That was a pretty regular thing for us to do. We scaled our traffic to our various datacenters this way. We'd also load test lines and individual servers with it. If it looked like we were running into a bandwidth limitation, I'd throw a few hundred Mb/s down the line, and see how it performed. If it really was, we'd then switch everything away from it to other datacenters until the provider fixed it.

    In all those circumstances, in 5 minutes most (but not all) of the traffic moved. An hour from the change, the remainder had moved.

    I've seen this with my home provider. I let them handle DNS for my home machine, rather than doing it myself. I've made changes, and they don't respect it within 30 minutes. Within about an hour, the new DNS records show properly.

    Google's public DNS servers seem to do pretty well in that respect. Our changes are reflected properly there in just a few minutes. AOL, TimeWarner/RoadRunner, and a few others are pretty bad. I know why they do it (reducing load on their DNS servers), but it becomes a pain in the ass for places that need to make changes quickly.
   

Re:TTL value (2)

Kattare (528707) | more than 2 years ago | (#37880978)

The problem with any of these scenarios is that, according to the AWS forum post, he's been getting rogue Netflix traffic for 4 days. No DNS server or mainstream client is going to keep a 60 second TTL record for 4 days. It's either an issue at AWS completely unrelated to DNS, or an issue in Netflix clients. With it being in TVs, Blu-ray players, Xboxes, iOS, Wiis, etc., who knows what client the issue could be in... I wonder if the forum poster could capture the browser string and help debug?

Re:TTL value (0)

Anonymous Coward | more than 2 years ago | (#37883402)

> No DNS server or mainstream client is going to keep a 60 second TTL

Sounds like you've never dealt with Microsoft garbage. You're lucky. We've found that the Microsoft Windows NT crappy DNS server has cached 5m TTL's for more than a month. Microsoft doesn't give a damn about the Internet or Internet standards. They will cache a 5m TTL for months.

Re:TTL value (0)

Anonymous Coward | more than 2 years ago | (#37887288)

Your servers are configured wrong. They sure do respect TTLs. I run over 100 of them.

Re:TTL value (1)

bsane (148894) | more than 2 years ago | (#37885826)

There are some clients that cache dns records until they're restarted. I've removed internet facing vips from dns and weeks later there are still 100+ clients making connections, the only thing that would stop them is a client restart.

Re:TTL value (0)

Anonymous Coward | more than 2 years ago | (#37889834)

And I have seen ISPs ignoring TTLs and setting their own to 24 hours, leaving sites in the top 10 most visited (where I live) dead because we switched datacenters one evening. A call to them didn't help much, as they were unwilling to flush those records. The next morning it was fixed, though, so I assume they had some calls or their own people noticed something.

Re:TTL value (1)

JWSmythe (446288) | more than 2 years ago | (#37889920)

I think that was the primary motivation for Google setting up their public DNS servers (8.8.8.8, 8.8.4.4).

http://code.google.com/speed/public-dns/ [google.com]

This hasn't been fixed yet because... (0)

Anonymous Coward | more than 2 years ago | (#37880348)

Amazon still charges for the bad requests. They have no incentive to fix it.

Why no proxy? (1)

Florian Weimer (88405) | more than 2 years ago | (#37880370)

Why doesn't Amazon use a reverse proxy which performs additional checks and routes the requests to the right customer? (With Server Name Indication, that would work for TLS, too.) Without that, it's simply not possible to switch IP addresses quickly between non-cooperating targets.

Re:Why no proxy? (1)

SharkLaser (2495316) | more than 2 years ago | (#37880402)

Because Elastic Load Balancer isn't just for HTTP traffic, you can use it with any kind of traffic.

Re:Why no proxy? (1, Insightful)

TooMuchToDo (882796) | more than 2 years ago | (#37880532)

On top of that, their "Elastic Load Balancer" (just another bullshit "cloud" marketing term for their cluster of F5 load balancers at each availability zone) is just, as I mentioned, an array of F5 load balancers. Either a) they don't support the functionality OP is speaking about, or b) more likely, Amazon chooses not to support handling traffic in that way to simplify operations.

Re:Why no proxy? (2)

cript2000 (2496368) | more than 2 years ago | (#37881474)

F5 supports that functionality. EC2 is not built on any commercial LB vendor.

Re:Why no proxy? (0)

Anonymous Coward | more than 2 years ago | (#37883568)

You're pretty good at being wrong

Re:Why no proxy? (0)

Anonymous Coward | more than 2 years ago | (#37880608)

Why doesn't Amazon use a reverse proxy which performs additional checks and routes the requests to the right customer? (With Server Name Indication, that would work for TLS, too.) Without that, it's simply not possible to switch IP addresses quickly between non-cooperating targets.

Better question, why not setup your own reverse proxy cluster on EC2 and have the ELB route traffic to your RPs. Then do all the WAF/URL rewriting/caching/etc you want before sending it to your app servers.

Re:Why no proxy? (1)

Florian Weimer (88405) | more than 2 years ago | (#37880866)

Does this really help if ELB misdirects requests? Or would this setup result in stable ingress IP addresses, so that ELB worked perfectly?

Re:Why no proxy? (1)

user32.ExitWindowsEx (250475) | more than 2 years ago | (#37882204)

Simple. You most likely still pay for misdirected traffic in that case.

Re:Why no proxy? (0)

Anonymous Coward | more than 2 years ago | (#37883430)

Simple. You most likely still pay for misdirected traffic in that case.

Yeah, but there's a lot of crap traffic out there anyhow...nothing can be done about that really. At least you can head it off at the pass before it gets back to your main operations though.

Charge both ways! (1)

abirdman (557790) | more than 2 years ago | (#37880434)

1. Write a load balancer
2. Sell it to customers until it breaks
3. Patent software anomaly
...
Profit!

Re:Charge both ways! (1, Informative)

TooMuchToDo (882796) | more than 2 years ago | (#37880512)

Actually, they didn't write the load balancer. They just bought F5s and integrated them with their infrastructure to change their configurations programmatically.

Re:Charge both ways! (0)

Anonymous Coward | more than 2 years ago | (#37881696)

You are 100% wrong. ELB is not built on a commercial LB platform.

Next you'll be claiming that EC2 is "using Amazon's spare server capacity."

Re:Charge both ways! (1)

TooMuchToDo (882796) | more than 2 years ago | (#37884298)

Which exactly is what they do, using Xen instances. Duh. RedHat built out their environment for them. This is not rocket science, and is all out on the web if you know how to Google, use LinkedIn, etc.

Re:Charge both ways! (1)

Kalriath (849904) | more than 2 years ago | (#37902328)

Just googled it - if Amazon were using F5, F5 don't know about it [f5.com]. And even if the original design was just using spare capacity, that simply is not the case now (after all, that would imply that if Amazon itself needed to ramp up demand it could - and would - simply annex the entire EC2 capacity to cover it. This is, obviously, not the case).

Re:Charge both ways! (1)

TooMuchToDo (882796) | more than 2 years ago | (#37902360)

They could've migrated away from them as part of their platform. My knowledge about it is 18-24 months old.

Re:Charge both ways! (1)

Kalriath (849904) | more than 2 years ago | (#37902244)

If they were F5s, they'd actually work. We use F5 here, and from looking at the config, Amazon would have to be literally incompetent to get such basic functionality wrong.

Easy fix below (1)

TSHTF (953742) | more than 2 years ago | (#37880602)

Use rewrite rules to do a 301 redirect to goatse.cx when the host is api.netflix.com!

Re:Easy fix below (1)

mysidia (191772) | more than 2 years ago | (#37881150)

Use rewrite rules to do a 301 redirect to goatse.cx when the host is api.netflix.com!

Why do that when the person who erroneously receives the traffic could maybe do something profitable with it? Such as co-opt the Netflix API calls and display "video" or "messages" to convince the user to subscribe to a different service, netting $$ for the unintended target who received Netflix's requests.

Re:Easy fix below (0)

Anonymous Coward | more than 2 years ago | (#37881214)

Because the goatse.cx redirect is fast to code up and works for all misdirects.
The netflix api co-opting is going to take more time than it will for all the misdirect clients to eventually pick up the new DNS entries and, obviously, only works with netflix misdirects not all misdirects.

Re:Easy fix below (0)

Anonymous Coward | more than 2 years ago | (#37882784)

Well, it would certainly help a joke that has been continuously funny for over a decade keep going!

Re:Easy fix below (0)

Anonymous Coward | more than 2 years ago | (#37883508)

You'd have to respond with a response the client was expecting and will actually interpret correctly, which is probably wrapped in an XML envelope.

For example:

<video>
      <pic>goatse.cx/some.jpg</pic>
      <url>goatse.cx/</url>
</video>

IPv6... (1)

Junta (36770) | more than 2 years ago | (#37880606)

In this scenario, IPv6 would alleviate the need to reuse IP addresses so aggressively.

Of course, given the high amount of traffic, one wonders if Amazon is needlessly changing addresses. They probably should make more of an effort to keep addresses persistent, even beyond the 'promise' of the TTL. Sort of how in most DHCP servers, even when your lease expires you'll still often get the last address you had, because the DHCP server retained it anyway unless pool exhaustion forces a change.

It seems every day an ugly wart of public 'cloud' hosting crops up. People with remotely interesting workloads should be wary.
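
The DHCP-style "sticky" allocation suggested above can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions: the class `StickyPool`, its methods, and the policy of preferring never-remembered addresses are hypothetical, and nothing here reflects how ELB or any DHCP server actually allocates addresses.

```python
from typing import Dict, List, Optional


class StickyPool:
    """Address pool that gives a returning owner its previous address if free."""

    def __init__(self, addresses: List[str]) -> None:
        self._free: List[str] = list(addresses)
        self._in_use: Dict[str, str] = {}     # owner -> address currently held
        self._last_seen: Dict[str, str] = {}  # owner -> last address ever held

    def acquire(self, owner: str) -> Optional[str]:
        if owner in self._in_use:
            return self._in_use[owner]
        previous = self._last_seen.get(owner)
        if previous in self._free:
            # Returning owner and its old address is still free: reuse it.
            address = previous
        else:
            # Prefer addresses nobody remembers; recycle a remembered one
            # only when the pool is otherwise exhausted.
            remembered = set(self._last_seen.values())
            fresh = [a for a in self._free if a not in remembered]
            candidates = fresh or self._free
            if not candidates:
                return None
            address = candidates[0]
        self._free.remove(address)
        self._in_use[owner] = address
        self._last_seen[owner] = address
        return address

    def release(self, owner: str) -> None:
        address = self._in_use.pop(owner, None)
        if address is not None:
            self._free.append(address)
```

The design choice is that a released address goes to a different owner only when no unremembered address is left, which shrinks the window in which clients holding stale DNS answers reach the wrong customer.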

DNS caches for 4 days. (2)

Kattare (528707) | more than 2 years ago | (#37880764)

No DNS server (or mainstream browser) caches something for 4 days when given a low TTL. I've seen some that cache for a few hours, maybe up to a day, but 4 days? Really? Something else is going on. I kind of wonder about the Netflix clients built into all those TVs, mobile phones, and DVD players.

AWS charges based on load (1)

sandytaru (1158959) | more than 2 years ago | (#37881984)

So if you're getting millions of requests that aren't actually meant for you, that could drive up your monthly bill as well as your traffic usage. Good thing they caught that...

Whodunnit? (1)

autocracy (192714) | more than 2 years ago | (#37883042)

Does this story come with any indication that their isn't a mixup on Netflix's part?

Re:Whodunnit? (1)

autocracy (192714) | more than 2 years ago | (#37883050)

... "there" isn't a mixup on their part. Honestly, it'd be great if the Slashdot API reacted in the same year that I clicked on preview.

Re:Whodunnit? (1)

pjt33 (739471) | more than 2 years ago | (#37884722)

The preview goes via Netflix.
