Remus Project Brings Transparent High Availability To Xen

timothy posted more than 4 years ago | from the when-servers-go-south-a-song dept.

Software 137

An anonymous reader writes "The Remus project has just been incorporated into the Xen hypervisor. Developed at the University of British Columbia, Remus provides a thin layer that continuously replicates a running virtual machine onto a second physical host. Remus requires no modifications to the OS or applications within the protected VM: on failure, Remus activates the replica on the second host, and the VM simply picks up where the original system died. Open TCP connections remain intact, and applications continue to run unaware of the failure. It's pretty fun to yank the plug out on your web server and see everything continue to tick along. This sort of HA has traditionally required either really expensive hardware, or very complex and invasive modifications to applications and OSes."
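For readers who want a feel for the mechanism, here is a minimal, purely illustrative Python sketch of an epoch-based checkpoint cycle of the kind the summary describes. This is not the actual Xen/Remus code; the class, the 50 ms interval, and the page/packet bookkeeping are all invented for illustration.

    import time

    CHECKPOINT_INTERVAL = 0.05  # 50 ms epochs; illustrative only

    class ToyVM:
        """Stand-in for a protected guest: tracks dirtied memory and buffered output."""
        def __init__(self):
            self.memory = {}      # page number -> contents
            self.dirty = set()    # pages written since the last checkpoint
            self.out_buffer = []  # network packets held back until the epoch commits

        def run_for(self, seconds):
            # Pretend the guest ran for one epoch, touching a page and emitting a packet.
            page = len(self.memory)
            self.memory[page] = "data"
            self.dirty.add(page)
            self.out_buffer.append("packet-%d" % page)
            time.sleep(seconds)

    def protect(vm, backup_state, epochs=3):
        for _ in range(epochs):
            vm.run_for(CHECKPOINT_INTERVAL)              # speculative execution
            delta = {p: vm.memory[p] for p in vm.dirty}  # pause, copy only dirty pages
            backup_state.update(delta)                   # ship the delta; wait for the ack
            vm.dirty.clear()
            released, vm.out_buffer = vm.out_buffer, []  # only now release buffered output
            print("committed %d pages, released %d packets" % (len(delta), len(released)))

    backup = {}
    protect(ToyVM(), backup)
    # On primary failure, the replica resumes from `backup`, which is always consistent
    # with everything the outside world has already seen.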

Already done by VMware (5, Interesting)

Lurching (1242238) | more than 4 years ago | (#30066950)

They may have a patent too!!

Re:Already done by VMware (2, Insightful)

palegray.net (1195047) | more than 4 years ago | (#30067242)

I'll bet a paycheck that prior art in various incarnations would handily dispatch any such patent. As for it already being done by VMware, a lot of organizations prefer a purely open source solution, and Xen works extremely well for many companies.

Re:Already done by VMware (1, Insightful)

illegibledotorg (1123239) | more than 4 years ago | (#30067294)

Yeah, and at a great price point. *rolleyes*

IIRC, to get this kind of functionality from ESX or vSphere you have to pay licenses numbering in the thousands of dollars for each VM host as well as a separate license fee for their centralized Virtual Center management system. I'm glad to see that this is finally making it into the Xen mainline.

Re:Already done by VMware (1, Insightful)

Anonymous Coward | more than 4 years ago | (#30067416)

To anyone who actually needs this kind of uninterrupted HA the cost of a VMware license is an insignificant irrelevance. Of course, it's nice that people can play around with HA at home now for free.

Re:Already done by VMware (0)

Anonymous Coward | more than 4 years ago | (#30067512)

Agreed. A specialized support team a phone call away is how to run a business. Time is the most expensive thing to waste.

Re:Already done by VMware (1)

smash (1351) | more than 4 years ago | (#30068284)

+1 to this. And vmware support is *actually good*.

Re:Already done by VMware (1)

ckaminski (82854) | more than 4 years ago | (#30069484)

No it's not. We had four 2.x hosts running VCenter 1.3 which would randomly hang every couple of weeks. The ONLY solution was a hard power-cycle. They never could resolve the issue, saying upgrade to 3.x and VCenter 2. Right. We're going to upgrade all 20 of our ESX hosts just because you can't resolve this problem...

Eventually we had to... we retired the servers and the SAN they were connected to, and the problems never recurred. Great support, VMware.

Re:Already done by VMware (0)

Anonymous Coward | more than 4 years ago | (#30069590)

So you're saying that they gave you advice, you followed their advice, and the problem is gone? Sounds like terrible support to me.

Re:Already done by VMware (2, Insightful)

Bert64 (520050) | more than 4 years ago | (#30070570)

They bought a particular version of VMware, and paid VMware to support the setup they had bought and paid for...
VMware's method of providing support was to tell them to buy new, expensive products... They failed to provide adequate support for the version the customer was actually paying for support on...
If their product fails, then an upgrade to a working version should be free at the very least.

Re:Already done by VMware (1)

Anonymous Coward | more than 4 years ago | (#30067920)

I think you're forgetting academic institutions, startups, research groups, and all the other organizations that would MUCH rather spend their money on things other than VMware when a free alternative is available... or any place that just wants to keep a pure open source environment.

For that matter, why would anyone NOT want HA if they can get it easily and cheaply?

Just because VMware has it does not in any way reduce the significance of Remus making it easily available in Xen.

Re:Already done by VMware (0)

Anonymous Coward | more than 4 years ago | (#30067940)

Thus driving up the cost of everything, just to pad VMware's profit margin.

Go fuck yourself.

Re:Already done by VMware (1)

Per Wigren (5315) | more than 4 years ago | (#30070394)

To anyone who actually needs this kind of uninterrupted HA the cost of a VMware license is an insignificant irrelevance.

But now, those of us who don't actually need completely uninterrupted HA can have it anyway, and as a bonus it will probably be easier to set up and maintain than a semi-custom "only one minute of downtime" HA solution. This is a good thing indeed.

Re:Already done by VMware (0)

Anonymous Coward | more than 4 years ago | (#30067956)

I just went through this at my company. VMware calls it "Fault Tolerance." You're looking at $2K+ per CPU socket. A rack of 24 servers with dual sockets is over $50K in licenses. Of course, that's just the hypervisor license with no support. Plus you need the management server licenses; add another $xxK for 24 servers (I don't remember the cost on that; it was $6K for three hosts or something like that).

Re:Already done by VMware (1)

smash (1351) | more than 4 years ago | (#30068288)

And if downtime costs you 100k/hr, it's a bargain. The support is also excellent, which is worth the price of admission if FT is important to you.

That's at the point where in-house support works? (1)

Futurepower(R) (558542) | more than 4 years ago | (#30068752)

For $50,000, maybe you should develop in-house technical support, since it won't be just $50,000 in licenses; it will eventually be another $50,000 in support, perhaps.

Re:That's at the point where in-house support work (1)

smash (1351) | more than 4 years ago | (#30069344)

If you can get in-house technical support available 24x7 that has the programmers of the product on hand to deal with it in a timely fashion, sure - go for it.

Nope (4, Insightful)

Anonymous Coward | more than 4 years ago | (#30067410)

Remus presented their software well before VMware came out with their product.

What's different now is that the Remus patches have finally been incorporated into the Xen source tree.

If VMware has any patents, they'll have to clear the hurdle of predating the Remus work, which was originally published a while ago.

Besides, Remus can be used in more ways than what VMware offers, since you have the source code.

Re:Nope (1)

Eli Gottlieb (917758) | more than 4 years ago | (#30070778)

What's different now is that the Remus patches have finally been incorporated into the Xen source tree.

Hear, hear! I spent my summer research internship this year incorporating Remus patches into the Xen source tree for use on a departmental project. It was two months of bloody hacking to make the patched source, the build system, and the use environment cooperate well enough to actually get a Remus system running and backing up its VMs over the network. We never got it perfect.

Re:Nope (0)

Anonymous Coward | more than 4 years ago | (#30070910)

That's plain WRONG. VMWARE demoed this at VMWARE 2007 while the REMUS paper wasn't published till 2008.

Re:Already done by VMware (3, Interesting)

TheRaven64 (641858) | more than 4 years ago | (#30067622)

I know that a company called Marathon Technologies owns a few patents in this area. A few of their developers were at the XenSummit in 2007 where the project was originally presented.

Re:Already done by VMware (1)

Howard Beale (92386) | more than 4 years ago | (#30068504)

We use our product with Marathon's everRun FT. We're just starting to do load testing using Xen with their 2G product. It looks nice, but the second layer of management gets to be a pain.

Re:Already done by VMware (1)

nurb432 (527695) | more than 4 years ago | (#30067718)

And it didn't require any "really expensive hardware, or very complex and invasive modifications" to do it. Not saying it's going to run on some old beat-up Pentium Pro from 10 years ago, but the hardware I see it run on every day isn't out of line for a modern data center.

And it requires ZERO changes to the OS.

(At risk here of sounding like a VMware fanboy, but come on... at least they could present facts when tooting their horn.)

Re:Already done by VMware (0)

Anonymous Coward | more than 4 years ago | (#30067760)

This? http://www.vmware.com/products/fault-tolerance/

Re:Already done by VMware (1)

smash (1351) | more than 4 years ago | (#30068276)

Beaten. ESX 4.0 has VMware FT, and "lockstep" is patented, I believe...

Re:Already done by VMware (0)

Anonymous Coward | more than 4 years ago | (#30068320)

This feature sounds a lot like what Tandem NonStop systems used to do...

Re:Already done by VMware (1)

uncqual (836337) | more than 4 years ago | (#30068548)

IIRC, NonStop applications had to be very involved in FT - initiating checkpoint type operations to the "backup" etc. Absolutely nothing like Remus.

And, IIRC, NonStop SQL wasn't one of those applications - that amused me.

Re:Already done by VMware (1)

Cheaty (873688) | more than 4 years ago | (#30068382)

This isn't lockstep. In storage terms, if you think of lockstep as synchronous replication, this is more akin to asynchronous snapshot-based replication. The metaphor falls apart a bit because the primary does wait for acknowledgment before modifying its external state (sending network packets or writing to disk), but can otherwise continue execution.

Re:Already done by VMware (1)

jipn4 (1367823) | more than 4 years ago | (#30068618)

This sort of stuff is far older; it goes back to mainframe days and supercomputing.

Furthermore, the idea of running two machines in lockstep and failing over shouldn't be patentable at all. Specific, particularly clever implementations of it might be, but those shouldn't preclude others from creating other implementations of the same functionality.

Re:Already done by VMware (1)

ckaminski (82854) | more than 4 years ago | (#30069510)

Prior art by HP, which used to do this in Pentium-based NetServers?

Granted, that was real hardware as opposed to software, but perhaps?

Re:Already done by VMware (1)

oreaq (817314) | more than 4 years ago | (#30070722)

No. IBM was doing this kind of thing on mainframes 20 years ago. This stuff is actually pretty old.

RONALDO (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#30066974)

He shines brightly at Corinthians!

It's pretty fun (1)

ickleberry (864871) | more than 4 years ago | (#30067002)

"It's pretty fun to yank the plug out on your web server and see everything continue to tick along."

Or an ordinary, everyday, run-of-the-mill, off-the-shelf, plain-jane beige UPS. Or a ghetto one [dansdata.com], if you'd like.

Still, it's pretty cool; I'm just wondering how much overhead there is in setting up this system.

Re:It's pretty fun (1, Insightful)

Fulcrum of Evil (560260) | more than 4 years ago | (#30067032)

If it's a webserver, what's the big deal? Run four, and if one drops off, stop sending it requests. For an app server, I can see the advantages.

Re:It's pretty fun (4, Informative)

Hurricane78 (562437) | more than 4 years ago | (#30067328)

Uuum... session management? Transaction management? The server dying in the process of something that costs money?
Even if it's something as simple as losing the contents of your shopping cart just before you wanted to buy, and then becoming angry at the stupid ass retarded admins and developers of that site.
Or losing the server connection in your flash game, right before saving the highscore of the year.

Webservers are far less stateless than you might think. Nowadays they practically are app servers. (Disclosure: I've been doing web applications since 2000, so I know a bit about the subject.)

When five minutes of downtime means over a hundred complaints in your inbox and tens of thousands of dropped connections, which your boss does not find funny at all, you don't make that mistake again.

Re:It's pretty fun (0)

Anonymous Coward | more than 4 years ago | (#30067498)

Not sure how we do the coding for it, but we have a four-web-server setup, and it round-robins through them for everything. If port 80 or 443 goes down, the server gets marked as down in the CSS and stops taking traffic. The customer might see an error page if they were on the server that failed, but a single press of F5 and they are back working: no lost sessions, no lost transactions. It can be done; otherwise thar be magic in them machines.
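For what it's worth, the health-checked round-robin described above boils down to something like the following toy sketch. This is plain Python with invented addresses, not the poster's actual CSS load-balancer configuration.

    import itertools
    import socket

    SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # hypothetical backends
    _pool = itertools.cycle(SERVERS)

    def is_up(host, port=80, timeout=1.0):
        """A backend is treated as down as soon as its HTTP port stops accepting connections."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def pick_backend():
        """Round-robin over the pool, skipping anything that fails the health check."""
        for _ in range(len(SERVERS)):
            host = next(_pool)
            if is_up(host):
                return host
        raise RuntimeError("no healthy backends")

    # A client that was mid-request on the dead box just retries (the "press F5" above)
    # and lands on the next healthy server.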

Re:It's pretty fun (3, Insightful)

Fulcrum of Evil (560260) | more than 4 years ago | (#30067536)

Webservers are far less stateless than you might think. Nowadays they practically are app servers. (Disclosure: I did web applications since 2000, so I know a bit about the subject.)

Webservers have no business being the sole repository for these things - the whole point of separating out web from app is that web boxes are easily replaceable with no state.

Session mgmt: store the session in a distributed way, at least after each request. Transactions: they fail if you die halfway through. Shopping cart: this doesn't live on a web server.

If you require all that state, how do you ever do load balancing? Add a web server and it's another SPOF.

When 5 minutes downtime mean over a hundred complaints in your inbox and tens of thousands of dropped connections, which your boss does not find funny at all, you don't do that error again.

That's right, you move the state off the webserver so nobody ever sees the downtime and tell your boss that you promised 99.9 and damnit, you're delivering it!
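As a concrete illustration of "store the session in a distributed way", here is a minimal sketch that writes sessions to a shared store after each request, so any web server in the pool can serve the next request. It assumes the redis-py client and a Redis instance reachable by every web server; the host name and TTL are made up for illustration.

    import json

    import redis  # assumed: redis-py client, shared Redis instance reachable by all web servers

    SESSION_TTL = 30 * 60  # 30 minutes; illustrative
    store = redis.Redis(host="session-store.internal", port=6379)  # hypothetical host

    def save_session(session_id, data):
        """Write the session to the shared store after every request, so any web
        server in the pool can pick it up if this one dies."""
        store.setex("session:" + session_id, SESSION_TTL, json.dumps(data))

    def load_session(session_id):
        raw = store.get("session:" + session_id)
        return json.loads(raw) if raw else {}

    # In a request handler, for example:
    #   save_session("a1b2c3", {"cart": ["widget-42"], "user": "alice"})
    #   cart = load_session("a1b2c3")["cart"]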

Re:It's pretty fun (1)

shmlco (594907) | more than 4 years ago | (#30069736)

"Session mgmt: store the session in a distributed way at least after each request."

Bingo. With your solution, a submitted page request will fail. In fact, every page request and connection being handled by that server when it fails will fail.

With the article's solution, things automagically switch over and everyone gets the data they requested. Users notice nothing.

"... so nobody ever sees the downtime..."

Except all of the users that clicked register or buy and get nothing at all.

Re:It's pretty fun (1)

Fulcrum of Evil (560260) | more than 4 years ago | (#30069928)

it's a choice between reliability and complexity, and complexity has its own reliability problems. Ideally, the HA solution is best, but it relies on a lot more than the simple solution. The users that get an error can try again and it will work. I did say that it's mostly useful for the app server layer, right?

Re:It's pretty fun (0)

Anonymous Coward | more than 4 years ago | (#30067640)

Not meaning to pick on you, but it sounds like bad coding practices are at fault. Have the session ID offloaded to a server that keeps track of the client and ties it to any transactions or whatnot that they need the web server for. Have the transactions logged to yet another server that lets the transactions queue up. Have both the session and transaction servers replicate to standby servers should they fail. In a group of web servers, as long as one web server is still up, you'll continue on, albeit slowly, but with no loss of customer... anything.

The biggest inconvenience is teaching your customers to refresh the page if they get an error so they are moved over to a working web server. I'll end this by saying I have no idea what I'm talking about when it comes to handling a flash game and getting it to handle these kinds of failures. But for plain old web servers that handle e-commerce or whatever, it is very doable to keep session and transaction management off the web server they run on. It adds to the complexity, but it will be more fault tolerant.

Re:It's pretty fun (1)

radish (98371) | more than 4 years ago | (#30067710)

Web servers are stateless and sit in front of app servers, which are stateful but which have their sessions propagated to at least one other instance. When a web server dies, no one cares; if an app server dies, you just need some logic that allows the box which gets the next request in the session to either (a) redirect the request to the app server that was the backup for that session or (b) pull the session into its own cache from the backup.

Re:It's pretty fun (0)

Anonymous Coward | more than 4 years ago | (#30067800)

LOL, remind your boss he needs to fire you and get a competent programmer. Doing web apps since 2000... your comment would be spot on if this was still 2000, but web apps these days are a lot more robust than you seem to realize. Sounds like you are having major outages when you have a single server down... and you seem to think that this is expected and unavoidable... scary.

Re:It's pretty fun (0)

Anonymous Coward | more than 4 years ago | (#30067914)

remind your boss he needs to fire you and get a competent programmer

Hell, in 2000 I was writing shitty webapps, and if the webserver hadn't also been running the database and if we had owned more than one webserver, the webserver could have died and nobody would have noticed.

Of course, nobody noticed when the company died either, so...

Re:It's pretty fun (1)

BitZtream (692029) | more than 4 years ago | (#30068066)

I don't know about you, but my web apps don't let the web server handle session and transaction management. That's what I have a database server for; it's capable of dealing with those issues in a known way that I can recover from to some extent. My important web apps use clusters of databases that take care of each other. There's a reason Oracle costs a fortune and MySQL is free. I can't stand working with Oracle, but there's a reason it exists. Of course you don't have to use Oracle; that's just one example, and there are plenty of alternatives from other vendors and middleware to do the job.

Server dies in the middle of a process? Did the transaction complete? No? Rollback. Yes? Good, the database is in a known good state with everyone updated with proper information; life goes on, just a bit delayed.

You designed your system so your game state is totally client-dependent or something? So if the web server dies there's no acceptable failure mode to revert to a relatively recent state? Must not be that important. If it is so important, why are you not saving the state at acceptable intervals to allow the user to restart from a reasonable point? Sounds like you're doing too much on the client, which probably means some massive security issues as well. Clients are never trusted for anything.

Nothing about a web app is new. We solved all of these issues years ago. I think the problem is that you're just learning about them and don't realize that these problems are actually rather old and there are known ways to deal with them.

I can turn off one of my web servers or database servers, literally killing tens of thousands of connections, and the worst case is half a second of delay or so while the cluster removes it from the loop. The most the user sees is some web pages not loading some content. Any application that uses the web servers for access to the database will retry the request, hitting a different server which will be more than happy to handle it. If the request was completed and the client couldn't be notified in time, the client will retry the request, be told it was already completed, and move on.

This stuff is 40 years old, not anything new and exclusive to web apps. In short, the problems you listed are because you're doing it wrong.
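The "let the database own the transaction, retry idempotently" pattern described above looks roughly like this sketch, using Python's built-in sqlite3 as a stand-in for a real clustered RDBMS. The table, columns, and order ID are invented for illustration.

    import sqlite3

    db = sqlite3.connect(":memory:")  # stand-in for a real clustered RDBMS
    db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount REAL, state TEXT)")

    def place_order(order_id, amount, retries=3):
        """Either the whole order commits or nothing does; retrying after a crash
        sees the already-committed row and simply moves on (idempotent)."""
        for _ in range(retries):
            try:
                with db:  # opens a transaction; commits on success, rolls back on error
                    if db.execute("SELECT 1 FROM orders WHERE id = ?", (order_id,)).fetchone():
                        return "already processed"
                    db.execute("INSERT INTO orders VALUES (?, ?, 'paid')", (order_id, amount))
                return "committed"
            except sqlite3.OperationalError:
                continue  # transient failure: the rollback already happened, just retry
        raise RuntimeError("gave up after %d attempts" % retries)

    print(place_order("order-1001", 19.99))  # committed
    print(place_order("order-1001", 19.99))  # already processed - a replayed request is harmless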

Re:It's pretty fun (1)

shmlco (594907) | more than 4 years ago | (#30069822)

"I can turn off one of my web servers or database servers, literally killing tens of thousands of connections, and the worst case is a half a second of delay or so while the cluster removes it from the loop. The most the user sees is some web pages don't load some content."

So if that server is running a shopping cart, then "thousands" of users might just have had their credit card submissions fail. They don't get confirmations and they don't know if the order went through or not. And I'd almost guarantee that your web server code sending the payment info to the payment processor and then updating the order database isn't atomic and will NOT fail gracefully, as two different systems are involved. Sure, your database may (repeat, may) have rolled back on a dropped connection, but that payment processor request to their server was applied and was NOT recorded.

In short, your scenario wasn't "worst case" at all.

If I were you I'd THINK about the consequences of such events occurring and not snidely assume that you have all the bases covered and that someone else is automatically "doing it wrong". In fact, I'll bet you a hundred bucks I could walk into your server room right now, pull just one power cord to the right switch, router, or load-balancer, and bring down the entire house.

Re:It's pretty fun (3, Insightful)

stefanlasiewski (63134) | more than 4 years ago | (#30067730)

In many cases, the webserver IS the app server.

This sort of feature could be very useful for those smaller shops and cheap shops who haven't yet created a dedicated Web tier, or for all those internal webservers which host the Wiki, etc.

Webservers also help with capacity. Run 4 and if 1 drops off, not a big problem. But what if half the webservers drop off because the circuit which powers that side of the cage went down? And the 'redundant' power supplies on your machines weren't really 'redundant' (Thanks Dell)?

Re:It's pretty fun (1)

mikep554 (787194) | more than 4 years ago | (#30068096)

In many cases, the webserver IS the app server.

This sort of feature could be very useful for those smaller shops and cheap shops who haven't yet created a dedicated Web tier, or for all those internal webservers which host the Wiki, etc.

If they are smaller/cheaper shops, they probably aren't playing around with heavy virtualization to begin with. If you are virtualizing your example box, you're doing it wrong.

But what if half the webservers drop off because the circuit which powers that side of the cage went down? And the 'redundant' power supplies on your machines weren't really 'redundant' (Thanks Dell)?

Get a better UPS setup. If you have entire racks of systems that fill a cage, and your servers all shut down because their power died, you're doing it wrong. Rather than plugging all of the servers into individual UPS systems, get a UPS that covers all the circuits for the cage. And a generator.

Re:It's pretty fun (1)

Fulcrum of Evil (560260) | more than 4 years ago | (#30068852)

or just hire someone to host your boxes, depending on what they're for.

Re:It's pretty fun (1)

smash (1351) | more than 4 years ago | (#30069350)

A UPS does not protect against CPU/motherboard/RAM hardware failure. This sort of HA does.

Re:It's pretty fun (1)

Jeremi (14640) | more than 4 years ago | (#30069868)

Or an ordinary, every day run of the mill 'off the shelf' plain jane beige UPS. or a Ghetto one, if you'd like.

Sure, but power failure isn't the only thing that can stop your server from running -- it's just the easiest one to reproduce without permanently damaging anything. If you'd like a better example, yank the CPU out of your web server's motherboard instead. Your UPS won't save you then! :^)

Himalaya (2, Interesting)

mwvdlee (775178) | more than 4 years ago | (#30067042)

How does this compare to a "big iron" solution like Tandem/Himalaya/NonStop/whatever-it's-called-nowadays?

Re:Himalaya (1)

teknopurge (199509) | more than 4 years ago | (#30067168)

It doesn't. HP Non-Stop is a beast.

Re:Himalaya (2, Informative)

Jay L (74152) | more than 4 years ago | (#30067194)

I was just thinking that...

Tandems may still have other advantages, though; back in the day, we built a database on Himalayas/NSK because, availability aside, it outperformed Sybase, Oracle, and other solutions. (They implemented SQL down at the drive controller level; it was ridiculously efficient.) No idea if that's still the case.

But Tandem required you to build their availability hooks into your app; it wasn't transparent. OTOH, Stratus's approach is: a Stratus server is like having RAID-1 for every component of your server. I gotta think this will cut into their business.

Re:Himalaya (5, Interesting)

teknopurge (199509) | more than 4 years ago | (#30067282)

VM replication like this still has an IO bottleneck. This isn't magic: unless you move to InfiniBand, you're not going to touch something like a Stratus or NonStop machine. By the time you add in the cost of the high-perf interconnects, you're on par with the real-time boxes. All this convergence, with people redesigning the mainframe ass-backward out of client/server gear, makes little sense to me other than as a gimmick.

By the time you get all the components that provide the processing and I/O throughput of those high-end boxes, the x86/64 commodity hardware cost advantage has evaporated.

Re:Himalaya (1)

Vancorps (746090) | more than 4 years ago | (#30067390)

Huh? We have a SAN, son. You need more throughput? Add another 4 or 8 Gb trunk and bam, you've added significant bandwidth. With individual blades having dual 8 Gb HBAs, you have quite a bit of IO available to you, assuming proper PCI-E. There is an upper limit beyond which you shouldn't be virtualizing infrastructure, but that limit is moving ever higher. I don't know about you, but I have a NetApp-based storage array with redundant switching gear that is more than capable of keeping up with the IO of 20 servers on a single physical host, and that includes Oracle, Reporting Services, Exchange, and a few other high-IO applications. My security server recording our multi-megapixel security cameras and a backup Oracle database will stay outside the virtual environment for obvious reasons. Then of course there is our DR setup for basic business continuity.

Re:Himalaya (1, Informative)

Anonymous Coward | more than 4 years ago | (#30067532)

The IO bottleneck in this case is the interconnect between the two machines, not disk, so the SAN isn't relevant. VMware FT needs at least a dedicated GbE NIC for replay/lockstep traffic (I think the recommendation is 10Gb), and it is still limited to a single vCPU in the VM.

Re:Himalaya (1)

teknopurge (199509) | more than 4 years ago | (#30067548)

The fact you're comparing NonStop/Stratus to the IO of a SAN is comical. There's a reason you don't virtualize large RDBMS in production environments: they fall over.

Exchange is not a "high IO application". A high IO application is something like all the ATM transactions for Chase bank in North America. If you can have 20 servers on a single physical host you're doing it wrong: your apps aren't heavy by a long shot.

Re:Himalaya (3, Insightful)

Vancorps (746090) | more than 4 years ago | (#30069348)

Were you replying to my comment? Because it doesn't sound like you read my comment. I specifically said there are cut-off points where virtual infrastructure doesn't make sense.

Also, the fact that you think the IO of a SAN is any different from that of an HP NonStop setup is where things get really comical, because you're talking about InfiniBand, which is used in x86 hardware as well. As I said, the threshold is moving into higher and higher workloads.

I'm also not sure where you get your information about Exchange not being IO intensive. Exchange setups easily handle billions of transactions, just like the big RDBMSes out there. That's why, when you evaluate virtual platforms, they always ask you about your Exchange environment as well as your database environment. Both are considered high-IO applications, as practically all they do is read and write from disk.

I find the whole concept of your argument funny, considering that the NonStop setups were early attempts at abstracting away from the hardware to handle failure and spread the load. In essence it was the start of virtual infrastructure. There is a reason NonStop isn't a primary part of HP's business anymore: people are achieving what they need with commodity hardware. Sorry, but you do indeed save a lot of money that way too. Enterprise crap used to cost boatloads; now it is accessible to much smaller players with smaller workloads but the same demands for uptime.

Re:Himalaya (1)

Jeremi (14640) | more than 4 years ago | (#30069882)

By the time you get all the components that provide the processing and I/O throughput of those high-end boxes, the x86/64 commodity hardware cost advantage has evaporated

I think the potential savings come not so much from the hardware as from not having to redesign or rewrite your low-availability (tm) software from scratch in order to make it highly available. Instead you just slap your existing software into the new Remus VM environment, connect the backup machine, and call it done.

(Whether or not that method actually works in real life remains to be seen, of course, but that's the idea.)

Re:Himalaya (0)

Anonymous Coward | more than 4 years ago | (#30067316)

I think Stratus still has some differentiation versus this approach, since there's no hypervisor involved. However, this is very similar to what Marathon was already doing with Xen in their latest everRun products, and this doesn't require the VM to be running Windows.

Re:Himalaya (3, Informative)

Cheaty (873688) | more than 4 years ago | (#30067672)

Actually, after reading the paper, this is no threat to Stratus or other players in the space like Marathon or VMware's FT. The performance impact is pretty significant: by their own benchmarks there was a 50% perf hit in a kernel compile test, and 75% in a web server benchmark.

This is an interesting approach, and it seems to handle multiple vCPUs in the VM, which I haven't seen done by software approaches like Marathon and VMware FT, but I think it will mainly be used in applications that would never have been considered for a more expensive solution anyway.

Re:Himalaya (1)

Anonymous Coward | more than 4 years ago | (#30067248)

How does this compare to a "big iron" solution like Tandem/Himalaya/NonStop/whatever-it's-called-nowadays.

Precisely.

It's actually pretty cool from a computing-history perspective. Once upon a time, the mainframes were the badass machines: hot-swapping power supplies and core modules, several nines of uptime. Now we're doing it in software.

I see it as a mirror of what's happening with data storage and the whole "cloud computing" thing. Going back and forth between big hosted machines with dumb clients and smaller, smarter machines. It's like we flip back and forth every few years when it comes to computer ideology.

I guess what I'm trying to get at is... I can't think of anything too insightful to say. The only thing that comes to mind is: it's pretty damn cool how old ideas become new ideas. How the archaic way of doing things suddenly finds a place with new technology.

Re:Himalaya (1)

mwvdlee (775178) | more than 4 years ago | (#30067738)

I'm not comparing this to mainframes in general, only to the "redundant" types.

This isn't going to compare to a general mainframe simply because it doesn't have the massive resources (cpu's, disk space, memory, bandwidth, etc).

A lot of those Tandems aren't used like a typical mainframe, though. Sure, they may offer more resources than this Remus solution, but many Tandem applications don't need those resources; they only need the redundancy and as-near-to-100%-as-possible-at-any-expense uptime.

Another reply pointed to ATM machines. Indeed, these generally communicate with Tandems. The Tandems don't really do much with them, though. Mostly they just register the transactions, do some basic checks against non-realtime data, and prepare the transaction to be handled by the actual bank systems housed on a different machine. The reason for this is simple economics: when comparing raw performance, a Tandem is very expensive, so it's wise to minimize the workload to minimize the amount of Tandem hardware required to run it.

The question is: will the Remus project be able to handle some of those traditional Tandem workloads with similar quality?

Question (0)

Anonymous Coward | more than 4 years ago | (#30067076)

Not immediately clear on the Remus page... Is this like a constantly going "live migration" (without actually switching hosts) in that it _only_ keeps a copy of the memory of the guest? Or does this also keep a copy of the disk image? It'd be nice to not need shared storage just to be able to migrate without downtime...

Re:Question (1)

Kjella (173770) | more than 4 years ago | (#30067154)

I'd think that'd be the easy part, much easier than having shared storage. The synchronization to make sure writes against shared storage happened exactly once would be much harder.

Answer (5, Informative)

Anonymous Coward | more than 4 years ago | (#30067310)

I've worked with Remus, so I can answer your question.

It's not "constantly going" into live migration. The backup image is constantly kept in a "paused" state. It doesn't come out of the paused state until communication with the original is broken.

Until the backup goes live, the shadow pages for memory are updated, via checkpoints. The checkpointing interval is somewhat variable, but it's actually hardcoded into the Xen software (at present - this will change), regardless of what the user level utility tells you.

As it is, subsecond checkpointing doesn't work too well, but intervals of about 1-2 seconds work great. Subsecond checkpointing can be done (I've done it), but you need more code than what Remus currently provides.

Similar comments are applicable to the storage updating. This works absolutely superbly if you're using something like DRBD for the storage replication.

Remus is pretty cool technology, and it serves as a very solid foundation for taking things to the next level.

The folks at UBC have done a superb job here, and should be well congratulated.

Intact? (4, Informative)

Glock27 (446276) | more than 4 years ago | (#30067078)

Intact is one word, O ye editors...

Re:Intact? (1, Informative)

Anonymous Coward | more than 4 years ago | (#30067664)

Infact, you're right!

Re:Intact? (1)

stefanlasiewski (63134) | more than 4 years ago | (#30067756)

Your complaint shows a lack of tact ;)

Re:Intact? (2, Funny)

martin-boundary (547041) | more than 4 years ago | (#30068386)

Intact is one word

That was before someone gave Romulus a shovel!

state transfer (3, Insightful)

girlintraining (1395911) | more than 4 years ago | (#30067110)

... Of course, this ignores the fact that if it's a software glitch, it'll happily replicate the bug into the copy. There are also certain hardware failures that will replicate: Mountain dew spilled on top of the unit, for example. There's a huge push for virtualization, but it only solves a few classes of failure conditions. No amount of virtualization will save you if the server room catches fire and the primary system and backup are colocated. Keep this in mind when talking about "High Availability" systems.

On a different note, nothing that's claimed to be transparent in IT ever is. Whenever I hear that word, I usually cancel my afternoon appointments... Nothing is ever transparent in this industry. Only managers use that word. The rest of us use the term "hopefully".

Re:state transfer (0)

Anonymous Coward | more than 4 years ago | (#30067290)

On a different note, nothing that's claimed to be transparent in IT ever is. Whenever I hear that word, I usually cancel my afternoon appointments... Nothing is ever transparent in this industry. Only managers use that word. The rest of us use the term "hopefully".

You should know better than to make a claim like this.

http://www.newegg.com/Product/Product.aspx?Item=N82E16811166006

Re:state transfer (4, Funny)

Garridan (597129) | more than 4 years ago | (#30067424)

Mountain dew spilled on top of the unit, for example.

FTFS:

Remus provides a thin layer that continuously replicates a running virtual machine onto a second physical host.

Wow! This software is *incredible* if mountain dew spilled on top of one machine is instantly replicated on the other machine! I'm gonna go read the source immediately, this has huge ramifications! In particular, if an officemate gets coffee and I also want coffee, only one of us needs to actually purchase a cup!

Re:state transfer (2, Funny)

girlintraining (1395911) | more than 4 years ago | (#30069862)

Wow! This software is *incredible* if mountain dew spilled on top of one machine is instantly replicated on the other machine! I'm gonna go read the source immediately, this has huge ramifications! In particular, if an officemate gets coffee and I also want coffee, only one of us needs to actually purchase a cup!

I told them quantum computing was a bad idea, but nobody listened...

I told them quantum computing was a bad idea, but nobody listened...

I told them...

Re:state transfer (3, Interesting)

Vancorps (746090) | more than 4 years ago | (#30067442)

If your primary and secondary systems are physically located next to each other, then they aren't in the category of highly available. Furthermore, with storage replication and regular snapshotting you can have your virtual infrastructure at your DR site on the cheap, while gaining enterprise availability and, most importantly, business continuity.

I'll agree with being skeptical about transparency, although how many people already have this? I went with XenServer and Citrix Essentials for it; I already have this fail-over and I can tell you that it works. I physically pulled a blade out of the chassis and, sure enough, by the time I got back to my desk the servers were functioning, having dropped a whole packet. Further tweaking of the underlying network infrastructure resulted in keeping the packet, with just a momentary rise in latency.

Enterprise availability is fast coming to the little guys.

Re:state transfer (3, Informative)

bcully (1676724) | more than 4 years ago | (#30067516)

FWIW, we have an ongoing project to extend this to disaster recovery. We're running the primary at UBC and a backup a few hundred KM away, and the additional latency is not terribly noticeable. Failover requires a few BGP tricks, which makes it a bit less transparent, but still probably practical for something like a hosting provider or smallish company.

Re:state transfer (1)

Bender0x7D1 (536254) | more than 4 years ago | (#30067766)

How much bandwidth is needed for the connection on a per-machine basis? Asked another way - if I had 10 machines that I wanted to use this approach on, how fast of a connection would I need? At what levels of latency do problems start?

Re:state transfer (5, Informative)

bcully (1676724) | more than 4 years ago | (#30067844)

It depends pretty heavily on your workload. Basically, the amount of bandwidth you need is proportional to the number of different memory addresses your application wrote to since the last checkpoint. Reads are free -- only changed memory needs to be copied. Also, if you keep writing to the same address over and over, you only have to send the last write before a checkpoint, so you can actually write to memory at a rate which is much higher than the amount of bandwidth required. We have some nice graphs in the paper, but for example, IIRC, a kernel compilation checkpointed every 100ms burned somewhere between 50 and 100 megabits. By the way, there's plenty of room to shrink this through compression and other fairly straightforward techniques, which we're prototyping.
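A quick back-of-envelope check of that proportionality: replication bandwidth is roughly (pages dirtied per epoch) x (page size) / (epoch length). The dirty-page count below is an invented example, not a number from the Remus paper.

    PAGE_SIZE = 4096              # bytes per x86 page
    EPOCH = 0.100                 # checkpoint every 100 ms, as in the kernel-compile example
    dirty_pages_per_epoch = 250   # invented workload figure, not a measurement

    bytes_per_second = dirty_pages_per_epoch * PAGE_SIZE / EPOCH
    print("~%.0f Mbit/s of replication traffic" % (bytes_per_second * 8 / 1e6))  # ~82 Mbit/s

    # Repeated writes to the same page within an epoch cost nothing extra:
    # only the final contents of each dirtied page are shipped at the checkpoint.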

Re:state transfer (1)

Bender0x7D1 (536254) | more than 4 years ago | (#30069022)

Cool. Thanks for the info.

Re:state transfer (1)

Vancorps (746090) | more than 4 years ago | (#30069284)

Plenty of room for a Riverbed or Cisco WAAS in between to accelerate transfers as well. Sounds like you and I want to use the tech in similar ways.

For me, I don't mess with BGP yet; I can accomplish what I need through virtual links with OSPF. It won't be as smooth as my per-site fail-over, since I have two locations on site. It's a temporary setup, so I have three locations: a primary at our event, a secondary at our event, and a third back at HQ, with a fourth on its way for DR purposes. It sucks moving your network from city to city, but at least it makes for some interesting problems.

Re:state transfer (2, Interesting)

shmlco (594907) | more than 4 years ago | (#30069682)

"If your primary and secondary systems are physically located next to each other then they aren't in the category of highly available."

High availability covers more than just distributed data centers. Load balancing, fail-over, clustering, mirroring, redundant switches, routers, and other hardware: all are zero-point-of-failure, high-availability solutions.

Blakes 7 (0)

Anonymous Coward | more than 4 years ago | (#30067302)

Xen ? The computer of the Liberator?

How does it deal with replication latency? (2, Interesting)

melted (227442) | more than 4 years ago | (#30067398)

I'm pretty sure that if I just yank the cable, not everything will be replicated. :-)

Re:How does it deal with replication latency? (5, Informative)

bcully (1676724) | more than 4 years ago | (#30067480)

Hello slashdot, I'm the guy that wrote Remus. It's my first time being slashdotted, and it's pretty exciting! To answer your question, Remus buffers outbound network packets until the backup has been synchronized up to the point in time where those packets were generated. So if you checkpoint every 50ms, you'll see an average additional latency of 25ms on the line, but the backup _will_ always be up to date from the point of view of the outside world.
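The 25 ms figure is just half the checkpoint interval: a packet generated at a random point inside a 50 ms epoch waits, on average, half an epoch before release. A tiny, purely illustrative simulation:

    import random

    EPOCH = 0.050  # seconds between checkpoints, matching the 50 ms example above

    def added_latency(t):
        """A packet generated at time t is held until the end of its epoch."""
        epoch_end = (int(t / EPOCH) + 1) * EPOCH
        return epoch_end - t

    samples = [added_latency(random.uniform(0, 10)) for _ in range(100000)]
    print("average extra delay: %.1f ms" % (1000 * sum(samples) / len(samples)))  # ~25 ms

    # This ignores the time to transfer and acknowledge the checkpoint itself,
    # which adds a little more on top in practice.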

Re:How does it deal with replication latency? (1)

shentino (1139071) | more than 4 years ago | (#30067526)

How does Remus handle things if it mispredicts the packets?

Supposing that it sends packet X, crashes, and then when it's restored from checkpoint it decides to send packet Y instead?

Schroedinger

Re:How does it deal with replication latency? (5, Informative)

bcully (1676724) | more than 4 years ago | (#30067582)

The buffering I mentioned above means that packet X will not escape the machine until the checkpoint that produced X has been committed to the backup. So when it recovers on the backup, X will already be in the OS send buffer. There's no possibility for misprediction. If the buffer is lost, TCP will handle recovering the packet.

Re:How does it deal with replication latency? (2, Interesting)

BitZtream (692029) | more than 4 years ago | (#30068200)

No it won't.

VMware claims the same crap, and it's simply not true.

You have a 50ms window between checkpoints that can be lost, in your example. The only way to ensure no loss is to ensure that every change, every instruction, every bit of microcode executed in the CPU on machine A is duplicated on B before A continues to the next one. You simply can't do that without specialized hardware, since you don't even have access to the microcode as it's executed on standard hardware.

50ms on my hardware/software can mean thousands of transactions lost. That can wreak havoc on certain network protocols and cause database operations to fail completely as you replay portions of transactions that the database has already seen.

I can come up with situations all day long as to how this isn't as seamless as you make it out to be. Sure, xclock transitions to the other machine in what appears to be a perfect no-loss transition, or Solitaire on a Windows machine, but that's not exactly useful.

Remus has plenty of uses, but it has plenty of pitfalls, and regardless of claims it does require consideration when developing systems, unless you're introducing latency that, to me, would just be completely unacceptable and would require applications to be aware of it. Hell, that's 6.25MB of data that can be transmitted over a gigabit pipe between checkpoints. That can kill performance.

I know what you're saying, I know what you mean, and I just don't think you realize how much that latency can affect certain classes of applications.

Re:How does it deal with replication latency? (5, Insightful)

bcully (1676724) | more than 4 years ago | (#30068236)

I think you're missing the point of output buffering. Remus _does_ introduce network delay, and some applications will certainly be sensitive to it. But it never loses transactions that have been seen outside the machine. Keeping an exact copy of the machine _without_ having to synchronize on every single instruction is exactly the point of Remus.

Re:How does it deal with replication latency? (1)

convolvatron (176505) | more than 4 years ago | (#30068662)

This isn't true. A fully recoverable abstraction can be maintained without digging into the architecture; you just need a periodic point where you flush everything and define a consistent checkpoint.

Personally I prefer doing this in the database, or operating system, or application, but suggesting that you can't do it underneath is simply wrong. It just comes down to performance.

Re:How does it deal with replication latency? (4, Insightful)

Antique Geekmeister (740220) | more than 4 years ago | (#30068950)

If your application cannot tolerate a 50 msec pause in outbound traffic (which is what Remus seems to introduce, similar to VMWare switchovers) then you have no business running it over a network, much less over the Internet as a whole. Similar pauses are introduced in core switching and core routers on a fairly frequent basis, and are entirely unavoidable.

There are certainly classes of application sensitive to that kind of issue: various "real-time-programming" and motor control sensor systems require consistently low latency. But for public facing, high-availability services, it seems useful, and much lighter to implement than VMWare's expensive solutions.

Re:How does it deal with replication latency? (1)

Gazzonyx (982402) | more than 4 years ago | (#30069222)

Indeed. With the right (or more accurately, wrong) file system, IO scheduler, RAID layout, and workload, you can push your disk latency to well over 50 ms before it has a chance to get to the wire's buffer. The objective is to avoid hours of latency, not milliseconds. TCP/IP will take care of the road bumps if you make sure that the road doesn't stop at the edge of a cliff.

make dom0 support for recent kernels first (0)

Anonymous Coward | more than 4 years ago | (#30067616)

It is absolutely unbelievable that the official Xen kernel is still 2.6.18. There's a lot of modern hardware that isn't supported by it. This is an absolute showstopper.

Wrong place to put a failsafe? (3, Insightful)

mattbee (17533) | more than 4 years ago | (#30068232)

Surely there is a strong possibility of a failure where both VMs run at once - the original image thinking it has lost touch with a dead backup, and the backup thinking the master is dead, and so starting to execute independently? If they're connected to the same storage or network segment, it could cause data loss, bring down the network service, and so on. I've not investigated these types of lockstep VMs, but it seems you have to make some pretty strong assumptions about failure modes, and those assumptions always break eventually on commodity hardware (I've seen bad backplanes, network chips, CPU caches, RAM of course, switches...). How can you possibly handle these cases to avoid having to mop up after your VM is accidentally cloned?

Re:Wrong place to put a failsafe? (4, Informative)

bcully (1676724) | more than 4 years ago | (#30068270)

Split brain is a possibility, if the link between the primary and backup dies. Remus replicates the disks rather than requiring shared storage, which provides some protection over the data. But there are already a number of protocols for managing which replica is active (e.g., "shoot-the-other-node-in-the-head") -- we're worried about maintaining the replica, but happy to use something like linux-HA to control the actual failover.

Re:Wrong place to put a failsafe? (4, Interesting)

dido (9125) | more than 4 years ago | (#30068462)

This is something that the much simpler Linux-HA environment deals with by using something they call STONITH, which basically means Shoot The Other Node In The Head. STONITH peripherals are devices that can physically shut down a server, e.g. a power strip that can be controlled via a serial port. If you wind up with a partitioned cluster, which they more colorfully call a "split brain" condition, where each node thinks the other one is dead, each of them uses the STONITH device to make sure, if it is able. One of them will activate the STONITH device before the other; the one that wins keeps on running, while the one that loses really kicks the bucket if it wasn't fully dead already.

I imagine that Remus must have similar mechanisms to guard against split-brain conditions as well. I've had several Linux-HA clusters go split-brain on me, and I tell you it's never pretty. The best case is that they both merely try to grab the same IP address and get an IP address conflict; in the worst case, they both try to mount and write to the same Fibre Channel disk at the same time and bollix the file system. If a Remus-based cluster split-brains, I can imagine you'll get mayhem just as awful unless you have a STONITH-like system to prevent it.
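The STONITH logic described above amounts to "never take over until you are certain the other node is powered off." A hedged sketch of the idea follows; the power-strip interface, node names, and priority delay are hypothetical, and real Linux-HA deployments use STONITH plugins for specific hardware.

    import time

    class SerialPowerStrip:
        """Hypothetical stand-in for a STONITH device (e.g. a serial-controlled outlet)."""
        def power_off(self, node):
            print("cutting power to %s" % node)

    def heartbeat_ok(peer):
        # Normally this would ping the peer over one or more heartbeat links;
        # here we pretend the peer has stopped answering.
        return False

    def maybe_take_over(me, peer, priority_delay, stonith):
        if heartbeat_ok(peer):
            return False
        # Both nodes may reach this point at once (split brain). Each tries to fence
        # the other; a priority-based delay is one simple way to bias who wins the race.
        time.sleep(priority_delay)
        stonith.power_off(peer)
        # Only once the peer is known to be powered off do we grab the service IP
        # and mount the shared (or replicated) storage.
        print("%s fenced %s; taking over service address and storage" % (me, peer))
        return True

    maybe_take_over("node-a", "node-b", priority_delay=0.0, stonith=SerialPowerStrip())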

Re:Wrong place to put a failsafe? (1)

CharlyFoxtrot (1607527) | more than 4 years ago | (#30069178)

Sounds like a godawful mess; glad I've never had to deal with a split-brain. We manage mostly Solaris clusters, and they're pretty good about panicking a node when there's a chance the cluster risks becoming inconsistent (loss of quorum). If you're already syncing disks, as in this case, it shouldn't be too difficult to set up a quorum device or HACMP-like disk heartbeats. Doesn't Linux-HA support this type of setup?

So it replicates the state to the new machine (0)

Anonymous Coward | more than 4 years ago | (#30068266)

So it replicates the state to the new machine and then the new machine executes the same instructions and crashes the same way....

Re:So it replicates the state to the new machine (1)

JustinRLynn (831164) | more than 4 years ago | (#30069480)

This technology is meant to guard against physical layer problems (power, hardware failure) and not against software bugs.

Left VMware ESX for Xenserver 5.5 (0)

Anonymous Coward | more than 4 years ago | (#30068454)

I left VMware ESX 3.5 for XenServer 5.5 and I have never been happier.

I am running four DL585 servers with (so far) 42 production guests (Linux and Win2k3) and have really great, more predictable performance.

If someone running VMware is worried about the cost or performance, they need to consider Citrix XenServer.

I don't know how Dr. Breen is doing it. . . (1)

MagusSlurpy (592575) | more than 4 years ago | (#30069318)

but taking transparent high-availability to Xen [wikipedia.org] can't bode well for Gordon or the Vortigaunts. . .

Xen (0)

Anonymous Coward | more than 4 years ago | (#30070374)

Remus Project Brings Transparent High Availability To Xen

But does it solve those awful jumping puzzles?
