
Virtualizing a Supercomputer

kdawson posted more than 4 years ago | from the slicing-up-the-pie dept.

Operating Systems 57

bridges writes "The V3VEE project has announced the release of version 1.2 of the Palacios virtual machine monitor following the successful testing of Palacios on 4096 nodes of the Sandia Red Storm supercomputer, the 17th-fastest in the world. The added overhead of virtualization is often a show-stopper, but the researchers observed less than 5% overhead for two real, communication-intensive applications running in a virtual machine on Red Storm. Palacios 1.2 supports virtualization of both desktop x86 hardware and Cray XT supercomputers using either AMD SVM or Intel VT hardware virtualization extensions, and is an active open source OS research platform supporting projects at multiple institutions. Palacios is being jointly developed by researchers at Northwestern University, the University of New Mexico, and Sandia National Labs." The ACM's writeup has more details of the work at Sandia.


virtualizing a first post! (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#31067620)

just finished materializing an obama!

Oblig. (0, Funny)

Anonymous Coward | more than 4 years ago | (#31067692)

Imagine a beowulf cluster of beowulf clusters of those! Pwoar.

Other way (5, Funny)

Wrexs0ul (515885) | more than 4 years ago | (#31067774)

This is virtualization... Imagine someone Imagining a beowulf cluster of those!


Re:Other way (1)

Hurricane78 (562437) | more than 4 years ago | (#31068856)

-- si and obc are infinite strings, so this never gets past
-- printing "Imagine someone imagining someone imagining ..."
main = print ("Imagine" ++ si ++ " a beowulf cluster" ++ obc ++ " of those.")
si = " someone imagining" ++ si
obc = " of beowulf clusters" ++ obc

Oh, that's just super! (0)

Anonymous Coward | more than 4 years ago | (#31067748)

So if you're virtualizing a supercomputer on a supercomputer, would it not be better to call the host a "super-duper" computer?

Cool. (4, Funny)

John Hasler (414242) | more than 4 years ago | (#31067752)

Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.

Oh. Wait...

Re:Cool. (1)

Mitchell314 (1576581) | more than 4 years ago | (#31067924)

Why virtualize a supercomputer when you can virtualize two for the same price of $19.95?

Hey, you're right! (1)

John Hasler (414242) | more than 4 years ago | (#31067956)

Imagine a Beowulf cluster...

Re:Hey, you're right! (1)

creimer (824291) | more than 4 years ago | (#31068638)

... at $19.95. I'll take a couple of those. :P

Re:Cool. (3, Interesting)

TubeSteak (669689) | more than 4 years ago | (#31068014)

Now we'll never need to build another expensive supercomputer. We'll just "virtualize" them on cheap desktops.

I think you've got it backwards.
Now we're virtualizing cheap desktops on supercomputers.

What they're doing only makes sense if 5% of 4096 nodes* is cheaper than coding your app to run natively on the supercomputer.
Like really big hard drives, when you get up to supercomputer levels of performance, 5% is a lot to give away.

*Anyone know exactly what a node entails?

Re:Cool. (3, Informative)

Tynin (634655) | more than 4 years ago | (#31068210)

*Anyone know exactly what a node entails?

A node is generally just a fancy name for a computer in a cluster. Nodes don't always need an OS installed locally (they can get one via PXE boot), and may have some special hardware. But honestly, in my experience, a node is a node if the systems architect wants to call it one.

Re:Cool. (1)

creimer (824291) | more than 4 years ago | (#31068654)

A supercomputer running 4096 copies of Windows will probably take a significant performance hit of more than 5%.

Re:Cool. (1)

LoRdTAW (99712) | more than 4 years ago | (#31069148)

*Anyone know exactly what a node entails?

At the very least: CPU + RAM. Also of course some glue logic (chip set), firmware (BIOS) and an interface to the rest of the cluster (networking).

so they are 'only' wasting 200 machines (1, Insightful)

Anonymous Coward | more than 4 years ago | (#31067782)

5% may not sound like much, but with 4096 nodes that's over 200 nodes that they are wasting.
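A back-of-the-envelope sketch of that figure (the node count and overhead come from the story; the rest is just arithmetic):

```python
# Equate a flat 5% virtualization slowdown across the machine
# with the throughput of entirely idle nodes.
nodes = 4096       # Red Storm nodes used in the test
overhead = 0.05    # ~5% overhead reported for Palacios

equivalent_nodes_lost = nodes * overhead
print(equivalent_nodes_lost)  # 204.8 -- "over 200 nodes"
```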

Re:so they are 'only' wasting 200 machines (3, Interesting)

Barny (103770) | more than 4 years ago | (#31067832)

Well, I'm not sure how good they are now, but back when I studied at uni we examined a few supercomputer clusters, and the rule of thumb in most cases was that one CPU core per node was stuck doing I/O for that node anyway. That was all before AMD's move to HyperTransport, though, so it may be much different for them now.

The point is that it was a constant: it wouldn't get worse with more nodes, it was always x nodes lost per y nodes, as this is. Just add more nodes :)

A worse problem would be if it were x^2 nodes lost per y nodes; then you're just throwing away money by adding more.

Re:so they are 'only' wasting 200 machines (1)

dbIII (701233) | more than 4 years ago | (#31068740)

It depends on whether the job is CPU bound or I/O bound.
My skepticism comes from the fact that an overhead of "only" 5% likely means "only" an extra eight hours for a week-long job. With CPU-bound stuff you want to be as close to the metal as you can get and still have the stuff run.
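As a quick check of that "extra eight hours" estimate (assuming a 168-hour wall-clock week and a uniform 5% slowdown, both of which are simplifications):

```python
week_hours = 7 * 24   # week-long job, wall-clock hours
overhead = 0.05       # 5% virtualization overhead

extra_hours = week_hours * overhead
print(extra_hours)  # 8.4 -- roughly the "extra eight hours"
```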

Re:so they are 'only' wasting 200 machines (1)

Barny (103770) | more than 4 years ago | (#31082242)

Yeah, but if it's I/O bound, it should probably be re-written :)

Why? (1)

Darkness404 (1287218) | more than 4 years ago | (#31067798)

What is the point of virtualizing a supercomputer? A 5% performance loss is a pretty big loss: in, say, a cluster of 100 computers, 5 of them would be wasted, translating to thousands of dollars lost with little to show for it.

Re:Why? (0)

Anonymous Coward | more than 4 years ago | (#31067806)

Because you don't have to spend weeks adapting code specific to that machine. Use the same program you run at home in 1/10000 the time.

Re:Why? (0)

Anonymous Coward | more than 4 years ago | (#31067850)

You are aware that these supercomputers run Linux? If you can already run an app on Linux and you are able to compile a static binary, you should be able to run it. So, answer again: why?

Re:Why? (1)

bridges (101722) | more than 4 years ago | (#31072236)

ASCI Red Storm normally runs a dedicated lightweight kernel called Catamount, not Linux. Similarly, the IBM BlueGene systems run IBM's compute node kernel, not Linux. Linux is used on some supercomputers, even some of the biggest ones (e.g. ORNL's Jaguar system), but the performance penalty of using Linux as opposed to a lighter-weight kernel can be substantial for some applications (e.g. > 10%).

Re:Why? (0)

Anonymous Coward | more than 4 years ago | (#31076064)

[I'm the AC you replied to]

Yeah, there are some microkernels, but so what? Red Storm may be Catamount/QK, but the rest of the XTs out there are pretty much Linux. And even so, compatibility between the apps running on the microkernels (be they Sandia's, IBM's, or another's) and Linux is fairly decent. BSD-style POSIX/libc stuff is there, so a static binary will take you pretty far. The types of applications that are run on these machines were meant to be run in these environments, so I'm still very confused about the reason for virtualization. And there you go, pointing out that these microkernels can sometimes perform better than Linux on the compute nodes, but that just makes things more confusing. If they are concerned about small performance gains, then surely they won't be virtualizing. So, what is the point of virtualization in these environments?

Re:Why? (1)

bridges (101722) | more than 4 years ago | (#31078684)

Palacios lives inside the lightweight kernel host. Applications that want to run natively on the lightweight kernel without virtualization can do so at *no* penalty. Applications that are willing to pay the performance penalty of Linux can run Linux as a guest at a nominal additional virtualization cost. That way, applications that demand peak hardware performance get it, applications that need more complex OS services get them, and the downtimes associated with a complete system reboot are avoided.

In addition, the cost of something like Linux to a scientific application can be much higher than many might expect. Cray's target was to get application performance on their Compute Node Linux within 10% of Catamount performance; as I understand it they did so for most (but not all) of their apps, but had to spend significant effort even to get within 10%.

We're happy to leverage their hard work, however, so that users who want CNL can boot it on top of our VMM, while users who don't can get done faster or save some of their allocated cycles. I sometimes wonder if ORNL wished they had been running a VMM/LWK on Jaguar when Roadrunner beat them on the SC 2008 Top 500 list by 0.5%. Being able to use the lightweight kernel for Top500 Linpack runs and CNL for running apps that needed it might have come in handy for them then. :)

Finally, our experience has been that a small, simple, open-source LWK/VMM combination is a very powerful platform for OS and hardware HPC research - it provides a simple, understandable, and powerful base for addressing HPC systems problems (e.g. fault tolerance) without the complexity of trying to do that in, for example, Linux.

Re:Why? (1)

bridges (101722) | more than 4 years ago | (#31078734)

Doh, my mistake, Roadrunner beat Jaguar by a little less than 5% in the SC08 Top500 list, not 0.5%. Still, I do wonder. :)

Re:Why? (5, Interesting)

Spazed (1013981) | more than 4 years ago | (#31067854)

Most of them would be running an application written in C/C++ or some other low-level language with threading. The whole advantage of supercomputers isn't that they have an absurd GHz rating, but that they have an insane number of cores. This could be useful for testing how a network of desktop computers would work, which it sounds like from the summary they are doing.

TL;DR: Normal desktop software doesn't run faster on a supercomputer than on your 4-year-old laptop.

Re:Why? (1)

the linux geek (799780) | more than 4 years ago | (#31068266)

It would be far more likely to be Fortran than a C derivative. Also, plenty of supercomputers, especially IBM pSeries-based ones, do have very high clock speeds (4-5 GHz) and a relatively small number of cores; recent Nehalem systems follow the same trend.

Re:Why? (1)

afidel (530433) | more than 4 years ago | (#31068502)

Uh, this was run on ASCI Red, a 38,400 core Opteron based system with each node having a dedicated communication processor attached to a 3D torus for flat 1:1 communications.

Re:Why? (1)

joib (70841) | more than 4 years ago | (#31069514)

Actually, no. ASCI Red was retired from service in 2005.

Re:Why? (1)

afidel (530433) | more than 4 years ago | (#31071538)

Sorry, Red Storm, my duh.

Re:Why? (0)

Anonymous Coward | more than 4 years ago | (#31068334)

Hey, finally a way to get around the N connections per browser and test your website.

Re:Why? (1, Interesting)

Anonymous Coward | more than 4 years ago | (#31067812)

Perhaps those 5 nodes only cost 50k.

How much would it cost to rewrite your one-of-a-kind software and retest and verify it? There are other costs here that they are not letting us in on.

Re:Why? (1)

Darkness404 (1287218) | more than 4 years ago | (#31068030)

Not much if you run the program with an existing OS such as Linux. As for testing and verifying, I'd imagine for larger supercomputers it would be less and less of an issue while the 5% becomes more and more of an issue.

Re:Why? (1)

Anpheus (908711) | more than 4 years ago | (#31068122)

I have to admit to, ahem, "loling" at your response. I know open source has the benefit of driving down costs, but adapting your software from commodity hardware to enterprise hardware, or going even further and running it on esoteric and specialized hardware, is expensive, whether it's proprietary or not. In fact, it might even be cheaper to get a vendor to rewrite their proprietary code, because they've got teams of devs who already know the software inside and out. Paying an outside team to rewrite an existing application is always cost-prohibitive.

If they can make a supercomputer appear to be a huge cluster of commodity machines, that's pretty big. It's big because it enables that easy scale-up from commodity to esoteric hardware.

Who knows, if it works well enough we might see Google change their minds and deploy a supercomputer because of the higher-bandwidth interconnects than commodity hardware currently supports. The reason no one runs line-of-business apps on a supercomputer is that they're very nearly one-off deals. At least with a mainframe you know IBM (or whoever) will allow you to keep writing them checks to maintain it and provide an upgrade path. Supercomputers are far more rarely upgraded; I think they typically run until they're obsolete.

Re:Why? (1)

afidel (530433) | more than 4 years ago | (#31068548)

ASCI Red was upgraded twice, for a performance increase of 564%-685% depending on whether you mean peak or usable performance.

Re:Why? (1)

Anpheus (908711) | more than 4 years ago | (#31071338)

And that's a relatively isolated example. Most of the entries in the top 100 supercomputers today will not be there in five or ten years. They will probably not even be on the Top 500 list at all within ten or fifteen.

No one wants to run their business apps on such volatile hardware. For scientists doing one-off simulations, one-off hardware is fine.

Re:Why? (1)

PopeRatzo (965947) | more than 4 years ago | (#31068602)

There are other costs here that they are not letting us in on.

Pizza and 2-liter bottles of Nos, for example.

Re:Why? (4, Insightful)

John Hasler (414242) | more than 4 years ago | (#31067974)

> What is the point of virtualizing a supercomputer?

They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.

Re:Why? (1)

mhajicek (1582795) | more than 4 years ago | (#31068458)

Plus they could simulate a system of multiple computers communicating and analyze the behavior of the system as a whole.

Re:Why? (1)

JBird (31996) | more than 4 years ago | (#31070440)

They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.

Sounds like the supercomputer in Greg Egan's short story Luminous. It was basically built from light and was reconfigured specifically for each different application.

Re:Why? (0)

Anonymous Coward | more than 4 years ago | (#31070954)

I have always thought virtualization would be good in that I could deploy the packages I needed on the fly, turn-key, as the job requires. Every job requires a specific set of libraries and parameters in many cases. If the underlying interconnects are dealt with at a base level, all I need to do is send out a config that matches the job I want to run to as many nodes as I need. Also, many times a supercomputer is not utilizing all its resources for one job; you may have dozens of jobs running at the same time, maybe each with a different set of requirements. In fact I could see performance improvements in the jobs, because the jobs would dictate the OS and infrastructure and not the other way around.

Just my two cents.

Re:Why? (1)

LeadSongDog (1120683) | more than 4 years ago | (#31080288)

They'll be able to reload the image of your stellar evolution simulation in a few seconds after the guy doing nuclear weapons simulations has had his time. Never mind that the two simulations don't even run under the same OS.

His parents let him set off nuclear weapons in their basement? Woaw!

Re:Why? (1)

PopeRatzo (965947) | more than 4 years ago | (#31068580)

What is the point of virtualizing a supercomputer?

So that if the supercomputer crashes, it won't bring down uTorrent running in the background and mess up their seeding of Animal Collective's Merriweather Post Pavilion.

Why do you think?

Re:Why? (1)

Nite_Hawk (1304) | more than 4 years ago | (#31072034)

I work for a supercomputer institute and am our resident grid/cloud junkie. One of the reasons you might want to do this is to allow researchers to create virtual supercomputers on the supercomputer, via advance reservations, for simulation runs. There are a variety of reasons this can be useful. Sometimes software doesn't play nicely with other software on the system, or requires specific versions of libraries (or even specific OSes). You may also want to test in an environment where you have control over the (virtualized) MPI stack, so you can see how screwing around with it changes how your job runs. Having Amazon EC2 compatibility on traditional clusters would be interesting as well.

Anyway, if you are interested in more, look up the Globus (TeraGrid, Open Science Grid, etc.) project's entry into this arena.

OSS ftw. (2, Interesting)

Asadullah Ahmad (1608869) | more than 4 years ago | (#31067838)

It is really pleasant to see more and more OSS projects being deployed at the national level and in large infrastructures.

Hopefully some of the less greedy companies that benefit from such projects will start paying the volunteer developers. But then again, I have found that a lot of the time, if you are doing something as a hobby/interest/challenge rather than because you were employed to do it, the outcome will be more refined and efficient. Though I have yet to experience the latter part first-hand.

Re:OSS ftw. (0)

Anonymous Coward | more than 4 years ago | (#31067990)

meh, it'll never take off

Development is NOT open source, runs on VMware (0)

Anonymous Coward | more than 4 years ago | (#31082972)

Yeah, open source Palacios development would indeed be FTW, if it existed, but it doesn't.

While the Palacios code itself is open, the development image runs under VMware, which is closed tighter than a tight thing.

If you're looking for an open source development platform for VMMs, this isn't it.

Re:Development is NOT open source, runs on VMware (1)

bridges (101722) | more than 4 years ago | (#31085116)

Palacios can run on real x86 hardware or on QEMU. In fact, most of our development is done on QEMU, which is open source. The VMware image was something we provided with the original 1.0 release just to help people get started, and we haven't done it since; VMware has *never* been required for development.


not a good idea. (1, Interesting)

Anonymous Coward | more than 4 years ago | (#31068062)

Virtualizing a supercomputer is never the correct solution. Supercomputers by their nature have a system for managing lesser processes. That system could be extended, rather than adding another virtual management layer to run parallel to the existing one and burdening it with maintaining yet another running process.

Re:not a good idea. (-1, Troll)

Anonymous Coward | more than 4 years ago | (#31068492)

So... eggheads at Sandia thinking virtualizing a supercomputer could be a useful tool versus some AC on /. who says virtualizing a supercomputer is never the correct solution... who to believe?

Think I'll go with the guys at Sandia.

Re:not a good idea. (0)

Anonymous Coward | more than 4 years ago | (#31068838)

I work in HPC and I agree with the anonymous parent. I don't get what these guys are doing. Even after skimming their docs I can't figure it out. None of the arguments made make much sense. They just don't present useful advantages, especially considering the owners of these machines and the types of applications they run. Do you know what the advantages are, or are you just blindly agreeing with the huge DOE lab?

Re:not a good idea. (2, Informative)

bridges (101722) | more than 4 years ago | (#31073838)

Virtualization offers a number of potential advantages. A paper we have had accepted to IPDPS 2010 enumerates more of them, but here are a few quickly:

1. The combination of a lightweight kernel and a virtualization layer allows applications to choose which OS they run on and how much they pay, in performance terms, for the OS services they need. Because Palacios is hosted inside an existing lightweight kernel that presents minimal overhead to applications running directly on it, applications that don't need the services (and overheads) of a full-featured OS like Linux can run directly on the LWK/VMM with minimal overhead. On the other hand, apps or app frameworks that need higher-level OS services (e.g. shared libraries) can run the OS they need as a virtualized guest on top of the LWK/VMM. Because an actual kernel reboot on a machine like Red Storm is very time-consuming (compared to a guest OS boot), this is a substantial advantage.

2. Mean time to interrupt on some of the most recent large-scale systems is much less than a single day, and virtualization is a potentially useful technique for addressing fault tolerance and resilience issues in HPC systems, assuming that its overhead at scale can be kept small.

3. A small open-source LWK/VMM combination enables a wide range of OS and hardware research on HPC systems both by being a small, understandable, low-overhead platform, and by providing a way to support existing HPC OSes and applications while enabling OS and hardware innovation.

4. A number of others I won't mention right now as they're being actively researched here at UNM, and by my colleagues at Northwestern and Sandia. ;)

Let me get this straight.... (1)

hesaigo999ca (786966) | more than 4 years ago | (#31071354)

The way virtualization works is that a virtual layer is spread across many nodes to avoid downtime: when one node fails, the rest pick up the slack without having to stop the running systems. This uses Linux architecture to cluster many computers on the bottom layer, so as to have the look of one mega-computer when it is actually 100 computers or more, etc.

Then we get into supercomputing, which again uses clusters, and usually Linux, to make all the computers act as if they were one big computer, giving the advantage of multiple processors to calculate common operations much faster, etc.

Now, combining the two... what is the advantage again of putting a cluster on top of a cluster? I need to understand, because I don't see it. Either one of these is used to make a supercomputer per se, but one is virtual and the other is physical. In either case the advantage is the same, but merging the two would cause too much of a slowdown if you ask me, with one backend needing to monitor the other to load-balance, RAID, etc. It just seems like a test to see if you can do it; would you get any real advantage out of it? I am not so sure. Someone with knowledge of VMs and supercomputers, please enlighten me.

The untold story (0)

Anonymous Coward | more than 4 years ago | (#31071384)

If you look up their research paper, you will quickly find that important performance issues remain in the area of high-performance communication. Typically this is exactly what supercomputers should excel at, e.g., point-to-point latencies down to a microsecond, medium-size message throughputs of tens of gigabits per second, and really low overheads. You get what you pay for.

However, when you look up this aspect in the paper, they mention a 5 to 11 microsecond absolute overhead (without mentioning the relative one!), and the graphs showing actual bandwidth measurement comparisons are suddenly log-scaled. Agreed, virtualizing high-performance communication is a difficult issue. No need to try to hide it this way.

Re:The untold story (1)

bridges (101722) | more than 4 years ago | (#31073196)

We're not trying to hide anything, and so I will admit to being surprised by this (anonymous) accusation. To address the anonymous coward's concerns, however:

1. Actual users of supercomputers care most about application run time because applications are what scientists run, not micro-benchmarks. As a result, our paper and research more generally focuses on the runtime penalty to real applications (e.g. Sandia's CTH code) as opposed to focusing on optimizing micro-benchmarks that aren't what real users of these systems care about.

2. Micro-benchmarks do provide useful information about the exact costs of various low-level operations, however, to the extent that they can show you what is causing the application slowdowns you do see. They can also potentially help in understanding how proposed changes might impact applications other than the ones we were able to run in our limited access to the production Red Storm system. Because of this, the paper the anonymous coward above refers to explicitly measures and presents micro-benchmark latency and bandwidth overheads. Specifically, it cites the latency cost on both Red Storm's SeaStar NIC (5 or 11 microseconds, depending on how you virtualize paging) and QDR InfiniBand (0.01 microseconds). It also presents a bandwidth curve to fully characterize virtualization's cost over the full range of potential message sizes on SeaStar. (IB is less expensive to virtualize than SeaStar because IB doesn't have interrupts that Palacios must virtualize on the messaging fast path, whereas SeaStar does, at least when running Cray's production firmware.)

We're very up front about the costs of virtualization because we are well aware that there is no such thing as a free lunch. Virtualization provides a number of potential advantages in supercomputing systems, for example in terms of dealing with node failures, providing a small open-source platform for OS research and innovation on supercomputing systems, handling applications with different OS feature and performance requirements, and a variety of other things. However, it does come with a cost to applications and application scientists that has to be weighed against its potential benefits.
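For readers unfamiliar with where latency figures like the SeaStar/InfiniBand numbers above come from: they are measured with ping-pong style micro-benchmarks. A minimal sketch of the pattern, using a local Unix socket pair as a hypothetical stand-in for the interconnect (the real benchmarks use MPI or the NIC's native interface):

```python
import socket
import time

def pingpong_latency(iters=1000):
    """Estimate one-way latency as half the average round-trip
    time of a 1-byte message bounced between two endpoints."""
    a, b = socket.socketpair()
    start = time.perf_counter()
    for _ in range(iters):
        a.sendall(b"x")   # ping
        b.recv(1)
        b.sendall(b"x")   # pong
        a.recv(1)
    elapsed = time.perf_counter() - start
    a.close()
    b.close()
    return elapsed / iters / 2  # seconds, one-way

print(f"{pingpong_latency() * 1e6:.2f} us one-way")
```

A virtualization-overhead measurement would run the same loop natively and inside the guest and compare; note that on loopback sockets the absolute numbers are dominated by the kernel, not any NIC.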
