
Red Hat & AMD Demo Live VM Migration Across CPU Vendors

kdawson posted more than 5 years ago | from the dude-where's-my-virtualization-business dept.


An anonymous reader notes an Inquirer story reporting on something of a breakthrough in virtual machine management — a demonstration (not yet a product) of migrating a running virtual machine across CPUs from different vendors (video here). "Red Hat and AMD have just done the so-called impossible, and demonstrated VM live migration across CPU architectures. Not only that, they have demonstrated it across CPU vendors, potentially commoditizing server processors. This is quite a feat. Only a few months ago during VMworld, Intel and VMware claimed that this was impossible. Judging by an initial response, VMware is quite irked by this KVM accomplishment and they are pointing to stability concerns. This sounds like scaremongering to me ... All the interesting controversy aside, cross-vendor migration is [obviously] a good thing for customers because it avoids platform lock-in."


134 comments


Bravo! (4, Funny)

Cornwallis (1188489) | more than 5 years ago | (#25676515)

I love to see things like this that give me a greater freedom to migrate off the major players.

Re:Bravo! (1)

greenhuey (1401575) | more than 5 years ago | (#25676603)

Give me liberty or give me death!

Re:Bravo! (0)

harry666t (1062422) | more than 5 years ago | (#25677039)

OK, sure

<aims a gun at greenhuey's head>

You know, it certainly doesn't bother me that I don't have the source code for this gun.

Re:Bravo! (1, Informative)

Anonymous Coward | more than 5 years ago | (#25680563)

With Obama at the helm, you may not have guns to protect your liberty, so death is more likely. ;)

Re:Bravo! (0)

diefuchsjagden (835254) | more than 5 years ago | (#25681285)

With Obama at the helm, you may not have guns to protect your liberty, so death is more likely. ;)

A gun only protects your liberty if you shoot first; therefore you aren't protecting anything. (Although the best defense is often a good offense, you are taking someone else's liberty, and no one, not even you, has the right to do that, Anonymous! Grow a set and open your eyes before you open your mouth!!)

Re:Bravo! (3, Informative)

2names (531755) | more than 5 years ago | (#25676849)

We have certainly come a long way when a Cornwallis supports freedom of the people. :)

Um (1, Insightful)

Colin Smith (2679) | more than 5 years ago | (#25676917)

The VM software vendor becomes "the major player".

As The Who's so insightfully titled song said: "Meet the new boss. Same as the old boss."

Re:Um (4, Insightful)

Korin43 (881732) | more than 5 years ago | (#25678859)

Except they're doing it with KVM [wikipedia.org], which is open source.

Re:Um (1)

Bearhouse (1034238) | more than 5 years ago | (#25680625)

Damn. Please someone re-write that Wiki entry to make it more friendly for our non-tech friends. It starts...

"Kernel-based Virtual Machine (KVM) is a Linux kernel virtualization infrastructure. KVM currently supports native virtualization using Intel VT or AMD-V. Limited support for paravirtualization is also available for Linux guests and Windows in the form of a paravirtual network driver[1], a balloon driver to affect operation of the guest virtual memory manager[2], and CPU optimization for Linux guests. KVM is currently implemented as a loadable kernel module although future versions will likely use a system call interface and be integrated directly into the kernel[3].

Architecture ports are currently being developed for s390[4], PowerPC[5], and IA64. The first version of KVM was included in Linux 2.6.20 (February 2007)[6]. KVM has also been ported to FreeBSD as a loadable kernel module[7]."

Re:Um (3, Insightful)

abdulla (523920) | more than 5 years ago | (#25683547)

Why should it be dumbed down? I don't go reading Biology articles and expect to know everything. That's why there are links to other articles explaining each bit in more detail.

This is still unreleased test demo's (4, Insightful)

Beached (52204) | more than 5 years ago | (#25676569)

The real beauty of this will come when the system automatically moves VMs to machines in case of hardware problems or when a system is underutilized. It would let you power down servers during non-peak times and save oodles of cash.

Re:This is still unreleased test demo's (3, Insightful)

Hercynium (237328) | more than 5 years ago | (#25676685)

Well, that kinda *is* the purpose of live VM migration... it's already being done, just not between systems with different processor types.

Re:This is still unreleased test demo's (4, Insightful)

TheRaven64 (641858) | more than 5 years ago | (#25677599)

They don't seem to have released many details of this. Migrating between x86-with-SSE and x86-without-SSE, for example, is pretty simple - you just need the OS or hypervisor to trap the illegal instruction exception and emulate. Migrating from x86 to x86-64 is pretty easy too - you just don't get any advantages from the 64-bit chip. Going the other way is really hard, and would need the hypervisor to trap the enter-64-bit-mode instruction and emulate everything until the mode was exited (difficult, slow, and probably pointless).

I read TFA when it first came out and couldn't work out exactly what they were claiming was novel. Migrating between very-slightly-different flavours of x86 is not really that hard. Migrating between ARM and x86 would be incredibly hard - Xen can actually do this with the P2E work (not sure if it ever made it in to trunk), which migrated a VM from real hardware in to QEMU but, again, that's not an ideal solution unless the emulator has traps that userspace can use - for example a Java VM might get a signal after migration, flush its code caches, and re-JIT as x86 code instead of ARM.

Re:This is still unreleased test demo's (2, Interesting)

sirsnork (530512) | more than 5 years ago | (#25679077)

Between different vendors is actually quite hard. Live migration requires saving the CPU state exactly, including all registers. Going to a different vendor's CPU means all this saved state may not match up, and then you have to do something so the VM won't just crash. This is actually becoming _harder_ as more and more virtualization technology is being put into the CPU silicon (Intel VT, AMD-V, etc.). Each new series has a few more features to make virtualization simpler, and you have to make sure that what was available to the VM on one CPU is identical to what's available on the new CPU without destroying performance (which is what will happen if you start emulating).

That said, VMware are very, very, VERY careful with the tech they introduce; to give you an example, round-robin network teaming is still "experimental". I'm fairly sure they have played with this internally already and held it back, either because it would make support harder, or because, with the changing CPU landscape and the integrated virtualization features on new CPUs, they would need to release a new version for each new CPU for this to keep working.

Make no mistake, this is big news for KVM and well done to them. But if they can make it work reliably, so can anyone else, and that includes VMware.

Re:This is still unreleased test demo's (1)

ampman (91479) | more than 5 years ago | (#25682189)

As posted before, it seems to have been done a long time ago; see: http://www.byte.com/art/9407/sec6/art1.htm

Re:This is still unreleased test demo's (1)

Chris Snook (872473) | more than 5 years ago | (#25682399)

The really cool thing is the hardware support for masking CPUID calls from guests, so you don't have to emulate them in the hypervisor, which the VMware people in this thread have pointed out adds measurable overhead on some workloads. This lets you present a generic x86_64 CPU to the guest, which will run most non-HPC enterprise apps just fine. SSE2 is a mandatory part of the x86_64 instruction set, so all x86_64 processors will be able to get decently optimized math that can be live-migrated between different sub-architectures and processor revisions. You incur a slight performance hit on the older hosts, but this feature makes it easier to migrate away from them, which makes migrating to the new AMD processors much more attractive to people with large virt farms.

Re:This is still unreleased test demo's (-1, Troll)

Anonymous Coward | more than 5 years ago | (#25676925)

No, there is no real beauty in this. It's a hack intended to work around a kludge.

Why in the *hell* would you even think of putting different vendors in any (let alone virtualization) kind of cluster? That's rule number one in large-scale computing--make all the hardware identical, and if that's not possible, make it as similar as you can. Consistent platforms are reliable platforms.

This is like blowing the engine in a Ford and electing to put a Chevy engine in to replace it. Stupid.

Re:This is still unreleased test demo's (3, Interesting)

voidptr (609) | more than 5 years ago | (#25677047)

This is like blowing the engine in a Ford and electing to put a Chevy engine in to replace it.

While still driving down the highway at 60 mph.

Re:This is still unreleased test demo's (-1, Troll)

Anonymous Coward | more than 5 years ago | (#25677465)

I would agree with your analogy except that if I was stupid enough to buy a Ford I would deserve it to break and then I could redeem myself with buying a much better motor. (Chevy) Moving to a more reliable platform (Chevy) is never stupid.

Re:This is still unreleased test demo's (4, Interesting)

Comatose51 (687974) | more than 5 years ago | (#25677133)

You mean like VMware's VMotion, HA, and DRS functionalities?

Re:This is still unreleased test demo's (3, Insightful)

JEB_eWEEK (549975) | more than 5 years ago | (#25677703)

Yes, except without requiring identical hardware.

Re:This is still unreleased test demo's (3, Informative)

nabsltd (1313397) | more than 5 years ago | (#25678987)

VMware doesn't require "identical" hardware to do live migration, either.

It does have to be similar enough, which at this point pretty much means just the same processor manufacturer. As long as the processor supports hardware virtualization, VMware will let you set up a cluster that allows live migration with no issues.

Re:This is still unreleased test demo's (1)

drachenstern (160456) | more than 5 years ago | (#25677823)

Er yeah, but by a proprietary vendoooorrrr, eh... I see what you did there ;)

I think the goal is to eventually open-source the concepts, and sell the wrappers. And the support, always sell the support...

I have to say tho, that I thought the whole point of CPU ISA was to be able to do just this sort of thing. If you're not writing code that absolutely depends on the underlying CPU hardware (why would you, isn't that the point of the kernel) then you should be able to move to any other platform... Okay okay, so there's the whole 32-bit -> 64-bit snafu, but that's because we're talking paradigm shifts.

What I'm curious about is the Xeon -> Itanium2 shift... And naturally the reverse as well =D

Re:This is still unreleased test demo's (1)

AJWM (19027) | more than 5 years ago | (#25678309)

Since Itanium2 will run x86 code sort of natively, going Xeon->Itanium shouldn't be that hard. Migrating a VM that's running IA-64 code to a Xeon could be a little tougher.

Re:This is still unreleased test demo's (1)

virtualboy (1402343) | more than 5 years ago | (#25679439)

Save yourself some money and check out Virtual Iron. It does not require identical hardware.

Umm... (2, Interesting)

frodo from middle ea (602941) | more than 5 years ago | (#25676647)

All the interesting controversy aside, cross vendor migration is [obviously] a good thing for customers because it avoids platform lock-in

Well, almost all VM products barring VirtualPC do indeed support running the same VM image across various vendor platforms; in fact, that is the whole point of a VM, isn't it?

The fact to highlight is that the migration was done of a live VM without disrupting the VM's operations.

Re:Umm... (1)

MBGMorden (803437) | more than 5 years ago | (#25677025)

It's not a matter of it RUNNING on multiple platforms. The issue here is live migration. Moving a running VM from one machine to another without skipping a beat. On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly.

Re:Umm... (2, Informative)

TheRaven64 (641858) | more than 5 years ago | (#25677649)

On most other setups you'd have to shut the VM down and then restart it on the other machine for it to work correctly

Do you? I first saw Xen demo live migration in 2005, and I don't think it was new then. Their demo had a Quake server being thrown around a cluster without clients noticing. Downtime was well under 100ms. You can read the paper [cam.ac.uk] for more information.

They were claiming that you can move between processor types, but they didn't specify how much different they could be. If it's just a matter of SSE or 3DNow! support disappearing then that's not a hard problem - just trap-and-emulate any of the old instructions. Relaunching programs that use these will cause the new values of CPUID to be picked up.

Re:Umm... (1)

MBGMorden (803437) | more than 5 years ago | (#25678331)

VMotion has been around for quite a while. The specialty here is migration between different processor types, and it's apparently not as trivial as you state. For one, there are different extensions between the various processor types. Sure, everything can be compiled for i386 and run on anything, but we're talking about arbitrary code running on these VMs. There's a lot that can differ beyond their common subset, and if you resort to trapping and emulating all those instructions, you end up with as much an emulator as a hypervisor, and you've basically defeated the purpose.

Re:Umm... (2, Informative)

nabsltd (1313397) | more than 5 years ago | (#25679085)

And, when you think about it, any instruction that you would have to trap if the VM used to be running on a different processor must be trapped at all times.

This is because you have no way of knowing which processor type the VM was first started on. When this happened, it's likely the OS did some hardware checking and figured out which instructions it could (and could not) use. Moving the VM isn't going to change what the OS believes is the processor, and that's the problem.

Overall, VMware's Enhanced VMotion Compatibility method of lying to the OS about the capabilities of the processor seems to be the easiest way of doing this. But they only do it within one CPU manufacturer, because otherwise you'd end up with a very low-featured virtual processor.

Re:Umm... (1)

online-shopper (159186) | more than 5 years ago | (#25681301)

Isn't this like what Transmeta did, except in software?

Xen 3.3 supports this already (3, Informative)

stabe (1133453) | more than 5 years ago | (#25676659)

Xen has supported this feature since 3.3; it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html [nabble.com] No real breakthrough here...

Re:Xen 3.3 supports this already (1)

Vendetta (85883) | more than 5 years ago | (#25676781)

Xen supports this feature since Xen 3.3, it is called CPUID: http://www.nabble.com/Xen-3.3-News:-3.3.0-release-available!-td19106008.html [nabble.com] No real breakthrough here...

Looks to me like Xen supports migration between different CPU models, not entirely different CPU manufacturers. So yes, there is a breakthrough here.

Xen does migration, but not Live... (4, Informative)

LinuxGeek (6139) | more than 5 years ago | (#25677085)

This is a demo of a Live migration, no shutdown or reboot involved. Xen does not support the live migration of a running VM between an AMD and Intel server. Watch the video, they are running a video in the VM that keeps playing during the migration. Very impressive stuff.

Re:Xen does migration, but not Live... (0)

Anonymous Coward | more than 5 years ago | (#25677509)

There is no difference between live migration and save/restore in this regard.
So yes, you can do the same demo now using Xen 3.3.

Re:Xen does migration, but not Live... (0)

Anonymous Coward | more than 5 years ago | (#25678559)

Maybe Xen doesn't officially support cross-platform live migration, but I did live migration using Xen over two years ago (I think on Fedora Core 5!) back and forth between a 32 bit Intel CPU and an Opteron. And it was live, very live. I preserved several ssh sessions, and several cpu-intensive tasks.

Still x86 only (3, Insightful)

boner (27505) | more than 5 years ago | (#25676715)

Real magic would have been demonstrating a move between ANY processor architecture - Power, SPARC, x86_64 etc..

Between x86 processors is nice, but not unexpected.

Re:Still x86 only (1)

Hercynium (237328) | more than 5 years ago | (#25676745)

No problem! Just run x86 linux under qemu on all physical platforms, then run your applications under x86 linux inside a kvm inside qemu with migration between the qemu instances on each physical system!

Re:Still x86 only (1)

Atti K. (1169503) | more than 5 years ago | (#25679167)

Putting aside the huge performance penalty, I wonder if qemu can emulate the cpu virtualization support needed for kvm...

Ok, yes, I know, whooosh ;)

Re:Still x86 only (1)

corsec67 (627446) | more than 5 years ago | (#25676769)

That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?

Seems like this would work between processors with a very similar ISA [wikipedia.org] .

If they could run stuff compiled for one processor on another processor with a different ISA at near full speed,... that would change more than just virtualization. Run Wine on a PowerPC, emulate old consoles easily on a Pandora [openpandora.org] , etc..

Re:Still x86 only (1)

NormalVisual (565491) | more than 5 years ago | (#25677525)

That is true, but wouldn't you run into a major performance hit when running x86 software on other processors, assuming it didn't just blow up?

Most definitely. At that point, you're emulating, not virtualizing.

Re:Still x86 only (2, Interesting)

TheRaven64 (641858) | more than 5 years ago | (#25677685)

Depends. Modern emulators can run at around 50% of the host platform speed. If your guest is paravirtualised then all of the privileged instructions will be run in the hypervisor. If you're running a JIT in the guest then you can poke it to flush its code caches and start emitting native code for the new architecture, but even if you aren't then migrating the VM from the 200MHz ARM chip in your cell phone to the quad-core 4GHz x86 chip connected to your TV might be interesting.

Re:Still x86 only (1)

TheLink (130905) | more than 5 years ago | (#25679379)

That's doable with emulation but you will take a performance hit. I don't think there's a good way to do it without a lot of emulation.

I don't see a practical reason for cross platform "live" moves.

Switching within a platform class is likely to be far far more useful.

With cross architecture switching, it's going to be a lot harder to use the strengths of the CPUs.

Say you're on x86 and using SSE, then you switch to SPARC, what are you going to do then?

Or you're on UltraSPARC T2 and using the eight encryption engines, then you switch to x86, what do you do then?

Re:Still x86 only (0)

Anonymous Coward | more than 5 years ago | (#25680871)

Try http://www.byte.com/art/9407/sec6/art1.htm but no one was interested then, why now?

vm migration security? (0)

Anonymous Coward | more than 5 years ago | (#25676717)

but is it secure [oberheide.org] yet?

This was in all likelyhood faked. (5, Funny)

Anonymous Coward | more than 5 years ago | (#25676773)

Open source is for morons.

Only Apple has the engineering know-how and skills to pull off something like this. The fact that they have not done so to date is a clear indication that it is impossible.

check the graphs... (5, Interesting)

alta (1263) | more than 5 years ago | (#25676801)

Go to 4:05 in the video. On the far left, you can see from the blue Intel line that the guest is running there; then they migrate, the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistently higher than the Intel line was. I'm no Intel fanboy... or AMD. I have both Intel and AMD servers in my racks. I just thought it was interesting, and I'm surprised they let the video go out like that.

Re:check the graphs... (1)

Loibisch (964797) | more than 5 years ago | (#25677167)

Hehe, I checked the same thing. :)

To be fair, the performance of playing a HD video is pretty much determined by your graphics card. It's not really the best CPU benchmark you could imagine. :)

Re:check the graphs... (2, Interesting)

wanderingknight (1103573) | more than 5 years ago | (#25678573)

GPUs have nothing to do with video decoding; it's handled 100% by the CPU. At least until we get software that can reliably take advantage of the relatively recent introduction of H.264 decoding on some high-end GPUs.

Re:check the graphs... (1)

Loibisch (964797) | more than 5 years ago | (#25679169)

Sure they don't.

Hardware H.264 decoding is available and working.

Also, go mess around with different video display options (overlay, X11... or, for Windows, the various VMR revisions) and watch the CPU load go up and down.

It's not _all_ the CPU, so it's bullshit as a CPU benchmark, especially on guaranteed-to-be-different systems.

Re:check the graphs... (1)

nschubach (922175) | more than 5 years ago | (#25677187)

I can't watch the video right now, so I'm assuming the graph is processor utilization?

Could it possibly be because the AMD processor is running some kind of instruction translation, communication layer, or something like that?

Re:check the graphs... (1, Interesting)

Anonymous Coward | more than 5 years ago | (#25677319)

(1) It didn't seem clear to me how many VMs each box was running. Could very well be that the Shanghai box was already doing quite a bit before the migration.

(2) There's a reason Shanghai isn't available yet.

(3) There's a reason this live migration stuff isn't available yet. Could very well be that the migration (at the moment) causes additional overhead.

I'm not trying to justify AMD here per se. It's just that there's nowhere near enough information to draw any real conclusions whatsoever. This may not say anything bad about AMD that AMD would have wanted to cover up.

Re:check the graphs... (2, Informative)

michrech (468134) | more than 5 years ago | (#25677507)

It didn't seem that interesting to me. If you watch the video, the Intel and Barcelona machines showed no VMs running (0% load). When the Shanghai server took over the load, *of course* its load line rose -- it was the only server running a VM at that point!

There are no shenanigans going on here, and I don't think this says anything about the chips as you imply, either.

Re:check the graphs... (1)

Luke_22 (1296823) | more than 5 years ago | (#25677571)

Go to 4:05 in the video. On the far left, you can see from the blue intel line that the guest is running there, then they migrate, and the blue line goes to the idle point, and the orange line starts taking the load. But NOTICE, the AMD line is consistantly higher than the intel line was.

Look closer: when the switch happens, one load replaces the other, and they're equal; then the AMD load keeps increasing a little bit, even after the switch is complete. I guess it could just be the OS doing something else. It was Windows, after all ;)

Re:check the graphs... (1)

Ecuador (740021) | more than 5 years ago | (#25677593)

Well, duh, they can run their Core 2 @ 4.5GHz on stock air cooling, silly!

Shanghai can still be faster clock for clock as they promised ;)

Seriously now, a CPU % utilization of a VM running WMP is no indication of anything.

Re:check the graphs... (0)

Anonymous Coward | more than 5 years ago | (#25678701)

Doesn't mean much. Maybe the Intel was more powerful, or maybe the VM needs more cpu when "booting" the migrated OS.

Re:check the graphs... (1, Informative)

Anonymous Coward | more than 5 years ago | (#25683279)

True, it is higher but the guy mentions each server is running several VMs (each of which could be doing stuff), not just the one. Also the scale of time isn't visible from the start of migration until finish. Not sure it shows anything really but well spotted.

what about endianness? (0)

Anonymous Coward | more than 5 years ago | (#25676845)

can this, theoretically, be done with a mixture of big-endian and little-endian architectures?

Re:what about endianness? (1)

xouumalperxe (815707) | more than 5 years ago | (#25677061)

This was done between different vendors, not altogether different architectures. That would demand emulation beneath the virtualization, on at least one machine -- not likely to happen any time soon.

Pfff... (0, Redundant)

Turiacus (1316049) | more than 5 years ago | (#25676859)

This is completely trivial. You simply have to mark the VM with the architecture of its code. Then each host contains both a virtualization layer (à la vmware) and a multi-platform emulator (à la qemu). If the VM matches the architecture the host is running on, you use the virtualization layer, if it doesn't, you use the emulator.

As for moving between AMD64 and Intel 64 (for example), the VM has to emulate the few instructions that differ and virtualize the rest.

Of course, cross-architecture migration is not that useful since you have an emulation penalty. It is much simpler (and cheaper) to do everything on x64.

Re:Pfff... (3, Insightful)

Anonymous Coward | more than 5 years ago | (#25676953)

so easy that you did it yourself three years ago, right?

Re:Pfff... (0)

Anonymous Coward | more than 5 years ago | (#25682231)

4yrs ago, if you must know....

Re:Pfff... (0)

Anonymous Coward | more than 5 years ago | (#25677677)

It is much simpler (and cheaper) to do everything on x64.

If you're starting from scratch, yes, a homogeneous system is easier to manage. Unfortunately, in the real world most businesses have an eclectic mixture of systems accumulated through acquisitions, mergers, and years of running without centralized IT planning.

Stability issues are justified (4, Interesting)

mnmn (145599) | more than 5 years ago | (#25676901)

Declaration: VMware support engineering here, but speaking strictly on my own behalf.

The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993

We've also known that mismatched CPU stepping makes VMs unstable. This is because instructions suddenly run faster or slower relative to the front-side bus, and not all Linux and Microsoft code has been tested against that. You can happily try it, and a lot of our customers successfully do. Some get BSODs and kernel oopses. This is not our fault.

If you virtualize the instructions more (bochs?) you can of course move the VM anywhere including a Linksys router's MIPS chip. At the cost of speed of course.

Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.

Re:Stability issues are justified (4, Interesting)

Anthony Liguori (820979) | more than 5 years ago | (#25677029)

Declaration: VMware support engineering here, but speaking strictly on my own behalf.

The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc).

KVM goes to great lengths to, by default, mask out CPUID features that aren't supported across common platforms. You have to opt in to those features, since they limit a machine's migrate-ability.

However, I won't say this is always safe. In reality, you really don't want to live migrate between anything but identical platforms (including identical processor revisions).

x86 OSes often rely on the TSC for time keeping. If you migrate between different steppings of the same processor even, the TSC calibration that the OS has done is wrong and your time keeping will start to fail. You'll either get really bad drift or potentially see time go backwards (causing a deadlock).

If you're doing a one time migration, it probably won't matter but if you plan on migrating very rapidly (for load balancing or something), I would take a very conservative approach to platform compatibility.

Re:Stability issues are justified (1)

NonSequor (230139) | more than 5 years ago | (#25677451)

Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?

Re:Stability issues are justified (1)

Anthony Liguori (820979) | more than 5 years ago | (#25679495)

Is there any reason you couldn't keep a list of processor dependent memory locations and regenerate them for the current machine as part of the migration?

The halting problem?

Re:Stability issues are justified (5, Informative)

kscguru (551278) | more than 5 years ago | (#25677777)

Yet Another VMware engineer here.

The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.

So there are two strategies. Pass directly through the CPUID bits (and on the newest processors, apply a mask), or remember a baseline value, trap-and-emulate every CPUID and always return that value. Sounds like KVM has picked the latter approach for a default; VMware's default is to expose the actual processor features and accept a mask as an optional override, which skews towards exposing more features at the expense of some compatibility. Equally valid choices, IMHO.

The Worst Case Scenario when not doing a trap-and-emulate of every CPUID is an app that does CPUID, reads the vendor string, then decides based on the vendor string which other CPUID leafs to read. (Like the 0x80000000 leafs, which are vendor-specific and would come back as gibberish if you get the processor wrong). If the app migrates during the dozen or so instructions between the first CPUID and the following ones, instant corruption. Good enough for a pretty demo, destined to make a guest kernel die a few times a year if actually used in production. And I'm 95% sure this is what the OP demo is doing - living dangerously by hoping mismatched CPUID results never get noticed.

I agree with Anthony Liguori here - on a production machine, an Intel/AMD migration is way too much of a stupid risk. All you have to do is reboot the VM, it's much safer.

(As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)

Re:Stability issues are justified (3, Interesting)

Chirs (87576) | more than 5 years ago | (#25678391)

The TSC is an optional clock source. You can use other things (ACPI, HPET) but the problem is that they're relatively expensive to access.

The kernel people have been complaining literally for multiple years that x86 needs a system-wide clocksource that is cheap to access (and presumably hypervisor-friendly). So far AMD and Intel haven't bothered to provide one.

Re:Stability issues are justified (2, Interesting)

TheLink (130905) | more than 5 years ago | (#25679113)

Yes you're not supposed to use TSC.

BUT there is no good alternative that's:
1) Cheap
2) Fast
3) Available on most platforms

I find it quite amazing actually that the CPU manufacturers add all those features, and yet after so many years there is still no good standard way to "get time", despite lots of programs needing to do it.

Re:Stability issues are justified (1)

virtualboy (1402343) | more than 5 years ago | (#25679305)

Virtual Iron engineer here. You also have to worry about programs that check for a CPU and then use functions specific to that CPU. When you move to another CPU that doesn't have those functions, the OS may stay up and running but the application may crash. In house we have done this, but we don't recommend that customers LiveMigrate between Intel and AMD.

Re:Stability issues are justified (2, Informative)

Anthony Liguori (820979) | more than 5 years ago | (#25679443)

The new Intel/AMD CPU features that allow masking of CPUID bits while running virtualized also make processors recent enough that most of the interesting features are present - MMX, SSE up to ~3. The "common subset" ends up looking like an early Core2 or a Barcelona (minus the VT/SVM feature bits, of course) - Intel and AMD run about a generation behind on adding each other's instructions. Run on anything older than the latest processors, and you have to trap-and-emulate every CPUID instruction. Enough code still uses CPUID as a serializing instruction that this has noticeable overhead.

Modern OSes do not use CPUID for serialization. We trap CPUID unconditionally in KVM and have not observed a performance problem because of it. Older OSes did this but I'm not aware of a modern one.

My understanding of the reason for the recent CPUID "masking" support is because if you are not using VT/SVM (Xen PV or VMware JIT), there is no way to trap CPUID when it's executed from userspace. AMD just happened to have this feature so when Intel announced "FlexMigration", they were able to just document it. I don't think it's really all that useful though.

(As a side note to everyone reading, the reason Linux timekeeping is such a problem is that TSC issue. Intel long ago stated TSC was NOT supposed to be used as a timesource. Linux kernel folks ignored the warning, made non-virtualizable assumptions, and today are in a world of hurt for timekeeping in a VM. And only now, many years later, are patching the kernel to detect hypervisors to work around the problem.)

The TSC is often used as a secondary time source, even outside of Linux, but yes, Linux is the major problem. But Windows is not without its own faults wrt timekeeping. Dealing with missed timer ticks for Windows guests is a never ending source of joy. Virtualization isn't the only source of problems here. Certain hardware platforms have had overzealous SMM routines and the result was really bad time drift when running Windows.

Re:Stability issues are justified (1)

Chris Snook (872473) | more than 5 years ago | (#25681189)

Rebooting isn't always an option. If you've got 10 guests running on a host, and you have the luxury of rebooting 9 of them, you still need to migrate one of them. Sure, you can keep separate pools of hosts with different processor revisions and migrate between them most of the time, but what happens when it's time to retire your rack full of netburst-era Xeon boxes, running several hundred guests? You're correct that CPUID trapping introduces overhead on older CPUs, but this demo was run on new CPUs, in part to show off how they make it easier to migrate to.

TSC timekeeping is essential for SMP scalability. When your hypervisor only supports 4-cpu scalability, you may not notice this effect, but for those of us running on bare metal or other hypervisors that allow us to use more CPUs, the effect becomes quite pronounced when running enterprise transactional workloads. The Linux kernel has gone to great lengths to use the most efficient timekeeping mechanism that can be used safely. The only patches I've seen lately on this topic have been to *enable* TSC timekeeping when running under VMware, since Linux distrusts the TSC by default, and has trouble verifying it in a virtualized environment.

Re:Stability issues are (not) justified (0)

Anonymous Coward | more than 5 years ago | (#25681333)

I've seen this done before by masking the CPU flags so that the VM only sees the lowest common denominator of features across the group of CPUs among which it can migrate.

VMware have been unable to make this feature stable in their product, while others, like the commercial products based on KVM and also Citrix Xen, have managed to get this working to a level where they are confident enough to do live demos (rather than slideware). There was a video of this linked over on the 360is blog last week.

AG

Re:Stability issues are justified (0)

Anonymous Coward | more than 5 years ago | (#25682077)

Yet Another VMware engineer here.

Shouldn't both of you be working?

Re:Stability issues are justified (1)

Malc (1751) | more than 5 years ago | (#25677463)

VMware have more stability worries than this on their plate. I've just upgraded Fusion on the Mac to version 2 and it's still very unstable. On first use the guest OS locked up, forcing me to reboot the host so I could try again, only to find that, like with Fusion 1.1, the Mac hangs on shutdown. *sigh*

Re:Stability issues are justified (0)

Anonymous Coward | more than 5 years ago | (#25678895)

Declaration: VMware support engineering here, but speaking strictly on my own behalf. The stability issues are justified if you consider all types of VMs. Windows 2003, default RHEL5 kernels etc. use more than the basic set of assembler instructions (disk IO code uses MMX, SSE etc.). We can compile a kernel for strictly 486 CPUs and demonstrate migrations between AMD and Intel using extensive CPU masking: http://kb.vmware.com/kb/1993 [vmware.com] We've also known that mismatched CPU steppings make VMs unstable. This is because instructions suddenly run faster or slower compared to the front side bus, and not all Linux and Microsoft code has been tested against that. You can happily try it, and a lot of our customers successfully do. Some get BSODs and kernel oopses. This is not our fault. If you virtualize the instructions more (bochs?) you can of course move the VM anywhere, including a Linksys router's MIPS chip. At the cost of speed, of course. Lastly, why would we want to keep customers stuck to one CPU vendor? We're software vendors.

The question is why would your company say it's impossible when it isn't?

I see a much harder problem... (1)

Osvaldo Doederlein (34220) | more than 5 years ago | (#25680301)

This migration won't work for systems that employ advanced JIT code generation, such as Java. Modern production JVMs, like Sun's and IBM's, create native code on the fly - and they produce code that's ultra-tuned for the specific processor it's running on. This means using the best instructions available (like SSEx), and also fine-tuning various behaviors, e.g. GC can be tuned for the L1/L2 cache sizes, and locking can be tuned to factors like the number of CPUs/cores/hardware threads - so for example, if it's running on a uniprocessor/single-core machine, the JVM will simply not emit memory barrier instructions for memory model consistency.

And it's not only Java; we have an increasingly large number of JIT compilers that may employ similar tricks: Microsoft .NET (CLR); Flash 9+ (Tamarin) for ActionScript; Mozilla TraceMonkey and Google V8 for JavaScript; new LLVM-based runtimes for other languages... the list is only growing. Even for traditional statically compiled languages, some apps ship multiple shared libs compiled for different CPU levels, and choose the best lib at startup.

The only way I see around this problem is making ALL these runtimes and applications migration-aware. Each process should be notified before the migration, perform some pre-migration task, and after the migration be notified again to resume work and, if necessary, perform some post-migration step. Specifically for Java, the pre-migration would need to "park" all threads in OSR safepoints, then free all JIT-generated code; the post-migration would retune/reconfigure the JVM for the new CPU, then unpark the threads - which would resume execution in interpreted mode until the JIT compiler recreates all native code for the new CPU. Fortunately this is relatively simple to do in JVMs, because all the necessary plumbing is preexisting (safepoints, on-stack replacement... required for advanced GC and dynamic optimizations). And once new JVMs are enhanced with this feature, thousands of Java apps become magically migration-aware. It could be harder for other runtimes, though.

Still, very hot technology, just not as easy as we can imagine to get right and compatible with all applications.

Re:I see a much harder problem... (1)

BitZtream (692029) | more than 5 years ago | (#25680587)

The apps are already migration-aware, as are the OSes; that's why you reboot them.

Re:Stability issues are justified (0)

Anonymous Coward | more than 5 years ago | (#25682005)

Can you clarify why even changing a stepping makes it unstable? After all, the OS would work fine on those processors without any code modifications if it were running natively (in that it handles the instruction speed difference). Heck, computers even deal with CPU frequency scaling.

I doubt VMWare is scared...yet (0)

Anonymous Coward | more than 5 years ago | (#25676907)

Not to diss the achievement, which is cool. It does require newer processors with the special VM extensions, so it may commoditize future CPUs. Also, KVM requires QEMU, and many running VMware depend on a tested solution that is delivered complete from the vendor. And VMware is probably looking at the source and will have this, or something similar, in future builds.

Wasn't this always possible? (1)

tlhIngan (30335) | more than 5 years ago | (#25676919)

The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed). Nor how it can be impossible - while x86 has many extensions, it's still a well-specified architecture with specific behaviors.

The real trick is if an application is using features not present on the other architecture - e.g., an AMD virtual machine migrating to an Intel one while running applications use 3DNow instructions (which don't exist on Intel CPUs). Or perhaps an old 16-bit application running on a 32-bit VM under a 32-bit OS migrating to a 64-bit VM (since you can't do real mode or other legacy things in x64 mode) and continuing without a hitch... (Maybe it's a VM running MS-DOS, say?)

Re:Wasn't this always possible? (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#25676995)

Fuck it, we'll do it live! FUCKING THING SUCKS!

Re:Wasn't this always possible? (0)

Anonymous Coward | more than 5 years ago | (#25677361)

Or perhaps an old 16-bit application running on a 32-bit VM under a 32-bit OS migrating to a 64-bit VM (since you can't do real mode or other legacy things in x64 mode) and continuing without a hitch... (Maybe it's a VM running MS-DOS, say?)

This one, everybody has already solved. Or you couldn't boot past BIOS - the last refuge of real-mode code in today's computers.

Re:Wasn't this always possible? (1)

thePowerOfGrayskull (905905) | more than 5 years ago | (#25677369)

The point of virtualization is to isolate the hardware from the software - I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).

Erm... actually, if you watch the video, you will see that the "live" migration is actually live - the VM is not suspended, it is kept running and active through the migration.

Re:Wasn't this always possible? (2, Informative)

TheRaven64 (641858) | more than 5 years ago | (#25677793)

Actually, it is suspended, but only for a fraction of a second. First you copy the entire contents of memory to the new machine and mark it as read-only. Each page fault caused by this is used to mark pages that are still dirty. Then you copy these. You keep repeating this process until the set of dirty pages is very small. Then you suspend the VM, copy the dirty pages, and start the VM on the new machine. Userspace programs will just notice that they went an unusually long time without their scheduling quantum. With Xen, at least, the kernel is responsible for bringing up and shutting down all CPUs except the first one, so the kernel will notice the migration (in a paravirtualised kernel - with HVM it won't) and restart the other (virtual) CPUs.

Re:Wasn't this always possible? (0)

Anonymous Coward | more than 5 years ago | (#25677573)

I would say "the point" of virtualization is subjective, and that may be your purpose but other people have different purposes. My point is to run multiple operating systems from the same dev machine, reducing hardware cost. Some people use it for the sake of redundancy and infrastructure management on mirrored hardware. VMWare Fusion users use it to have the best of both worlds hand in hand. I would say "the point" of virtualization is that it is very useful, for many reasons, and that with pros come cons.

Completely isolating the hardware from the software has downsides too; most notably speed. There has been architecture emulation for quite some time with completely isolated infrastructure. The answer to speed concerns was to improve hardware support for virtualization, which both dominant manufacturers have implemented. Now VMWare and other systems can issue instructions to the processor specific to managing allocated zones of hardware and then they can pass the instructions directly through to the processor rather than interpreting them with a virtual processor (effectively a Hardware Hypervisor). You could probably effectively move from inferior hardware to superior hardware as either isolated or integrated so long as you proxy some of the messages and are aware of the expectations of the guest OS, but it's a lot of work to accomplish and it's very dependent on the new machine being able to, while proxying, perform at least as well as the VM required of the old hardware.

Re:Wasn't this always possible? (1)

Ephemeriis (315124) | more than 5 years ago | (#25677785)

I fail to see how this is unique other than it being done "live" (which just means the VM is suspended, and the state of everything moved to the new machine and the VM resumed).

You just completely missed the point. The VM was not suspended, moved, and resumed. It was moved live. The VM never stopped doing its thing. It was up, running, and servicing requests the whole time.

...which isn't terribly amazing. I know VMWare can do that now. The big deal is apparently that it moved from one CPU vendor to another. I didn't realize this was so tricky... I kind of figured that x86 was x86 regardless of vendor. Obviously, I was wrong.

Re:Wasn't this always possible? (1)

BitZtream (692029) | more than 5 years ago | (#25680715)

The virtual machine was paused, just not for very long. At some point you have to transfer the contents of the VM's RAM between the servers running it and swap which hardware owns the virtual disk. When that moment occurs, the virtual machine is paused for a brief period of time while the final bits of memory and ownership of the disks are transferred to the new host.

This pause is mitigated by transferring as much of the running VM's RAM to the new host as possible, then, when the move actually occurs, copying over just the bits that changed. On certain servers, where RAM is changing constantly and there is a lot of it, you will see a very obvious pause in the virtual machine during migration. With smaller amounts of RAM, not so much.

Don't be impressed by the apparent "lack of pause during transfer", because it was there; they just made sure the test was done in such a way that they could demonstrate it without you noticing. That's the advantage of setting up your own demos: you can hide all the bad parts pretty easily.

Doesn't surprise me (1)

guruevi (827432) | more than 5 years ago | (#25676951)

After all, all x86 chips are largely the same. Linux distros run on both processors without recompiling, the kernel handles the calls, and most likely an Apache server is not going to call the special media extensions. It would be interesting to see this happen in an environment that has been optimized and is using certain incompatible extensions (like 3DNow!), e.g. a computing cluster.

If you abstract enough and emulate a processor you should even be able to move between architectures but the overhead of emulation wouldn't make it very cost effective.


Not quite a break through (2, Insightful)

Anthony Liguori (820979) | more than 5 years ago | (#25677055)

FWIW, KVM live migration has been capable of this for a long time now.

KVM actually supported live migration of Windows guests long before Xen did. If you haven't given KVM a try, you should!

Creds anyway (2, Insightful)

noundi (1044080) | more than 5 years ago | (#25677117)

It's worth noting that VMware has been a huge contributor to the Linux community, giving corps a very good reason (€$£) to migrate, thus adding important pawns to the future of Linux. I for one believe that VMware was wrong, but that it's an honest mistake. There's no use in poking at VMware for this one; hopefully they'll help lift the technology even higher along with their competitors.

You've lost this round VMware, but the match isn't over yet!

AMD (1)

wzinc (612701) | more than 5 years ago | (#25677203)

I don't know if this will help AMD sell more procs. I like AMD, but Intel's stuff is by far faster these days. Still, Intel's procs are nightmarishly expensive compared to AMD, and the difference in price/performance seems disproportionate to me.

AMD ftw... (0)

Anonymous Coward | more than 5 years ago | (#25678467)

Shows that AMD is the better company. Intel just buys and kills everything in its way with its evil black-market and under-the-counter deals...

OpenVZ has been able to do it for like 2 years now (1)

dowdle (199162) | more than 5 years ago | (#25680123)

Let me clarify before people jump down my throat... OpenVZ (www.openvz.org) is OS Virtualization (aka containers) and NOT machine / hardware virtualization... so it can only run Linux on Linux... but it has been able to do live migrations from one processor family to another since they initially added checkpointing. OpenVZ is fairly CPU agnostic and it has been ported to a number of CPU families. In fact the project leader recently ported it to ARM (Gumstix Overo). See: http://community.livejournal.com/openvz/24651.html [livejournal.com]

Once again... (1)

emptycorp (908368) | more than 5 years ago | (#25680665)

AMD is the first to a technological breakthrough, and all Intel can do is copy the technology and overclock it to do better on benchmarks.

AMD - First to deliver lower clock speeds with the same or better performance compared to Intel's higher speeds.

AMD - First (and only) to TRUE dual and quad core technology (Intel does not use logical cores).

AMD - First to 64-bit.

Of course other smaller chip makers have done these sorts of things first, but they don't compare to the Intel/AMD dominance and consumer marketplace.

VMWare, the latest 1-product-company fighting M$ (-1, Troll)

Anonymous Coward | more than 5 years ago | (#25681453)

Is it just me, or does any other VMware user worry about the future of a company which has one product and is now going head to head with Microsoft's (inferior) play in this space, Hyper-V?

History is littered with 1-product companies like VMWare who lost out to a weak product from Redmond, pitched at a low (free?) price and bundled with their main OS.

Hyper-V is only 30 bucks with Windows Server 2008, and allows 4 free Windows 2003 instances before you have to pay any more.

Anyone else remember Netscape? What about Stacker? The list goes on...

VMWare user.

Intel will be fixing this problem soon (0)

Anonymous Coward | more than 5 years ago | (#25683371)

Look for Intel to provide "Intel Genuine Advantage" that makes it impossible to migrate a VM, under any circumstances, with any degree of success.
