I’m curious whether it’s possible to use CUDA to make a virtual machine that runs on the GPU, in other words a software x86-to-CUDA interpreter. And if it were possible, what would the performance be?
It would be pretty much impossible, I think, and performance would likely be bad. CPUs and GPUs each have their own place.
VT-d enabled hardware + virtualizer support (Xen is closer to supporting it…) might help to run CUDA on top of virtual machines.
The link below has a list of VT-d enabled hardware and a nice discussion as well.
Check out this link: http://forums.nvidia.com/index.php?showtopic=78487 (towards the end…)
But the URL does NOT cover the latest Hyper-V from Microsoft. Has anyone tried?
I just read the TechRepublic PDF. It talks about synthetic devices, where one can assign devices directly to guest OSes… If that works for GPUs or Tesla, we should get CUDA going on virtualization platforms…
Has anyone got this installed? (Note: Hyper-V is only for 64-bit platforms…)
It sounds like what you’re suggesting is running CUDA from the virtual machine, but what I meant was running the virtual machine with CUDA. Or am I wrong?
How could you run a VM on CUDA…??
CUDA cannot run any general-purpose OS. So… Hmm… I don’t see your point.
Oops… Sorry, I just read “virtualization” and “CUDA” and started blabbering… Sorry about that…
I think GPUs are just not cut out for all that. Look at warps: all the threads in a warp execute the same instruction at the same time. It just does not fit the model of general-purpose computing at all.
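To make the warp point concrete, here is a toy illustration of my own (not from the post): in the kernel below, odd and even threads within the same warp take different branches, so the hardware runs the two paths one after the other with the inactive lanes masked off, rather than truly in parallel.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Odd and even lanes of the same warp disagree on the branch condition, so the
// two paths execute serially, each time with the non-participating lanes masked off.
__global__ void divergent(int* out)
{
    int i = threadIdx.x;
    if (i % 2 == 0)
        out[i] = i * 2;
    else
        out[i] = i + 1000;
}

int main()
{
    int* d_out;
    cudaMalloc((void**)&d_out, 32 * sizeof(int));
    divergent<<<1, 32>>>(d_out);   // launch a single warp

    int h_out[32];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("lane 0 -> %d, lane 1 -> %d\n", h_out[0], h_out[1]);
    cudaFree(d_out);
}
```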
Emulating an x86 CPU on a GPU may seem irrelevant, but it could be done, and performance could be good if you emulate a huge number of x86 CPUs simultaneously.
A GPU may be efficient at the emulation because there’s a lot of invariant instruction-decoding code that can execute in parallel across a warp, while the code executed for each thread will probably run sequentially (huge divergence between the instructions the different emulated x86 cores are executing at any point in time). Parallelizing the instruction decoding may enable a good performance level, especially if you try to hide instruction differences (i.e. a goto and a mov end up being handled very similarly).
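To sketch what such a one-thread-per-emulated-core interpreter could look like, here is a minimal, hypothetical CUDA example. It emulates an invented 3-opcode toy ISA rather than x86 (the opcode names and the tiny register file are mine, purely for illustration): the fetch/dispatch scaffolding is the same for every thread in a warp, while the per-instruction handlers in the switch are where divergence would show up.

```cpp
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Invented toy ISA: 4 registers, 3 opcodes, one emulated core per CUDA thread.
enum Op : uint8_t { OP_MOVI = 0, OP_ADD = 1, OP_HALT = 2 };
struct Insn { uint8_t op; uint8_t dst; uint8_t src; int32_t imm; };

__global__ void emulate(const Insn* programs, int insns_per_core,
                        int32_t* reg_files, int num_cores)
{
    int core = blockIdx.x * blockDim.x + threadIdx.x;
    if (core >= num_cores) return;

    const Insn* code = programs + core * insns_per_core;  // this core's program
    int32_t*    regs = reg_files + core * 4;               // this core's registers
    int pc = 0;

    while (pc < insns_per_core) {
        Insn i = code[pc++];            // "fetch": identical code path for every thread
        switch (i.op) {                 // "dispatch": threads diverge here
            case OP_MOVI: regs[i.dst] = i.imm;        break;
            case OP_ADD:  regs[i.dst] += regs[i.src]; break;
            case OP_HALT: pc = insns_per_core;        break;
        }
    }
}

int main()
{
    const int num_cores = 1024, insns_per_core = 3;
    std::vector<Insn> prog(num_cores * insns_per_core);
    for (int c = 0; c < num_cores; ++c) {
        prog[c * insns_per_core + 0] = { OP_MOVI, 0, 0, c };  // r0 = core id
        prog[c * insns_per_core + 1] = { OP_ADD,  0, 0, 0 };  // r0 += r0
        prog[c * insns_per_core + 2] = { OP_HALT, 0, 0, 0 };
    }
    Insn* d_prog;  int32_t* d_regs;
    cudaMalloc((void**)&d_prog, prog.size() * sizeof(Insn));
    cudaMalloc((void**)&d_regs, num_cores * 4 * sizeof(int32_t));
    cudaMemcpy(d_prog, prog.data(), prog.size() * sizeof(Insn), cudaMemcpyHostToDevice);
    emulate<<<(num_cores + 127) / 128, 128>>>(d_prog, insns_per_core, d_regs, num_cores);
    cudaDeviceSynchronize();
    cudaFree(d_prog); cudaFree(d_regs);
}
```

With real x86 code the threads of a warp would usually hit a different handler at every step, so the handlers serialize; only the fetch/dispatch scaffolding stays converged.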
You will have many problems with the memory:
- There’s less than 4 MB available per SP on current implementations, limiting the emulated code to this space.
- Memory accesses will slow the emulation down, because you will end up with at least 1 GB/s of IO per emulated CPU (compare that with the 100 GB/s+ of an actual L1 cache).
- There’s no cache, so any PUSH/POP or use of a local stack frame will go through GPU main memory (ouch!).
- You will have to add instructions to protect the memory areas of one emulated x86 CPU from being overwritten by another one.
I don’t think it’s undoable; for some specific x86 code it may be doable, and using 64+ threads per SM it could even run well, but I seriously doubt that it could compete with current-generation (Core i7) CPU architectures in terms of performance.
Emulating ARM or any other RISC-oriented instruction set may be an interesting use of CUDA, as it’s easier to decode and execute than the x86 ISA, and CUDA only exists on x86 architectures (sadly), so anyone with access to CUDA already possesses an x86 CPU :-)
There’s little point in kludging a parallel processor to emulate a sequential system. Much better to find the places in your x86 code where you are forcing the CPU to emulate a data-parallel processor through big for loops, SSE, or threads and offload those tasks to CUDA. :)
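To make that suggestion concrete, here is a minimal sketch of the usual pattern (names and sizes are mine, for illustration): a big element-wise loop that a CPU would grind through sequentially or via SSE, expressed directly as a CUDA kernel with one element per thread.

```cpp
#include <vector>
#include <cuda_runtime.h>

// CPU version: the "big for loop" the CPU works through sequentially (or 4-wide with SSE).
void saxpy_cpu(float a, const float* x, float* y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// GPU version: the same data-parallel work, one element per thread.
__global__ void saxpy_gpu(float a, const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    float *d_x, *d_y;
    cudaMalloc((void**)&d_x, n * sizeof(float));
    cudaMalloc((void**)&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy_gpu<<<(n + 255) / 256, 256>>>(2.0f, d_x, d_y, n);
    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_y);
}
```

The decision of which loops to move is made by the programmer at the source level, which is exactly the information an x86-binary-to-CUDA translator would have to rediscover the hard way.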
Very interesting. Thanks everybody, and especially iAPX for the rundown.
But since this thread dates from early 2007, I was wondering if the story would be different now,
with the recent GeForce GTX 295, which is something like 5.57 times faster than an i7 950 (4 cores at 3.0 GHz) according to some testing with Pyrit:
Google Code Archive - Long-term storage for Google Code Project Hosting
Couldn’t we run a lot of x86 or SSE2 instructions on a GTX 295?
I guess the RAM would still be the problem.
What do you guys think?
The problem continues to be latency and bandwidth over the PCI-Express bus unless you load the entire process onto the GPU, in which case it is terribly inefficient.
The GPU is actually slower per operation than the CPU, but it is very wide. Even running single-precision SSE instructions on the GPU would leave a single multiprocessor idle 87.5% of the time (a 4-wide SSE operation fills only 4 of a warp’s 32 lanes), and a GTX 295 has 2 GPUs with 30 multiprocessors on each GPU.
x86 emulation would not be very effective unless you could extract a huge amount of parallelism from the x86 binaries you were running, which just is not practical. It is far easier to specify the massively parallel operations at the source-code level and compile down to CUDA than to try to infer your way back up from machine instructions.
I’ve always thought that it would be funny to boot Windows on a GPU. You absolutely could do it, given a massive engineering effort. Transmeta did something slightly less ambitious a while back (think of running an x86 build of Windows on an Intel Itanium). The only benefit in my mind would be psychological, helping to convince people that there really isn’t a fundamental difference between CPUs and GPUs, but there are probably more productive ways of doing that.
Hmm… Say we emulate x86 stuff… Say you use zero-copy to make the RAM look bigger… How would one expose the PCI bus, network cards and other devices? Does zero-copy support that? On the whole, it looks very confusing…
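For context, “zero-copy” here just means mapping pinned host RAM into the GPU’s address space so kernels can read and write it over PCI-Express on demand; it says nothing about exposing the PCI bus, NICs or other devices to the GPU. A minimal sketch of the runtime calls involved (the kernel and names are mine, for illustration):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernel that touches host RAM directly through a mapped (zero-copy) pointer.
// Every access crosses PCI-Express, which is exactly why it is slow.
__global__ void touch(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main()
{
    const int n = 1024;

    // Must be set before the CUDA context is created to allow mapped host allocations.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int* h_data = nullptr;
    cudaHostAlloc((void**)&h_data, n * sizeof(int), cudaHostAllocMapped);  // pinned + mapped
    for (int i = 0; i < n; ++i) h_data[i] = i;

    int* d_data = nullptr;
    cudaHostGetDevicePointer((void**)&d_data, h_data, 0);  // device-side view of the same RAM

    touch<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    printf("h_data[0] = %d\n", h_data[0]);  // 1: the kernel wrote straight into host RAM
    cudaFreeHost(h_data);
}
```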
Alternatively, I understand that the graphics card could not be used for the display, but is there any way to let the VM access the GPU for numeric computation? I.e. is there any way I can run a Linux and a Windows VM on the same machine via VMware’s ESXi and have both VMs offloading calculations to the GPU?
Not currently, but it seems it should be possible in theory, as something like this exists for Quadro GPUs, where virtual machines can use a GPU in the host system. But there a GPU is assigned to a single VM, and it is not possible for two VMs to use the same GPU. I don’t remember the name of the tech, but it should be on the NVIDIA site.
Interesting, I’ll have a look. Thanks