CUDA device not primary display adapter

When using a CUDA device on Windows XP it is recommended to not connect it to a display but to use another NVIDIA card as the primary display adapter.

  1. What NVIDIA cards can be used for this? All cards? Any particular cards? What cards are recommended?

  2. When using a Tesla card does the problem exist too? Of course one will need a graphics adapter but are other vendors also possible? Can onboard graphics be used together with a Tesla card?

Thanks in advance,

I would really appreciate it if someone from NVIDIA or someone with experience regarding the matter could answer this.

I even have a third question:
3) When using the CUDA device for both calculation and as the primary display adapter, what is the performance loss compared to a dedicated CUDA device?

  1. I have no experience with a multi-display system, but I do know you can only install one display driver. So, the 2nd card must be supported by the driver you are installing. This limits the options for the 2nd card to any recent NVIDIA card, since the newest drivers do start to drop support for extremely old cards.

  2. Same answer as (1) as far as I know.

  3. As long as there is no software actively using the display (e.g. an updating desktop widget, an updating webpage, or a user moving windows around the screen), I notice no appreciable performance loss.

The loss is measurable, but only if you know what to look for. My application calls hundreds to thousands of short kernels per second. If I look at the runtime of each kernel, the average is (for the sake of example) 1.0 ms per launch, but every so often a launch takes 1.1 ms. I attribute the overhead to some little blip updating on the display requiring a CUDA/graphics “context switch” of some kind. The overall performance loss (measured in the Linux console vs. Linux X; I don’t have a two-display system) is less than 1%.

For one particular user (sorry, I don’t remember the thread), these “blips” in the kernel performance were unacceptable because they caused a noticeable blip in the framerate of the animation.
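To put rough numbers on the kind of measurement described above, here is a minimal sketch that takes a list of per-launch timings, uses the median as the baseline, counts the slow launches, and estimates the overall overhead. The function name `blip_report`, the 5% tolerance, and the sample data (5 slow launches out of 100) are assumptions for illustration, not measurements from a real system.

```python
from statistics import median


def blip_report(times_ms, tolerance=0.05):
    """Summarise per-launch timings and flag launches slower than the baseline.

    The baseline is the median launch time; a "blip" is any launch that
    exceeds it by more than `tolerance` (a fraction, 0.05 = 5%).
    """
    base = median(times_ms)
    blips = [t for t in times_ms if t > base * (1.0 + tolerance)]
    ideal_total = base * len(times_ms)
    # Extra time spent relative to every launch running at the baseline.
    overhead = (sum(times_ms) - ideal_total) / ideal_total
    return {
        "baseline_ms": base,
        "blip_count": len(blips),
        "overhead_fraction": overhead,
    }


# Hypothetical data: 95 launches at 1.0 ms and 5 launches at 1.1 ms.
samples = [1.0] * 95 + [1.1] * 5
report = blip_report(samples)
```

With these made-up samples the overhead works out to about half a percent, which matches the "less than 1%" order of magnitude reported above. Whether an occasional slow launch matters depends on the application, as the framerate example shows.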

Thanks for the answers MisterAnderson.

So this is kind of a problem as it kills the “just put one or two Teslas in the system and it will calculate at the speed of light” argument. It will be more like “find a mainboard that can hold one or two Teslas and one other recent NVIDIA card, put all that in the system and it will calculate at the speed of light”.

After searching for a little while I found the Quadro NVS product series:
Would such a device, especially the NVIDIA Quadro NVS 280 (PCI), work with, say, 2 Teslas or 2 GeForce 8 cards?

You can always install Linux and work remotely on the machine. No need for a display card then.

Hey seb,

On Windows, as MisterAnderson42 points out, there’s only one driver, so you need to have an NVIDIA card as your display card. The NVS card you mention is just fine. It also doesn’t need to be an 8-series, so older NVIDIA devices are fine.
I use an NVS285 with Tesla C870 and D870s. These are also available in x1 format, which can be handy if you don’t want to use another x16 slot. You can certainly use something more powerful that can also do CUDA work while displaying.

On Linux, a non-NVIDIA display device is OK, which is particularly useful where servers have non-NVIDIA on-board display devices.

I’m not sure yet what’s going to be required when we release Vista support for CUDA. That may require all cards to be 8-series or newer (which happen to be all the CUDA-capable devices), but certainly, as with XP, all cards will need to be NVIDIA devices. I’ll post those details when I find out. Or someone here may post it sooner.

Thank you all for the clarification.
And thanks for pointing out the NVS285 x1. This card could indeed be handy because my major concern actually was the number of available x16 slots.

I like the idea of using an x1 PCIe video card for the desktop. I’m thinking about getting one of those motherboards with 3 PCIe x16 slots. I’ll then wait for the new video cards with dual GPUs on them. So, I suppose in a few months, it will be possible to build a system with 3 cards, dual GPUs in each, for a total of 6 GPUs doing CUDA … awesome.

Could NVIDIA provide a list of x1 PCIe cards that would be compatible with 8-series and also the next version 9-series cards when used in an XP system?

I guess it is still not known how Vista would fit into such an architecture.

I don’t know if this is applicable in your case but somewhere here it was mentioned that for each GPU one CPU core is recommended to run CUDA apps efficiently.


Good point. The 1 CPU core per GPU … is that a technical requirement of the CUDA runtime or is it just a recommendation for the sake of performance?

This is especially true if you run a large number of very short kernels in succession. Then the latency introduced waiting for the GPU to finish between kernels (or after the launch queue fills up if you are not waiting) can become significant. A one-to-one mapping between CPUs and GPUs will ensure GPUs are serviced quickly when they become available. Longer running kernels may not be so affected by an oversubscribed CPU.

(Batching the work done by my kernels so they ran longer turned out to be a huge speed improvement for me. The extra scratch space I needed in shared memory reduced the occupancy of each kernel call, but the result was still a net win.)
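A toy cost model may help show why batching pays off when kernels are very short. The sketch below assumes each launch carries a fixed overhead plus time proportional to the work it does. The function name `total_time_ms` and every number in it are hypothetical, and the model deliberately ignores the occupancy effect mentioned above.

```python
def total_time_ms(work_items, items_per_launch, launch_overhead_ms, work_ms_per_item):
    """Estimate total runtime as (number of launches * fixed overhead) + compute time.

    This is a simplified model: it assumes the per-item compute cost does not
    change with batch size, which is not true in general (e.g. lower occupancy).
    """
    launches = -(-work_items // items_per_launch)  # ceiling division
    return launches * launch_overhead_ms + work_items * work_ms_per_item


# Hypothetical figures: 100,000 work items, 0.02 ms overhead per launch,
# 0.001 ms of compute per item.
small_batches = total_time_ms(100_000, 10, 0.02, 0.001)    # 10,000 launches
large_batches = total_time_ms(100_000, 1000, 0.02, 0.001)  # 100 launches
```

Under these assumed numbers the small-batch run spends roughly two thirds of its time on launch overhead, while the large-batch run spends about 2%. Real gains depend on measured launch latency and on how batching changes the kernel itself.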

This is a recommendation. The requirement is to have one CPU thread per GPU.
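The one-host-thread-per-GPU pattern can be sketched roughly as follows. This is only an illustration of the threading structure: `run_on_device` is a hypothetical placeholder that does plain CPU arithmetic, where real code would first select its device in that thread (for example with `cudaSetDevice` in the CUDA runtime) and then launch kernels.

```python
import threading


def run_on_device(device_id, work, results):
    # Placeholder for GPU work. In a real program this thread would select
    # device `device_id` and launch kernels on it; here we just square and sum.
    results[device_id] = sum(x * x for x in work)


def run_all(num_devices, data):
    """Split `data` across devices and drive each device from its own thread."""
    chunks = [data[i::num_devices] for i in range(num_devices)]
    results = [None] * num_devices
    threads = [
        threading.Thread(target=run_on_device, args=(i, chunks[i], results))
        for i in range(num_devices)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every device's thread to finish
    return results
```

For example, `run_all(2, list(range(6)))` hands `[0, 2, 4]` to the first thread and `[1, 3, 5]` to the second. Whether each thread also gets its own CPU core is the performance recommendation discussed above, not something this structure enforces.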

Another thing a colleague just pointed out regarding this:

When using two different video cards, say one onboard display adapter and one GeForce 8, both video devices show as functional in the device manager, and it is possible to use 2 monitors at the same time, one on each of those 2 video cards.

So it seems like Windows XP is perfectly capable of using 2 different video drivers at the same time. Any insight on that?

Personally, and I’m just experimenting with this right now, I have 4 Tesla GPUs installed and am using the onboard ATI for video out. Even though I have drivers installed for the Tesla C870s in the system (according to my device manager, it shows 1 ATI ES1000 and 4 C870s), any CUDA application I use insists there are no CUDA devices in the system and defaults to emulation mode.

I know it is possible in Linux to use the ATI device for video out and still have access to the Teslas for CUDA, since it primarily looks for the devices in /dev/nvidia0, /dev/nvidia1, etc., but it would certainly be nice if this capability were available in Windows as well.
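On Linux, a quick way to see which of those device nodes exist is to list them directly. The sketch below is a minimal example: the helper name `list_nvidia_device_nodes` is made up for illustration, and the presence of a node only shows that the driver created it, not that a CUDA application can actually use the device.

```python
import os
import re


def list_nvidia_device_nodes(dev_dir="/dev"):
    """Return paths such as /dev/nvidia0, /dev/nvidia1, ... sorted by index.

    Only names of the form nvidia<number> are matched, so control nodes
    like nvidiactl are skipped.
    """
    found = []
    for name in os.listdir(dev_dir):
        match = re.fullmatch(r"nvidia(\d+)", name)
        if match:
            found.append((int(match.group(1)), os.path.join(dev_dir, name)))
    return [path for _, path in sorted(found)]
```

Calling `list_nvidia_device_nodes()` on a machine with four Teslas and a working driver would be expected to return four paths. An empty list suggests the driver is not loaded or the nodes were not created.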