No device supporting CUDA?

Hey guys, I’m having trouble figuring this one out. I updated to the 169.21 drivers with Tesla support, uninstalled the previous drivers, then updated to CUDA 1.1. My configuration uses a GeForce 8800 GTS as the primary display controller, and the rest of the devices are Tesla C870s. Windows sees all devices in the hardware manager, and the drivers for the standard onboard ATI are disabled.

When I try to run the multi-GPU application to make sure I can see everything, it actually sees nothing. It tells me “There is no device supporting CUDA”. What could be the reason for it not seeing a CUDA device when all I have installed is CUDA devices?

deviceQuery is also telling you there are no devices? Then you have made some mistake installing the drivers/CUDA 1.1. You also have to uninstall CUDA 1.0 before installing 1.1.
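For reference, the check deviceQuery performs comes down to asking the runtime how many devices it can enumerate. A minimal sketch in that spirit, assuming a machine with the CUDA toolkit installed (so this is illustrative, not something the verifier of a driver install should rely on; `cudaGetDeviceCount` and `cudaGetDeviceProperties` are the real runtime calls, the rest is a hypothetical skeleton):

```c
/* Minimal CUDA device probe, in the spirit of the SDK's deviceQuery.
 * Assumes the CUDA runtime (cuda_runtime.h, cudart) is installed.
 * If the driver cannot enumerate a GPU, SDKs of this era fell back
 * to offering only the emulation device. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count == 0) {
        printf("There is no device supporting CUDA\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, %lu bytes of global memory\n",
               dev, prop.name, (unsigned long)prop.totalGlobalMem);
    }
    return 0;
}
```

If this reports no devices while the hardware manager shows them, the problem is at the driver/runtime level, not in your application.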

I’ve just uninstalled everything and reinstalled everything twice, both with the same results. The deviceQuery application is telling me that only emulation is available. I’ve even gone as far as to do a complete driver wipe to make sure any previous versions of the drivers were eliminated completely before installing the latest 169.21 drivers.

Still a no go right now.

same problem here… with Tesla C870 (used for the first time)

I have uninstalled the old drivers, SDK and toolkit, rebooted, and reinstalled the latest driver, SDK and toolkit.

Are any special driver/Windows/BIOS settings necessary?

Edit: Is it necessary that another NVIDIA graphics device already be present in the system to use Tesla?

Thanks for help!

I was told that under Windows an NVIDIA card has to be the primary display adapter in order to use CUDA, which is understandable given that Windows offers near-zero flexibility in the matter, unlike Linux.

However, in a case like ours it just seems plain odd, since the primary display device is an NVIDIA device, and in my case even a CUDA device.

I know that applications saw all of my CUDA devices including the 8800 when I was using the drivers that were included with the Tesla, but when upgrading to the latest drivers that have CUDA included, it isn’t happening anymore.

As mentioned, you must have an NVIDIA display card along with the Tesla under Windows XP, and that NVIDIA display device must have an active display.

Chris, first run the NVIDIA control panel (right-click on the desktop) and tell us what you get when you select Help->System Information.
This should list your NVIDIA display card and the C870(s).

For example, my system reports:
GeForce 8800 GT
Tesla C870
You can click on both of them, and each should report ForceWare version 169.21 (from the brand-new driver that added C870 support last week; that’s what I’m running in this configuration, and I’ve also tried it with the D870).

If you don’t see both of these, let’s go back a few steps.

I’d try just running CUDA with your GeForce card alone. If deviceQuery doesn’t work with that, there’s still a more basic problem.

Make sure you’ve also got just the CUDA v1.1 toolkit and SDK, nothing from CUDA 1.0. And try just using the prebuilt binaries in the NVIDIA CUDA SDK\bin\win32\Release directory.

I’m also still a little concerned, Chris, about your x4 electrical slots. I’m just not sure we’ve tried that with a C870 card; it could be a power problem between those slots and the card. If you have a full x16 electrical slot, you might want to try that first as well.

Also look at the Windows hardware device manager. Under Display adapters you should see both your GeForce display card and the C870(s). Again, in my configuration it’s similar to what I show above from the NVIDIA Control Panel: both the GeForce 8800 GT and Tesla C870 are listed.

Please let us know what these report.

Thanks David. I’ll be able to confirm some of what you are saying later. While I’m not 100% certain on the power issue, I’m pretty confident that electrically we are feeding the cards enough power (in watts, not bandwidth) to satisfy the needs of the hardware. What I did notice when changing from the GTS boards to the Teslas is that we are consuming close to 100W more at idle (670W to 680W, as opposed to 600W with just the GTS boards). We’ll have a Quadro FX 5600 in soon to see what we will push with that in there instead of the GTS for primary display. I’m estimating about 700W even.

I will check the NVIDIA control panel to see what it is showing. I know when I look in the hardware manager, I do see the 8800 GTS and 5 Tesla C870 GPUs in the Display Devices and nothing else. I uninstalled the ATI driver from the hardware, so the onboard device now appears as an unknown VGA device. Not sure if the presence of any ATI drivers might affect CUDA; it could be a possibility, so I’ll confirm that later.

I’ll see what I can get accomplished tonight and get you more information to see if everything looks right.

Hey Chris,

See if you have an option in the BIOS to disable the VGA device entirely. Often there’s something that will disable it if there’s a PCIe graphics device. This could be what CUDA needs.

I talked with one of our system guys, and he said that an x4 electrical slot will only send 25 watts through the slot, so this may be affecting the card, which expects more.

Disabling the device didn’t work; however, I did find that it worked when I pulled all the Teslas out of the system. Then I put one back in, and CUDA could see the 8800 and a Tesla C870; then I put two in, still good. Once I put three in, CUDA couldn’t see anything.

What I did notice, however, in the System Information part of the NVIDIA display panel was that all of the Tesla cards were assigned the same IRQ (0), while the 8800 GTS was assigned an IRQ of 18. Might this be causing an issue (too many CUDA devices contending for the same IRQ, especially since IRQ 0 is a reserved system IRQ)?

The odd thing to me is that the Tesla cards are using IRQ 0 in the first place. Is this normal?

As for the power requirements, I’ll double check with the backplane manufacturer regarding wattage. All of the slots will be x16 mechanical, so I can only assume that power wise, it will satisfy power requirements for that, but in terms of bandwidth, that obviously won’t be able to saturate the card bandwidth.

OK good, so you didn’t need to disable it, and you’ve got 2 C870s plus an 8800 working. Cool. That’s a lot of video memory the motherboard BIOS needs to be aware of: ~3.5GB. You may need to work with the BIOS supplier to map in more. It may not know what to do with ~5GB when you get to 3 C870s plus the GTS, let alone the ~9GB you’re hoping to get to.

I decided to take a look at what the Windows hardware manager is saying about the IRQs, and I noticed that all of the GPUs are sharing IRQs once there are 4 or more cards. While at 3 cards each has a unique IRQ (17, 18 and 19), as soon as I go to 4 they start repeating, and at 6 GPUs I have IRQs 17, 18 and 19 each appearing twice.

My suspicion is that this may be causing an issue, but I also have the suspicion that the point you made might go hand-in-hand with the duplicate IRQs.

I’ve contacted support for the host board to see what their thoughts on it are, but I also wouldn’t mind knowing if my suspicions are the case or not from you guys as well.

IRQ 0 is typically reserved, and shouldn’t be handed out to other devices. I suspect that your motherboard might be hitting a BIOS bug.

While I know IRQ 0 is typically reserved, I’m kind of curious why, for the Tesla devices, the IRQ that the Windows hardware manager reports and the IRQ that the NVIDIA System Information panel reports are different. As I mentioned, the System Information panel is telling me that all 5 Teslas are using IRQ 0, while the Windows hardware manager is telling me that the Teslas are using 17, 18 and 19 (with 17 and 19 each used twice), and that the 8800 GTS is on IRQ 18 in both the System Information panel and the hardware manager.

When you have a chance, can you let us know what speed the cards report they’re running at in those x4 slots? deviceQuery should report this for all the cards. It would be interesting to see, even for your working 2+1 card configuration, whether they’re being clocked down at all due to power issues.

I’m not in front of the system right now, but I do remember the System Information panel telling me the speeds. Due to the nature of the board I’m using right now, some slots are x8 and some are x4, and it appears that the 8800 and one Tesla are shown running at x4 while the other Tesla shows x8. When running the bandwidth test, I get 800 to 900MB/s going to all 3 GPUs at once. I haven’t run deviceQuery on it yet, but I’ll do that and copy those results in the 3-GPU configuration.

I’m also going to try and install XP64 tonight to see if that 4GB memory limit (which I’m guessing applies to GPUs) is stopping CUDA from detecting devices.

Good news. As I expected, the issue was CUDA not being able to address more than 4GB of memory. We have finally gotten a true test in 64-bit Windows now, and deviceQuery sees all 6 GPUs (5 Teslas and a GTS). Going to move on to getting Linux happening.

Let me see if I understand (this might become important for me soon)

  • PC with more than 4 GB of memory, Win32 + CUDA -> troubles
  • PC with more than 4 GB of memory, Win64 + CUDA -> no troubles

For me it would probably be running Linux64, but still it is good to know the first configuration gives troubles.

Sounds about right, although I would specify that as total GPU memory. Not just system memory.

Look at it this way: Windows can only address 4GB of memory, since that is all that will fit in a 32-bit address space. Even though each GPU may have its own discrete 1.5GB, my assumption was that CUDA needs to account for all available memory on all GPUs, pushing CUDA past the 32-bit limit of addressable memory, especially if it is treating it as a single resource.

With XP64, this is no longer an issue, and that applies to CUDA as well.

Ah, I understand. So even without 4 GB of main memory you can already need Win64. Good to know for the future, although a 32-bit OS is already not the way to go anymore.