4 x GTX-295: CUDA only sees 5 x GPU (NOT the usual issues!)

If they’re showing up in deviceQuery (that command-line program on the left in the screenshot), then they’re being recognized by CUDA. I’ve never managed to get all the cards to show up in deviceQuery, but I’ve only tested with Vista, so I guess that’s the problem. There also seems to be a problem with the latest drivers (on Vista, anyway): the whole system locks up for 2-3 seconds any time you initialize a new CUDA session on a new GPU, and if you try to initialize more than one GPU at the same time, you’ll get a bluescreen about 20% of the time. That bug is pretty easy to work around, though: just don’t initialize everything at the same time.
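If your own host code starts one thread per GPU, the "don’t initialize everything at the same time" workaround amounts to serializing the initialization step. A minimal sketch of that idea, assuming a hypothetical `init_device` callable that stands in for whatever creates the CUDA context on one GPU in your application (it is not a CUDA API):

```python
import threading
import time

def init_gpus_one_at_a_time(device_ids, init_device, settle_seconds=0.0):
    """Call init_device(dev) for every device, each from its own thread, but
    hold a shared lock so that no two initializations ever overlap.

    init_device is a hypothetical stand-in for the code that creates the CUDA
    context on one GPU; settle_seconds is an optional pause taken while still
    holding the lock, before the next GPU is allowed to start.
    """
    init_lock = threading.Lock()
    results = {}

    def worker(dev):
        with init_lock:  # only one GPU initializes at a time
            results[dev] = init_device(dev)
            if settle_seconds:
                time.sleep(settle_seconds)

    threads = [threading.Thread(target=worker, args=(dev,)) for dev in device_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The threads still run their real work in parallel afterwards; only the risky setup step is forced to happen one GPU at a time.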

deviceQuery is the canonical “what GPUs can be used by CUDA” application. If a GPU shows up in deviceQuery, it can run a CUDA app (unless it’s in compute-prohibited mode, which isn’t supported by WDDM drivers anyway).
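To compare what CUDA sees against what is physically installed, you can count the devices in a saved deviceQuery log. A small sketch, assuming each device section starts with a header line of the form `Device 0: "GeForce GTX 295"` (adjust the pattern if your SDK version prints something different):

```python
import re

# Assumed header format for each device in deviceQuery's output:
#   Device 0: "GeForce GTX 295"
DEVICE_LINE = re.compile(r'^Device\s+(\d+):\s+"([^"]*)"', re.MULTILINE)

def cuda_devices(device_query_output: str) -> list[tuple[int, str]]:
    """Return (index, name) for every device header found in the text."""
    return [(int(idx), name) for idx, name in DEVICE_LINE.findall(device_query_output)]
```

For example, `len(cuda_devices(open("devicequery.txt").read()))` (file name hypothetical) should be 8 on a machine with four GTX 295s if every GPU is visible to CUDA.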

I actually got the 32-bit install to run BOINC with 6 GPUs just fine, but it seemed to be pretty upset about how little address space was left over after mapping all of the GPUs’ memory. 64-bit runs with no problems.

How are you getting six CUDA processors detected by Windows? Do you have SLI enabled? If so, are you returning invalid results? Everyone else seems to; I have to run with SLI disabled, and then only two CUDA processors are available.

Can you post a link to the computer, like this: My Quad SLI GTX 295 rig.

Can you post your .reg patch or PM me with the text?

Try running this Win64 Lunatics’ Unified Installer v0.2; it is a better-optimized CUDA application and also uses newer CUDA DLL files. When running the installer, shut down BOINC and tick the box to install the optimized CUDA application.

Where do I find “deviceQuery.exe”?

So how is it possible to run 32-bit Windows with six 896 MB graphics cards? Isn’t all of the physical address space used up by the graphics cards, leaving nothing for the OS?

I used the registry keys to enable the GPUs like I’ve said since the beginning of time…

To use the attached reg entry, first run regedit and go to

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}

Look at all of the numbered subkeys, specifically the Device Description value. If it’s what you expect (GTX 295 or whatever), then that’s where the keys go. Open the file I attached in a text editor and make sure the entries are there. If you have more devices than the file has entries for, add more entries with the right subkey number at the end of the path. If you don’t have 0003 or 0004 or whatever, just delete those three lines from the file. It should be fairly obvious how the entry works.
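Finding the right numbered subkeys by hand is error-prone, so here is a sketch that scans a regedit export of the class key for subkeys whose description mentions a given card. Assumptions: the export is plain .reg text (regedit usually writes UTF-16, so read it with `encoding="utf-16"`), and the description is stored in a value named `DriverDesc`; the post calls it "Device Description", so pass whatever name your export actually shows.

```python
import re

CLASS_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
             r"\{4D36E968-E325-11CE-BFC1-08002BE10318}")

# Section header of a numbered subkey, e.g. [...\{GUID}\0003].
# Deeper subkeys such as [...\0003\Settings] deliberately do not match.
SUBKEY_RE = re.compile(r"^\[" + re.escape(CLASS_KEY) + r"\\(\d{4})\]$", re.IGNORECASE)

def matching_subkeys(reg_export: str, needle: str = "GTX 295",
                     value_name: str = "DriverDesc") -> list[str]:
    """Return the 4-digit subkeys whose description value contains needle."""
    prefix = f'"{value_name}"='.lower()
    current = None  # numbered subkey we are inside, or None
    found = []
    for raw in reg_export.splitlines():
        line = raw.strip()
        if line.startswith("["):
            m = SUBKEY_RE.match(line)
            current = m.group(1) if m else None
            continue
        if current is None:
            continue
        # Value lines look like "Name"="data"
        if line.lower().startswith(prefix) and needle.lower() in line.lower():
            if current not in found:
                found.append(current)
    return found
```

The returned numbers (for example `["0000", "0002"]`) are the subkeys that should receive the entries from the attached file.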

When you’re done, change the extension on the file to .reg, open it to install it, and reboot.

Also, of course you can have more than 4 GB of video memory on a 32-bit OS; you don’t necessarily map all of the video memory into the address space. (I’ve had four Teslas on a 32-bit OS, which gives you 16 GB of video memory.)
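The arithmetic behind the question and the answer: the video memory in both setups is larger than the whole 4 GiB that a 32-bit address can reach, so it cannot all be mapped at once. A quick check, assuming 896 MB per GPU (as in the question) and 4 GB per Tesla (inferred from the 16 GB total above):

```python
MiB = 1024 ** 2
ADDRESS_SPACE_32BIT = 2 ** 32  # bytes addressable with a 32-bit pointer (4 GiB)

def total_vram(gpus: int, mib_per_gpu: int) -> int:
    """Total video memory in bytes."""
    return gpus * mib_per_gpu * MiB

six_gpus = total_vram(6, 896)        # six 896 MB GPUs
four_teslas = total_vram(4, 4096)    # assumption: 4 GB per Tesla

for label, size in [("6 x 896 MB", six_gpus), ("4 x Tesla", four_teslas)]:
    print(label, size // MiB, "MiB, exceeds 32-bit space:", size > ADDRESS_SPACE_32BIT)
```

Both totals (5376 MiB and 16384 MiB) exceed 4096 MiB, which is consistent with the point that only part of each card’s memory is mapped into the address space.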

Edit: if having GPUs in SLI causes you to return invalid results, that probably means you’re running into problems with heat or power. Having cards in SLI does nothing from a CUDA point of view.
6_device_cuda.reg.txt (1.13 KB)

I ran the .reg file. I had used binary values, and it switched my entries over to DWORD. It didn’t work.

Oh my, WITH SLI ENABLED I get screen flashes, NVIDIA driver restarts, and trashed work units (errors/invalid results; that is why I asked for a link to your rig on SETI, so I can see) from the start! If I let it run for long, it blue-screens on me. I must run with SLI disabled, or not run BOINC at all!

I’m running:

  • BOINC 6.10.18
  • Lunatics v0.2 app
  • NVIDIA 195.62 drivers
  • PhysX 09.09.1112

CUDA-Z doesn’t freak out the system with SLI enabled, and it detects four CUDA units.

  • Temperatures are under control: 65 °C
  • Power supply: Thermaltake 1200 W, with one card plugged into V3 and the other into V4
  • Motherboard: EVGA X58 SLI
  • CPU: water-cooled i7-920
  • GPUs: 2x GTX 295

Could your debugging applications be what keeps your system tame?

Again, please post a link to your computer, like this: http://setiathome.berkeley.edu/show_host_d…?hostid=3050453

If it doesn’t work, post the contents of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}. Anything else is a waste of time; there’s absolutely nothing else I can do without that.

I uploaded the dump file from running BOINC with SLI enabled; yes, it was a BSOD.

I also uploaded the requested registry export.

Please link me to your BOINC SETI@home computer so I can verify it is working as you say… third request…

Thanks for your help so far.
122009_17218_01.zip (31.6 KB)
video_card_reg.zip (135 KB)

Go to Device Manager, uninstall one of your GTX 295s, and click the checkbox to remove drivers for that device. Reboot, install 195.81, and start over.

(You have entries for six devices, which seems a little odd if you only have two 295s.)

I don’t have a link to BOINC stats or anything like that, nor can I do anything with a BOINC dump. I have no interest in verifying that BOINC works correctly, just CUDA in general. Whether or not six GPUs can run BOINC simultaneously is beyond the scope of this exercise: there are far too many other variables, so general troubleshooting is useless unless it simply does not work anywhere. I ran it for about five hours on my 32-bit install with 6 GPUs: no blue screens, stock drivers, everything was fine. That doesn’t mean you’re not hitting power or thermal problems now.

Also I’m pretty sure somebody confirmed this patch works in another thread (there was a complaint about how it disabled multiple monitors, which it does).

Will do, and I will report back.

Please run the 64-bit BOINC client and install the Lunatics v0.2 application linked above. Link me to your user account so we can verify your claim that it works.

Dude, I swapped this PSU with one from another rig; I have four Toughpower 1200 W power supplies. The CPU is water-cooled, and there is lots of cold air for the 295s to gobble up.

It didn’t work. The best I get is 2 CUDA units available with “SLI Disabled”. Enabling SLI trashes work units and causes screen flashes, NVIDIA driver restarts (“driver not responding”), and blue screens.

Edit: added registry export.
video_card_reg_195.81.zip (92.5 KB)

Badly behaved applications can cause any of those things. (Well, they shouldn’t cause blue screens, but I think by default there’s a limit to the number of TDRs you can have before Windows decides something has gone horribly wrong and blue-screens.)
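The "limit on TDRs" can be pictured as a sliding-window counter. Windows has `TdrLimitCount` and `TdrLimitTime` values under the GraphicsDrivers key; as far as I know they default to 5 recoveries within 60 seconds, but treat those numbers and the exact boundary behavior as assumptions. A simplified model, not Windows’ actual implementation:

```python
from collections import deque

class TdrWatchdog:
    """Simplified model of the TDR limit: up to limit_count recoveries are
    tolerated within any window of limit_time seconds; one more inside the
    window is treated as fatal (the point where Windows would bugcheck).
    Default values are assumptions, not confirmed driver behavior."""

    def __init__(self, limit_count=5, limit_time=60.0):
        self.limit_count = limit_count
        self.limit_time = limit_time
        self._events = deque()  # timestamps of recent recoveries

    def recover(self, now: float) -> bool:
        """Record a TDR at time `now`; return False if it would be fatal."""
        self._events.append(now)
        # Forget recoveries that have fallen out of the window.
        while self._events and now - self._events[0] > self.limit_time:
            self._events.popleft()
        return len(self._events) <= self.limit_count
```

Under this model, an application that makes the driver reset every few seconds escalates from "driver stopped responding" popups to a blue screen, which matches the symptoms described above.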

Anyway, I’m now on vacation and away from the machine, so I’m not running BOINC on it anytime soon. Plus I don’t really like the idea of trusting some v0.2 hacked/“optimized” build.

Hey Tim,

Finally, this worked for me. deviceQuery finds all devices now.
I had searched the web earlier for the two terms “LimitVideoPresentSources” and “DisplayLessPolicy” (I had read those names somewhere, without any guide on what to do with them), but I was not able to find out how to use them or where to put them in the registry. Your file helped clarify things. Maybe you could post it as a short guide in a sticky somewhere in these forums, or push for these keys to be set in the driver options/setup.

At least from my point of view it’s worth trying.
ToDo:

  • find your GTX295s in the reg and
  • adapt those keys at the right places
  • report success/failure to pinpoint further probs.

Good Luck!

Thanks,
Markus

The fix that NVIDIA sent me didn’t work with v191.XX of the drivers. I’m now trying to load 195.62 and re-test. Unfortunately, 195 will not upgrade from 191 without a blue screen. Reinstalling the OS from scratch to see if yet another “Registry Patch” will work…

Tim, you are NO help. For starters, the app is not hacked, and given the large selection of CUDA applications, I don’t know how you can come off this way.

I disabled TDR and it still shreds work units.

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrLevel"=dword:00000000

@evanevery:
I had to adapt the numbering of the registry entries to the devices whose Device Description entry indicated an NVIDIA GeForce GTX 295.
So one entry of

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\0000]
"DisplayLessPolicy"=dword:00000001
"LimitVideoPresentSources"=dword:00000001

for each registry key

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E968-E325-11CE-BFC1-08002BE10318}\000X] (X indicates the enumeration number)

that corresponds to a GTX295.
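Repeating that entry by hand for every GTX 295 subkey is where typos creep in, so here is a sketch that writes the whole .reg file from a list of subkey numbers. It uses the key path and the two DWORD values shown above; the function name is hypothetical, and you still have to find the right subkey numbers yourself.

```python
KEY_PREFIX = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
              r"\{4D36E968-E325-11CE-BFC1-08002BE10318}")

# The two values from the entry above, both set to dword:00000001.
VALUES = {"DisplayLessPolicy": 1, "LimitVideoPresentSources": 1}

def build_reg_file(subkeys):
    """Return .reg text with one entry per numbered subkey (e.g. "0002")."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for sub in subkeys:
        lines.append(f"[{KEY_PREFIX}\\{sub}]")
        for name, value in VALUES.items():
            lines.append(f'"{name}"=dword:{value:08x}')  # 8 hex digits
        lines.append("")  # blank line between entries
    return "\n".join(lines)
```

To save it, something like `open("gtx295.reg", "w", encoding="utf-16", newline="\r\n").write(build_reg_file(["0000", "0001"]))` should produce a file regedit accepts (UTF-16 with CRLF line endings is how regedit itself exports, as far as I know). Double-check the output in a text editor before importing it.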

I’m using the latest drivers from NVIDIA’s website (195.62, I think, or something close).

Upgrading the driver shouldn’t be that complicated:
boot into Safe Mode, then uninstall and delete all drivers under Software and in Device Manager. This should remove all NVIDIA drivers. Reboot and check whether the redetected GTX 295s get the Microsoft default driver or still the NVIDIA one. If it’s NVIDIA’s, you may shout some evil curses at me. Otherwise, just install the latest driver and pray =).

I also think there are some free driver-removal tools out there to get rid of driver leftovers. I have never had to reinstall an OS because of a driver issue.

Markus

Markus,

I’m working my way through a baseline out-of-the-box install. With all the time and money we have invested in this, Safe Mode is just not that “Safe”… Anyway, I hope to have some more info today…

WE MAY HAVE A FIX!

(Something to put under your tree!)

NVIDIA Engineering sent me a new registry patch which appears to solve the problem. I now have all 8 GPUs being enumerated under Win7 (64-bit Ultimate) using the latest driver (195.62). However, the latest driver must be used for this to work!

I have requested permission to post the patch to this forum. Waiting for a response now…

Great news! Look forward to trying it myself. Right now I’ve had to pull one of my two GTX295s in order for my system to run well. I’ll check back for any updates.

Please don’t PM me to ask for the patch. I’ve asked permission to post it, so let’s just wait for NVIDIA’s approval. They may also have a more complete solution in the works. I’m also not sure whether the patch is specific to my system. (Some of these registry patches are GUID-specific…) Unfortunately, the holidays may delay their response (even further)…

Based on what I’m seeing, it appears they have a handle on the issue. However, I’m not sure how the fix will ultimately be implemented.

It looks like we’re on top of this! Let’s give NVIDIA a chance to do this right…

Hang in there and have a Happy Holiday!

OK, NVIDIA said it was OK to post the patch they sent me (attached).

Here’s what (I think) I Know:

  1. This patch must be used with the latest driver (195.62). It will not work with earlier releases (at least not in my testing).

  2. Although this patch references the tired old “DisplayLessPolicy” and “LimitVideoPresentSources” tokens, they are not installed under the NVIDIA hardware keys. I’m told they are installed under the generic MS Windows video hardware enumeration keys (note that there is no reference to NVIDIA in the registry keys, as was formerly suggested).

  3. Given the above, I think this patch is somewhat generic. (It no longer needs to be installed with reference to the specific GUID used by the particular NVIDIA cards installed in the system.)

  4. The patch (as provided) references 8 GPUs in total (per my system). You may want to trim it back to match the number of GPUs you have installed; however, I don’t expect that extra registry keys for GPUs which don’t exist will hurt anything (give it a try).

Using 195.62 on a clean install of Win7 64-bit Ultimate, I now have 8 GPUs being enumerated by CUDA. I’m curious whether this works for other people with different video card combinations and other OS installs…

(Merry Christmas!)
Enable_Disable_Non_Display_CUDA.zip (864 Bytes)