4 x GTX-295: CUDA only sees 5 x GPU (NOT the usual issues!)

It’s not a monitor selection issue. I have the KVM attached to all 8 ports and when the screen goes black I can easily scan across all 8 ports!

I tried loading the 190.38 beta drivers a SECOND time on a fresh install. Same problem as before. I get the driver loaded OK with only 1 card in the system. Then when I add three more, it detects a couple of cards and then the screen goes black and the system locks up (no image on any screen). I also get an “Error in NVCPL.DLL. Missing Entry: NvStartupFirstRunAfterInstUserAccount” but only after adding the three additional cards (not when a single card is in the system).

So the 190.38 Beta Drivers are totally trashed…

Although the current shipping drivers are stable, we have the ubiquitous 5 GPU limit (2+1+1+1)…

I believe I have done all I can do with the Vista environment. It’s now up to NVIDIA to solve the problem! I will notify them of my findings via my Open Incident…

OK, Great! Looks like the 190.38 drivers went live! Now I need to do this all over again a THIRD time. This time I will uninstall all traces of the existing driver before trying to load 190.38. I have a hunch that this will change nothing…

Tim, is there any news about the 5-device limit in WDDM drivers? Can you confirm this? The issue might be quite pressing for us, as many of our customers are using more than 5 devices on XP and will likely switch to Vista/Win7 sometime soon.

I don’t know if WDDM drivers have a 5 “device” limit, but I can see all 8 monitor devices (2 per card) in the Device Manager from Vista using the 186.18 drivers when I have four 295 cards in the system. It’s just that I only get 5 GPU’s enumerated via CUDA…
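For anyone who wants to check the enumerated count without BOINC or another client, here is a minimal sketch in Python. It assumes the CUDA driver library is loadable under its usual name (`nvcuda.dll` on Windows, `libcuda.so.1` on Linux) and calls the driver API functions `cuInit` and `cuDeviceGetCount` through `ctypes`. The helper names (`cuda_device_count`, `expected_gpus`, `missing_gpus`) are made up for illustration, and the "2 GPUs per GTX 295" figure is the one used throughout this thread.

```python
import ctypes
import sys

GPUS_PER_GTX295 = 2  # each GTX 295 card carries two GPUs


def cuda_device_count():
    """Return the number of CUDA devices the driver enumerates, or None if unavailable."""
    if sys.platform.startswith("win"):
        loader, name = ctypes.WinDLL, "nvcuda.dll"
    else:
        loader, name = ctypes.CDLL, "libcuda.so.1"
    try:
        lib = loader(name)
    except OSError:
        return None  # no CUDA driver installed on this machine
    if lib.cuInit(0) != 0:  # 0 == CUDA_SUCCESS
        return None
    count = ctypes.c_int(0)
    if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


def expected_gpus(cards):
    """GPUs that should be enumerated for a given number of GTX 295 cards."""
    return cards * GPUS_PER_GTX295


def missing_gpus(cards, seen):
    """How many GPUs the driver failed to enumerate."""
    return expected_gpus(cards) - seen


if __name__ == "__main__":
    seen = cuda_device_count()
    print("CUDA devices enumerated:", seen)
    if seen is not None:
        print("Missing with 4 cards:", missing_gpus(4, seen))
```

With four cards and the 5 GPUs reported above, `missing_gpus(4, 5)` gives 3.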

I’m currently trying a clean load of the new 190.38 PRODUCTION drivers as the 190.38 BETA distribution trashed my system (TWICE) with four 295 cards in the box (although they worked OK with only 1 card). I’m skeptical that the 190.38 PRODUCTION drivers are any different than the 190.38 BETA drivers, but I’m proceeding cautiously and slowly and adding 1 card each boot to see exactly if/when the crash occurs. This thread is largely a blog for me now as to what I have tried and what has happened. I have a support incident opened with NVIDIA and at least I can point them to this thread for the running details. (Although no one from NVIDIA has called me back or emailed me to discuss any of these findings…)

The motherboard manufacturers have stepped up and started to build motherboards with 4 properly spaced PCIex16 slots just for CUDA support (Asus, Asrock, and I think also Gigabyte)…

It’s time for NVIDIA to step up and fix the multi-GPU CUDA enumeration problem in the Vista/Win7 drivers…

If NVIDIA wants the CUDA hardware momentum to continue into the OEM market, then we must have 8 GPU support for Vista/Win7! Otherwise OEM’s will not be able to make complete machines using these boards and only the consumer market will be buying them for personal projects. While individual consumers will likely be able to scrape up a copy or two of XP, OEM’s will not be able to sell complete systems in larger numbers unless current O/S’ are supported… (Even Vista downgrade rights require the end user to have a valid XP key to use in place of a Vista key - something an OEM can not provide).

So what’s up NVIDIA?

UPDATE: The 190.38 Vista Drivers are really trashed wrt CUDA! (IMHO)

Here is what I have found out in a nice clean Vista64 SP2 install using the production 190.38 drivers. I installed the drivers on a box with a single card and then slowly and methodically added cards and changed the configurations. These drivers are even less functional (and less stable) than the prior 186.18 drivers. I can only get 4 GPU’s max now (I used to get 5) and the NVIDIA Control Panel is basically useless to control the configuration. With all the hardware and labor we have invested in this, and with all the detailed information we’ve gathered, you’d think that NVIDIA might be interested in discussing this further…

Cards  PhysX  SLI  Monitors  GPU’s  Notes
1      Y      Y    1         2      Default config

(Installed Second Card)

2      Y      N*   1         2      *SLI was auto-disabled
2      Y      Y    1         4      After turning on SLI

(Installed Third Card)

3      Y      N*   1         2      *SLI was auto-disabled
3      Y      Y    1         4      After turning on SLI

(Installed Fourth Card)

4      N*     N*   1         2      *SLI/PhysX auto-disabled
4      Y      N    1         2      After turning on PhysX; however, the NVIDIA Control Panel would NOT allow SLI to be enabled
4      Y      N    8         4      Attached all 8 monitors (KVM); 4 GPU’s show up
4      Y      Y    8         4      Enabled Multi-GPU; Vista immediately reports a driver crash but says the driver has recovered
4      Y      N    8         4      Viewing the NVIDIA Control Panel after the driver crash and “recovery”, SLI is off again

Additional Notes for final config (last Item above):

  1. All 8 monitors (KVM) are attached

  2. All 8 Display Devices show up in Windows Device Manager

  3. Only Three (of four) 295 Cards show up in the NVIDIA Control Panel to “Setup Multiple Displays”

  4. Of the Three 295’s which show up, only the first card will allow both monitors to be selected.
    The other two cards only allow one of the two monitors to be selected (“This GPU Only Supports a Single Display”)

  5. Regardless of what is selected (i.e. both monitors on the first card, one monitor on each of the other two visible cards),
    the control panel simply rejects the options and returns to the original config when “Apply” is selected.
    (The original/default config is One Monitor on LAST of the three visible cards)
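To make the shortfall in the matrix above easier to see, here is a small Python sketch that transcribes the rows and compares the GPU count CUDA reported against the 2-GPUs-per-card figure for the GTX 295. The row values come straight from the matrix; the names `MATRIX` and `shortfalls` are only illustrative.

```python
# Rows transcribed from the test matrix:
# (cards, physx_on, sli_on, monitors, gpus_seen)
MATRIX = [
    (1, True, True, 1, 2),
    (2, True, False, 1, 2),
    (2, True, True, 1, 4),
    (3, True, False, 1, 2),
    (3, True, True, 1, 4),
    (4, False, False, 1, 2),
    (4, True, False, 1, 2),
    (4, True, False, 8, 4),
    (4, True, True, 8, 4),
    (4, True, False, 8, 4),
]

GPUS_PER_CARD = 2  # a GTX 295 carries two GPUs


def shortfalls(rows):
    """Return (cards, expected, seen, missing) for every row of the matrix."""
    result = []
    for cards, _physx, _sli, _monitors, seen in rows:
        expected = cards * GPUS_PER_CARD
        result.append((cards, expected, seen, expected - seen))
    return result


if __name__ == "__main__":
    for cards, expected, seen, missing in shortfalls(MATRIX):
        print(f"{cards} card(s): expected {expected}, saw {seen}, missing {missing}")
```

Only the single-card row and the two-card row with SLI enabled reach the expected count; every three- and four-card row comes up short, and no row ever shows more than 4 GPU’s.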

No updates, trying to track this down internally still. Driver team is working on it though, don’t worry.

evanevery, you should be able to get at least five GPUs to show up with three cards. I’m using the same drivers and hardware (except different mobo). Based on your mobo configuration, you should know which two cards will be picked for quad-SLI. Make sure one of these two cards is hooked to the KVM – I used the “bottom” DVI jack (closest to the mobo) on the “first” card (closest to the CPU). Also make a single KVM connection to the non-SLI card.

This should mean Windows will detect two monitors in Windows display properties. I actually saw three monitors, so I suspect the third monitor is a “ghost” based on driver weirdness or something. When I tried to extend the desktop to the third monitor, I got the Vista “driver crash” errors that you also saw. Just make sure SLI and PhysX are enabled, and then extend the desktop across two monitors. Once you can see five CUDA devices, try adding the fourth card, do the same configuration (should now extend across three monitors, again ignoring any extras) and maybe you’ll get a sixth CUDA device.

I used BOINC for quickly detecting the number of CUDA devices available, but I’ve confirmed Folding@home also works with the 190.xx drivers with SLI enabled.

Please see my configuration matrix I provided in the last message. All cards ARE hooked to monitors.

You will also see in the matrix that I DID have only three cards in the system and I DID NOT see 5 CUDA devices.

You will also see that I did try turning SLI/PhysX on/off…

Only 3 GTX-295’s max are displayed by the NVIDIA Control Panel (although 8 display devices are shown by the Windows Device Manager). Something is VERY wrong with the new drivers. This very same configuration shows all cards but only 5 GPU’s under the previous Vista drivers, and shows all cards and the appropriate 8 GPU’s under XP (also with the previous driver).

I can try the new drivers under XP as well, but I’m really tired of trying to isolate NVIDIA’s problems without any support from them. It’s like calling 911 and all you get is the 911 operator - no ambulance, police car, or fire truck has ever shown up! “The developers have been notified and will get back with you…”

What MB are you using? Does it support 4 GTX-295’s (4 x PCIex16 with proper slot spacing, etc)?

I have monitors hooked up and I have run only 3 cards… So which configuration EXACTLY do you think I’m missing in my testing? (Please refer to the provided test matrix)…

I’m using a DFI Lanparty DK X58-T3eH6, which only supports three cards. My quad 9800GX2 box is using the MSI K9A2 Platinum with AMD Phenom II processor. It’s weird, but the AMD/ATI boards and the K9A2 Platinum in particular have a good reputation in the distributed computing community for having the best support for multiple NVIDIA cards, even though they don’t support SLI. I don’t know if that would be a problem with enabling SLI with the new drivers. My box is running XP Pro x64, which works fine (8 CUDA devices, no dummy plugs, etc.), so I’ve never tried Vista or Win7 on it.

In your test matrix, you don’t mention using the KVM when testing three cards. I’m suggesting, rather than hooking up all six DVI connectors to the KVM, only hook up two DVI connections, one for one of the quad-SLI cards, and one for the non-SLI card. That’s how my current configuration with five CUDA devices is set up (quad-SLI card hooked to a single monitor, and dummy plug on non-SLI card to get Windows to see it).

Since I know five GPUs will work, if you can duplicate that, then you can add the fourth card and see if you can get six GPUs to work. When you add the fourth card, again you would just make a single connection from that card to the KVM, and you should only extend your desktop across three monitors in Windows display properties.

I now have four new water-cooled EVGA 295 cards installed and running under XP using the 190.38 production drivers. I have a massive 4-fan Koolance radiator attached to the top of the box. I am not going to spend any more time on Vista or Win7 since NVIDIA is apparently uninterested in working with us on this. With the water cooling now in place, I have very little ability to rapidly add or remove cards as I was doing before. So for now, I am only running XP until I can get my water cooling config fully quantified and optimized. I currently have all 4 cards in series. Next week I will be adding the CPU to the water-cooled mix and splitting the cards into two banks. Temps are not too bad under full GPU loading, but the last card does run about 8°C warmer than the first. I will be trying three loops in parallel: CPU, 2 cards, 2 cards. I may use 3/8 inch tubing for the CPU to help throttle the water flow a little and push more through the 1/2 inch tubing I am using for the 295’s.

Once I get the water cooling optimized then I will return to the Vista/Win7 driver problem. I was going to wait to do this so I could work with NVIDIA in the air cooled config, but that certainly looks like a waste of my time. What are the odds that someone from NVIDIA will actually show some interest in the Vista/Win7 issue by the time I get the Water Cooling done?

Did you get this running?

I have 2 x GTX-295 and Vista Ultimate 64-bit, with a 1200 W power supply, and like you I entered a terrible nightmare battle trying to get CUDA to see 4 GPU’s.

I have 2 dummy VGA plugs next to 1 HDMI monitor (via a DVI adapter) and a second VGA monitor (so I can cover all 4 DVI ports), and I have spent a lot of time installing/uninstalling NVIDIA drivers and cards and analysing the registry.

Against all logic, I now have CUDA seeing 4 GPU’s, even though Quad SLI is ON, the SLI bridge is in, and only 1 monitor is connected (the other 3 DVI ports are empty, with no dummy VGA plugs)!

However…

It doesn’t seem to really work with 4, because:

the best performance I ever got (BOINC) was when CUDA was seeing only 3 GPU’s!

AND the first card runs HOT while the second card has a comparatively moderate temperature.

I would say that with CUDA seeing 3 GPU’s the second card was even hotter.

So to summarize: CUDA sees 4 GPU’s now,

but it only runs on about 2.5 to 3 GPU’s, judging by the temperature and performance seen with BOINC…

The registry entries (Hardware/Devicemap/video) look as if I have only one card in SLI (which might fit Quad SLI), yet it finds 4 GPU’s. (The display settings offer two connections, 1 and 2, where 2 is greyed out with nothing connected.)

So that does not seem to be what matters for CUDA’s GPU detection.

Now there is another entry, the PCI occupation:

Here I figured out that I need to have 4 entries/subfolders in one of these PCI folders:

(System/CurrentControlSet/Enum/PCI/VEN_10DE&… )

4 times 6&…

and there, in the subfolder DeviceParameters, each has a VideoID

and in each I have a valid VidPnLkgTopology binary blob which has information in it (not everything zero).

If the VidPnLkgTopology entry is missing, or the binary blob contains only zeros, then the GPU will not be recognized. So it seems to me. (See below.)
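The rule of thumb described above (an entry counts only if its VidPnLkgTopology value exists and is not all zeros) can be written down as a short Python sketch. This only encodes the poster’s observation; it is not documented driver behavior. The function names and the sample entry labels are hypothetical, and the sketch works on byte strings you have already exported rather than reading the registry itself.

```python
def topology_is_valid(blob):
    """True if a VidPnLkgTopology value is present and not made up only of zero bytes."""
    return blob is not None and len(blob) > 0 and any(blob)


def count_usable_entries(entries):
    """Count PCI entries whose topology blob passes the check above.

    `entries` maps an arbitrary label for each PCI subfolder to the raw
    VidPnLkgTopology bytes, or to None when the value is missing.
    """
    return sum(1 for blob in entries.values() if topology_is_valid(blob))


if __name__ == "__main__":
    # Hypothetical sample data: the labels and bytes are made up for illustration.
    sample = {
        "entry_a": bytes([0x01, 0x00, 0x02, 0x00]),  # has information in it
        "entry_b": bytes(16),                        # present but all zeros
        "entry_c": None,                             # value missing
        "entry_d": bytes([0x00, 0x00, 0x10]),        # has information in it
    }
    print("Usable entries:", count_usable_entries(sample))
```

Under this reading, you would want the count to equal the number of GPUs you expect CUDA to see (4 for two GTX 295 cards).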

During my installation attempts I figured out that I can already trigger the CUDA GPU recognition problem when I install 1 video card in the second PCI slot only and leave the first empty!!!

So I worked sort of backward:

  1. (SLI disabled, VGA dummy plugs in, desktop extended)

  2. Install 1 card in PCI slot 2 and get 2 GPU’s running! This is the hard part, especially if you had 2 cards installed before…

  3. Then install the second card in slot 1… this will lead to several blue screens!! (Thanks, NVIDIA.) But you have to work your way through that and keep trying until it starts and installs the driver properly. I use 190.38 beta.

At some point during this I had to switch to the first card as the primary monitor.

Check the registry for 4 VidPnLkgTopology entries… (it doesn’t matter at the moment if one has all zeros in it.)

You will still not have 4 GPU’s… probably you have only 1 or 2.

  4. But then the miracle: activate Quad SLI.

After a reboot I had CUDA seeing 4 GPU’s.

I rebooted again and it was still 4.

Then I unplugged the VGA dummies and the second VGA monitor…

rebooted, and I still have CUDA seeing 4 GPU’s…

But as I say… it seems to be using only two and a half GPU’s, judging from the performance and temperature (and noise) of the cards…

And unfortunately it burns up the card in my first PCI slot, which has poor airflow in that slot.

And that’s the point where I resigned myself to defeat. I was so excited about CUDA and wanted to start my own projects (that is the reason why I bought 2 cards), and I am honestly so disappointed by this that I have buried my plans with CUDA.

Besides CUDA, I enjoy flight simulation on an older PC (also Vista 64)… the main issue I had there for over 2-3 years was that the display would at some point crash and not recover… Nothing is worse than having such a crash right before you could finish your 2 hour flight. This really kills your day, I tell you. This issue was ‘solved’… well, not really solved, but at least it now recovers from the crash and you can continue and end the flight… and that only since the recent 185 driver.

NVIDIA should really start to focus on their software quality, in my opinion. What good is the best hardware on the market (NVIDIA) when you cannot use it because their software is unstable or not working?

Here is the relevant info from my registry. Believe it or not, this makes CUDA recognize 4 GPU’s! I really think it is because of the 4 PCI entries with a full VidPnLkgTopology.

DEVICEMAP/VIDEO:

\Device\Video0 \REGISTRY\Machine\System\ControlSet001\Services\VgaSave\Device0

\Device\Video1 \Registry\Machine\System\CurrentControlSet\Control\Video\{DEB039CC-B704-4F53-B43E-9DD4432FA2E9}\0000

\Device\Video2 \Registry\Machine\System\CurrentControlSet\Control\Video\{42cf9257-1d96-4c9d-87f3-0d8e74595f78}\0000

\Device\Video3 \Registry\Machine\System\CurrentControlSet\Control\Video\{28089D7D-B2D4-47C2-B2CA-8D60A99A0B34}\0000

\Device\Video4 \Registry\Machine\System\CurrentControlSet\Control\Video\{28089D7D-B2D4-47C2-B2CA-8D60A99A0B34}\0001

MaxObjectNumber 4

SYSTEM/CONTROL/VIDEO:

VideoID {28089D7D-B2D4-47C2-B2CA-8D60A99A0B34}

        SubFolders: 0000  (DeviceDescriptions NVIDIA GeForce 295)

                    0001  (DeviceDescriptions NVIDIA GeForce 295)

                    Video (Service nvlddmkm)

VideoID {34018717-467E-4D53-97B5-253D9F65B897} SubFolders: only Video (Service nvlddmkm)

VideoID {94A91FAA-CF9E-4AF0-8A85-1E8971E732F9} SubFolders: only Video (Service nvlddmkm)

VideoID {C50E1217-4107-4E52-9907-69162A757C43} SubFolders: only Video (Service nvlddmkm)

PCI:

VideoID {C50E1217-4107-4E52-9907-69162A757C43} (has valid VidPnLkgTopology entry)

VideoID {28089D7D-B2D4-47C2-B2CA-8D60A99A0B34} (has valid VidPnLkgTopology entry)

VideoID {34018717-467E-4D53-97B5-253D9F65B897} (has valid VidPnLkgTopology entry)

VideoID {94A91FAA-CF9E-4AF0-8A85-1E8971E732F9} (has valid VidPnLkgTopology entry)

oh and I didn’t make the

LimitVideoPresentSources

DisplayLessPolicy

entries.

The LimitVideoPresentSources entry leads to a slow Windows startup and a crash when you install a new NVIDIA driver. So it is not a good option.


I’m having a similar problem with 2 x GTX 295. Using 3 devices gives almost the same performance as using 4 devices, and I don’t know why. But I think it is a problem related to CPU threads/contexts synchronization.

CUDA 2.3 enables all devices when using SLI. That’s why you can see the 4 devices without changing the registry parameters. But it has slower performance than manually changing the registry, since with SLI all the devices will be activated for render-related processing.

If you find some answers, please post them here!

Thanks.

Well, I actually got Vista64/CUDA to see a total of 5 GPU’s (with 4 x 295’s) with the older driver a while back. That was not so difficult, but it should have seen 8 GPU’s… (The newer driver is complete trash as it causes all sorts of instability).

It’s interesting to see that NVIDIA really doesn’t seem to give a crap about this. I keep calling in (I have a support incident open), and they keep telling me the developers are looking into the issue, but no one has ever called back or confirmed they have found the problem… It’s obvious that everyone except NVIDIA has found the problem! And this isn’t something new…

I’m forging ahead with the water cooling on my platform. Unfortunately, it only runs under XP so we still do not have a completed product to offer our customers…

evanevery,

I’m having the same problem with my system. I have a P6T7 and it cannot reliably detect > 4 GPUs. It runs fine up to 4, but when I put the third or fourth GTX295 in the system I get blue screens and black screens on all of the displays. I have tried XP Pro x64 and Win7 both with the same results. If I boot into Safe Mode, Windows can see 8 GPUs normally, but when I boot into Normal Mode I have issues from the time that the nVidia driver engages… Prior to the nVidia driver loading I see Windows boot screens, etc., but after the driver loads I get either BSOD or black screens with no output on all monitor outs.

There is a BIOS update that I will point out (I haven’t installed it yet), but it claims only microcode updates and isn’t likely to affect this issue:

http://support.asus.com/download/download…SLanguage=en-us

I thought this was a shortcoming of the BIOS, but I have since met at least one person that has 4 GTX295s working without issue in the P6T7 with the original 210 BIOS.

Jason “Atlas Folder” Farque

jason@atlasfolding.com

I had issues with the P6T7 system originally. That is why I went to the Asrock X58 board. It is not only cheaper, but the developers keep updating the BIOS. (This includes specific release notes about the ability to support 4x295’s.) Asus has not updated the P6T7 BIOS since the original release. Once I moved to the X58, all the instability issues were resolved. 4x295’s work just fine in XP. I’m using XP(32) as XP64 provides absolutely no benefits for this type of work and was never supported very well by anybody anyway… Unfortunately, I’m still waiting for NVIDIA to fix the issues with the Vista/Win7 CUDA enumeration problems. They haven’t even bothered to get back with me regarding my official support incident! My watercooling config is all done and I’m now just dressing up the box for our next trade show. Once I get the system all cleaned up, I’ll post a few pics. It’ll be nice to present a complete OEM manufactured, watercooled supercomputer with 4x295’s. However, it’s unfortunate we will have to tell our customers we won’t be able to sell it until NVIDIA fixes the CUDA problems under Vista/Win7!

The System Design (including the water cooling) is effectively done. Here are a couple of photos of the prototype…

Still absolutely nothing from NVIDIA on this! I called again today and now I’m being told that the case can not be located! (Although I have email from NVIDIA referencing the specific case number)…

Over a month and NOTHING!

So I’m looking at 6 GPUs being enumerated in Vista right now (as it turns out, finding a machine capable of fitting a lot of GPUs is hard). I did this with one GTX 295 and two Quadro Plex D2s (I still can’t find a machine capable of fitting eight GPUs, although I might be able to make this one work with eight with a hack–previously blew up at the BIOS). I’m using 190.62, which is the latest public release.

And now I hit 7 by setting the GTX 295 and inserting an NVS 295. Still can’t hit eight because of BIOS issues.

Moral of the story is, beyond how ridiculous this is to set up (registry editing, hooray) I’m not convinced there’s an actual problem with the driver and the ability to enumerate devices.

Tim,

If you will take the time to read my original messages you will see that we have tried two separate MB’s capable of hosting 4 x GTX295: an Asus board and an Asrock X58 Supercomputer board. The Asus board was not stable but the Asrock board works great right out of the box.

This is not a ridiculous task. The Asrock board works right out of the box. No BIOS tricks, no registry editing. At least it works without any issues using Windows XP…

So let’s get over this, OK? It’s not a ridiculous task. It’s a driver issue. If the Vista driver worked as well as the XP driver we would not be having this discussion. We’re hitting 5.5 BILLION NTLM passwords per SECOND using the XP platform and the 1920 processors of the 4 x GTX 295. Why can’t we get the Vista/Win7 driver to do the same? That is the question! We DON’T want to have to edit any registry settings or modify any BIOS configurations.

…so please do not try to sully this issue by pretending it takes some extraordinary system manipulation to get this to work - it doesn’t! The X58 platform simply works with XP. WHY DOESN’T IT WORK WITH VISTA/WIN7?

All this was very clearly delineated earlier in this thread…

Better yet - Why doesn’t NVIDIA seem to care?
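To put the enumeration limit in numbers, here is a rough back-of-the-envelope sketch in Python. It takes the figures quoted above (5.5 billion NTLM passwords per second across 8 GPUs and 1920 processors under XP) and assumes, purely for illustration, that throughput scales linearly with the number of GPUs CUDA can see. That linear scaling is an assumption, not a measurement, and the names used are illustrative.

```python
XP_RATE = 5.5e9          # NTLM passwords per second, as reported under XP
XP_GPUS = 8              # 4 x GTX 295, two GPUs each
TOTAL_PROCESSORS = 1920  # stream processors across all 8 GPUs


def estimated_rate(gpus_seen, total_rate=XP_RATE, total_gpus=XP_GPUS):
    """Estimated rate if only `gpus_seen` GPUs are usable (assumes linear scaling)."""
    return total_rate * gpus_seen / total_gpus


if __name__ == "__main__":
    print("Processors per GPU:", TOTAL_PROCESSORS // XP_GPUS)
    for gpus in (8, 5, 4):
        print(f"{gpus} GPUs: ~{estimated_rate(gpus) / 1e9:.2f} billion passwords/s")
```

Under that assumption, the 5 GPUs seen with the older Vista driver would deliver roughly 3.4 billion per second and the 4 GPUs seen with 190.38 roughly 2.75 billion, against 5.5 billion under XP.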