Not all CUDA devices detected in CUDA Fortran on Windows 10

I have been trying to get back to compiling codes with pgf90 after upgrading from Windows 7 to Windows 10 and to pgfortran 17. After the upgrades, single-GPU versions of my CUDA Fortran codes compile and run correctly, but my multi-GPU versions do not. CUDA only detects two of the three CUDA devices installed in my machine. Prior to the upgrades I had multi-GPU pgf90 codes running and would really like to get back to that point.

I have three CUDA devices installed: an NVIDIA Quadro 600, which I use for graphics, and two Tesla C2075s. All three devices show up in the system's Device Manager and all report as working properly. Prior to the upgrades these devices appeared as devices 1, 0, and 2, respectively, in CUDA programs. That is, one C2075 showed up as device 0 and the other as device 2. Now device 2 is not detected. By "not detected", I mean that cudaGetDeviceCount(ndevice) returns 2 rather than 3 as before. When I check the device properties, device 0 is the first C2075 as before, the Quadro is device 1, as before, and there is no device 2. I have tried cudaSetDevice(2) to see whether the device is there but simply not being detected; it returns error code 10. I have re-installed pgf90 several times with different installation options, with no change.
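For reference, the check I am running is essentially the following (a minimal sketch, not my actual code; the variable names are just for illustration, built with pgf90 -Mcuda):

program count_devices
  use cudafor
  implicit none
  integer :: ndevice, idev, istat
  type(cudaDeviceProp) :: prop

  ! How many devices does the runtime report?
  istat = cudaGetDeviceCount(ndevice)
  print *, 'cudaGetDeviceCount status =', istat, '  ndevice =', ndevice

  ! List the devices it does see
  do idev = 0, ndevice - 1
    istat = cudaGetDeviceProperties(prop, idev)
    print *, 'device', idev, ': ', trim(prop%name)
  end do

  ! Try to select the third device explicitly; on my machine this
  ! now fails with error code 10 instead of returning 0 (success).
  istat = cudaSetDevice(2)
  print *, 'cudaSetDevice(2) status =', istat
end program count_devices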

My machine has an ASUS Sabertooth X58 motherboard with three PCIe x16 slots, a 6-core i7, and 24 GB of RAM. The BIOS has only one PCI setting, which switches between plug-and-play and non-plug-and-play modes. I have tried both settings.

Does anyone have any suggestions?


What is the output of

pgaccelinfo

That utility should report all the GPUs we can detect.
If it is not showing all of them, I would first make sure the
CUDA drivers have been reinstalled since the upgrade to Windows 10.
The drivers come from NVIDIA, and with them you should be able
to install and verify the GPUs present using NVIDIA's own software.

Then run
pgaccelinfo
which calls the same Nvidia routines you refer to.

PGI compilers come after the hardware works - PGI does not diagnose GPU hardware issues.

Thanks for your input. The response to pgaccelinfo is, as you would expect:


CUDA Driver Version: 8000

Device Number: 0
Device Name: Tesla C2075
Device Revision Number: 2.0
Global Memory Size: 5574492160
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1566 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

Device Number: 1
Device Name: Quadro 600
Device Revision Number: 2.1
Global Memory Size: 1073741824
Number of Multiprocessors: 2
Number of Cores: 64
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1280 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 800 MHz
Memory Bus Width: 128 bits
L2 Cache Size: 131072 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

So all the PGI software agrees: under Windows 10 and PGI 17, it can only see two of the three installed GPUs, while Windows 10 itself sees three working GPUs. Just a few weeks ago, Windows 7 and the older version of PGI Fortran could see all three and use the two C2075s. I suppose it is possible that the second C2075 failed in the last few weeks, or that the motherboard has developed a problem with that third slot, and that Windows 10 somehow cannot detect either condition. However, it seems far more likely to be a software problem. My guess is that it has something to do with how Windows 10 interacts with the PCI bus compared to Windows 7. I do not think it is a PGI software problem either. I was hoping to find someone else who had run into this problem and, hopefully, had solved it. My next step will be to swap the positions of the two C2075s to see if I am right that the cards are fine. If that works, I will move both C2075s to another computer to see if it is the motherboard.

I failed to mention that the machine on which I am running Windows 10, the one that can only see two of the three installed CUDA devices, is set up as a dual-boot machine with Ubuntu 14.04. I have a license for pgf90 for Linux as well as Windows, but had not tried the Linux version. My plan was to develop code under MS Visual Studio/pgf90 on the Windows side and then re-compile under Linux to run compute jobs on Linux compute servers.

Because I was stuck on getting multi-GPU to work under Windows, I gave the Linux version of pgf90 a try. Everything works fine under Linux: CUDA sees all three devices and compiles multi-GPU applications without a problem. So now I have ruled out hardware as the problem on the Windows side. I have the latest NVIDIA drivers installed for both operating systems and the latest updates on both operating systems. I have also tried all the PCI settings available in the machine's BIOS. My conclusion is that Windows 10 is doing something with the PCI bus, different from both Ubuntu 14.04 and Windows 7, that causes the third CUDA device to be unusable. The question is whether this is a fundamental limitation of Windows 10 or a settings issue. I will leave that for further research. For now my plan is to develop under Ubuntu. This means I am out about $1000 for licenses for PGI and MS Visual Studio 2015 that I am not able to use. Live and learn.

Problem solved - sort of.

Engineering suggests a driver issue. Please send the output
of

nvidia-smi.exe

which would typically be found at


C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe

Since everything is of compute capability 2.0/2.1,
we don't think the problem is caused by mixing newer-model
GPUs with older ones.

dave

Thanks for your continued help. Even if this problem is not solved, I am good to go using the Linux version, and I was thinking about going that way eventually anyway.

The output of nvidia-smi is as follows:

Wed May 24 12:12:19 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 377.35                 Driver Version: 377.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 600          WDDM | 0000:03:00.0      On |                  N/A |
| 30%   50C    P12    N/A / N/A |    222MiB /  1024MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla C2075          TCC | 0000:04:00.0     Off |                    0 |
| 30%   56C    P12    32W / N/A |      0MiB /  5316MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla C2075          TCC | 0000:05:00.0     Off |                    0 |
| 30%   49C    P12    28W / N/A |      0MiB /  5316MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I got the 377.35 driver from Nvidia’s website last week.


The strange thing is that this command detects both C2075s. Also, the order in which it lists them is different from the device numbering in CUDA, with the Quadro as 0, followed by the two C2075s. This, by the way, is the order in which they are installed in the slots.
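To make the two numberings easier to compare, here is a minimal CUDA Fortran sketch (assuming the pciDomainID, pciBusID, and pciDeviceID fields of the cudaDeviceProp type in the cudafor module) that prints each CUDA device alongside its PCI Bus-Id in the same form nvidia-smi uses:

program map_devices
  use cudafor
  implicit none
  integer :: ndevice, idev, istat
  type(cudaDeviceProp) :: prop

  istat = cudaGetDeviceCount(ndevice)
  do idev = 0, ndevice - 1
    istat = cudaGetDeviceProperties(prop, idev)
    ! Print "domain:bus:device.function" so it lines up with the
    ! Bus-Id column of nvidia-smi, e.g. 0000:04:00.0
    write (*, '(a,i0,2a)', advance='no') 'CUDA device ', idev, ': ', trim(prop%name)
    write (*, '(a,z4.4,a,z2.2,a,z2.2,a)') '  Bus-Id ', prop%pciDomainID, ':', &
          prop%pciBusID, ':', prop%pciDeviceID, '.0'
  end do
end program map_devices

On my machine I would expect the two C2075s to show buses 04 and 05, matching the nvidia-smi listing above.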

The different numbering of the GPUs is not a puzzle,
as the driver and pgaccelinfo enumerate them in a different order.

But the Windows OpenACC not finding the third GPU is a puzzle,
and we are looking at it.

dave

Let me know if I can help with more information, tests, etc. Count me in.

Send the outputs of

pgaccelinfo -dev 0
pgaccelinfo -dev 1
pgaccelinfo -dev 2

on Windows and Linux.

You may also want to send trs@pgroup.com the following outputs

On Linux
/sbin/ifconfig

on Windows
ipconfig /all

which may show some differences.

On Windows pgaccelinfo -dev 0 produces:

CUDA Driver Version: 8000

Device Number: 0
Device Name: Tesla C2075
Device Revision Number: 2.0
Global Memory Size: 5574492160
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1566 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

On Linux: pgaccelinfo -dev 0 produces:

CUDA Driver Version: 8000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon May 1 15:29:16 PDT 2017

Device Number: 0
Device Name: Tesla C2075
Device Revision Number: 2.0
Global Memory Size: 5558763520
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1566 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

On Windows pgaccelinfo -dev 1 produces:

CUDA Driver Version: 8000

Device Number: 1
Device Name: Quadro 600
Device Revision Number: 2.1
Global Memory Size: 1073741824
Number of Multiprocessors: 2
Number of Cores: 64
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1280 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 800 MHz
Memory Bus Width: 128 bits
L2 Cache Size: 131072 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

On Linux pgaccelinfo -dev 1 produces:


CUDA Driver Version: 8000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon May 1 15:29:16 PDT 2017

Device Number: 1
Device Name: Quadro 600
Device Revision Number: 2.1
Global Memory Size: 1010958336
Number of Multiprocessors: 2
Number of Cores: 64
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1280 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 800 MHz
Memory Bus Width: 128 bits
L2 Cache Size: 131072 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

On Windows pgaccelinfo -dev 2 produces:

CUDA Driver Version: 8000
Device Number: 2
could not attach to this device


On Linux pgaccelinfo -dev 2 produced:

CUDA Driver Version: 8000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 375.66 Mon May 1 15:29:16 PDT 2017
Device Number: 2
Device Name: Tesla C2075
Device Revision Number: 2.0
Global Memory Size: 5558763520
Number of Multiprocessors: 14
Number of Cores: 448
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1147 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 1566 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: No
PGI Compiler Option: -ta=tesla:cc20

On Windows ipconfig /all produces:

Windows IP Configuration

Host Name . . . . . . . . . . . . : BigBoy
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No

Wireless LAN adapter Local Area Connection* 2:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Wi-Fi Direct Virtual Adapter
Physical Address. . . . . . . . . : 56-A0-50-70-AD-B5
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes

Wireless LAN adapter Wireless Network Connection 3:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : The ASUS 802.11 Network Adapter provides wireless local area networking.
Physical Address. . . . . . . . . : 54-A0-50-70-AD-B5
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IPv6 Address. . . . . . . . . . . : ::e133:1c0e:2e40:b306(Preferred)
Temporary IPv6 Address. . . . . . : ::9024:a600:c29f:ab29(Preferred)
Link-local IPv6 Address . . . . . : fe80::e133:1c0e:2e40:b306%11(Preferred)
IPv4 Address. . . . . . . . . . . : 192.168.0.11(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : Friday, May 26, 2017 7:31:51 AM
Lease Expires . . . . . . . . . . : Friday, May 26, 2017 8:32:15 AM
Default Gateway . . . . . . . . . : 192.168.0.1
DHCP Server . . . . . . . . . . . : 192.168.0.1
DHCPv6 IAID . . . . . . . . . . . : 475308112
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1D-E3-DB-5E-BC-AE-C5-56-41-9B
DNS Servers . . . . . . . . . . . : 216.82.201.11
66.90.130.10
NetBIOS over Tcpip. . . . . . . . : Enabled

Tunnel adapter isatap.{612DC874-754E-4D3F-9F63-CEBD30BBCD38}:

Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft ISATAP Adapter
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes

Tunnel adapter Local Area Connection* 11:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Teredo Tunneling Adapter
Physical Address. . . . . . . . . : 00-00-00-00-00-00-00-E0
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv6 Address. . . . . . . . . . . : 2001:0:4137:9e76:ce7:1f1f:bda5:34ec(Preferred)
Link-local IPv6 Address . . . . . : fe80::ce7:1f1f:bda5:34ec%3(Preferred)
Default Gateway . . . . . . . . . :
DHCPv6 IAID . . . . . . . . . . . : 50331648
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1D-E3-DB-5E-BC-AE-C5-56-41-9B
NetBIOS over Tcpip. . . . . . . . : Disabled

On Linux ifconfig produces:

eth0      Link encap:Ethernet  HWaddr bc:ae:c5:56:46:47
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:18

eth1      Link encap:Ethernet  HWaddr bc:ae:c5:56:41:9b
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:563 errors:0 dropped:0 overruns:0 frame:0
          TX packets:563 errors:0 dropped:0 overruns:0 carrier:
          collisions:0 txqueuelen:0
          RX bytes:89930 (89.9 KB)  TX bytes:89930 (89.9 KB)

wlan0     Link encap:Ethernet  HWaddr 54:a0:50:70:ad:b5
          inet addr:192.168.0.11  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: ::c08b:adb0:dd38:e5b8/64 Scope:Global
          inet6 addr: fe80::56a0:50ff:fe70:adb5/64 Scope:Link
          inet6 addr: ::56a0:50ff:fe70:adb5/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8551 errors:0 dropped:0 overruns:0 frame:757
          TX packets:4205 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8698762 (8.6 MB)  TX bytes:608858 (608.8 KB)
          Interrupt:16 Base address:0x8000

One question I have: has anyone you are aware of been able to access more than two CUDA devices under Windows 10 Professional?

I am experiencing the same problem. I have only two GPUs: one Tesla C2075 and one GeForce GT 710. Here is the output of pgaccelinfo:


CUDA Driver Version: 9010

Device Number: 1
Device Name: GeForce GT 710
Device Revision Number: 3.5
Global Memory Size: 2147483648
Number of Multiprocessors: 1
Number of SP Cores: 192
Number of DP Cores: 64
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 954 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2505 MHz
Memory Bus Width: 64 bits
L2 Cache Size: 524288 bytes
Max Threads Per SMP: 2048
Async Engines: 1
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: No
PGI Compiler Option: -ta=tesla:cc35

PGI$ pgaccelinfo -dev 0

CUDA Driver Version: 9010

Notice that it doesn't report anything significant about device 0. Here is the output of my nvidia-smi.exe command:

C:\Program Files\NVIDIA Corporation\NVSMI>.\nvidia-smi
Sun Nov 11 16:42:48 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 391.35                 Driver Version: 391.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2075          TCC | 00000000:28:00.0 Off |                    0 |
| 30%   56C    P12    32W / N/A |      0MiB /  5316MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 710      WDDM | 00000000:60:00.0 N/A |                  N/A |
| N/A   48C     P8    N/A / N/A |    263MiB /  2048MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+

C:\Program Files\NVIDIA Corporation\NVSMI>.\nvidia-smi
Sun Nov 11 16:43:01 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 391.35                 Driver Version: 391.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2075          TCC | 00000000:28:00.0 Off |                    0 |
| 30%   56C    P12    31W / N/A |      0MiB /  5316MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 710      WDDM | 00000000:60:00.0 N/A |                  N/A |
| N/A   47C     P8    N/A / N/A |    263MiB /  2048MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1                    Not Supported                                       |
+-----------------------------------------------------------------------------+

C:\Program Files\NVIDIA Corporation\NVSMI>


NVIDIA recognizes both GPU controllers, but the PGI tools only see the low-end GT 710 video card I have connected to some monitors. In fact, it is the C2075 that I care about most and want to use for HPC. Any advice on what can be done to fix this?

caseroj,

I'm not too familiar with the issue, but it looks like you have the C2075 set in TCC mode (Reference Topics :: NVIDIA Nsight VSE Documentation). As far as I'm aware there aren't any limitations with TCC and PGI, but can pgaccelinfo see the card if you switch it to use WDDM? You can do so with nvidia-smi:

nvidia-smi -g {GPU_ID} -dm {0|1}

Where 0 = WDDM and 1 = TCC. Use -fdm instead of -dm to force it, though I think you should leave the GPU driving the display on WDDM.
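If you want to double-check the driver model from a program rather than from nvidia-smi, a minimal CUDA Fortran sketch along these lines might work (assuming the cudaDeviceProp type in cudafor exposes the tccDriver field, as the C runtime struct does):

program driver_model
  use cudafor
  implicit none
  integer :: ndevice, idev, istat
  type(cudaDeviceProp) :: prop

  istat = cudaGetDeviceCount(ndevice)
  print *, 'visible CUDA devices:', ndevice

  do idev = 0, ndevice - 1
    istat = cudaGetDeviceProperties(prop, idev)
    ! Assumption: tccDriver is 1 when the device is running the TCC
    ! driver and 0 when it is running the WDDM driver, as in the C API.
    if (prop%tccDriver == 1) then
      print *, idev, ' ', trim(prop%name), '  (TCC)'
    else
      print *, idev, ' ', trim(prop%name), '  (WDDM)'
    end if
  end do
end program driver_model

I believe a reboot is required after changing the driver model before the change takes effect.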