GeForce Maxwell Titan X and Pascal Titan X in same machine?

I am attempting to get my older Maxwell Titan X and a new Pascal Titan X working on the same Ubuntu 16.04 machine.

When I boot up, the “GeForce GTX” green name lights up on both cards, but as soon as I run nvidia-smi, the light turns off on the older Maxwell Titan X. I tried reinstalling NVIDIA-Linux-x86_64-367.35.run and the same thing happens.

The driver 367.35 states that both these cards should be supported.

Are you seeing any problems other than the lights turning off? I think that’s just the behavior of the older Titan: the GPU powers on with the lights on, and then they turn off when the GPU is initialized. Normally, it’s initialized by the SBIOS at boot, but when you have two GPUs in the system, the secondary one isn’t initialized until later, when the driver loads.

Thank you for the response.

Yes; the problem is that the Maxwell GPU isn’t available. Here is the output of nvidia-smi; only the Pascal Titan X is listed:

nvidia-smi
Wed Aug 17 11:18:04 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X             Off  | 0000:01:00.0     Off |                  N/A |
| 70%   85C    P2   177W / 250W |  10569MiB / 12188MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     23540    C   python                                       10567MiB |
+-----------------------------------------------------------------------------+

Can you please generate and attach an nvidia-bug-report.log.gz file?

Hi Aaron,

The requested file is here: http://3DTOPO.com/nvidia-bug-report.log.txt

Thanks!

I have a similar issue. Did this get resolved?
3DTOPO, did you get this solved?
I have both chips in one machine. Only one shows up in nvidia-smi:
+------------------------------------------------------+
| NVIDIA-SMI 352.68     Driver Version: 352.68         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:02:00.0     Off |                  N/A |
|  0%   42C    P0    48W / 250W |     23MiB / 12287MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

ls /dev/nvidia*
/dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm

and here is my bug report: https://transfer.sh/f0OAG/nvidiabugs
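The listing in this post shows two numbered device nodes while nvidia-smi reports a single GPU. A minimal Python sketch of that comparison follows. The helper name gpu_minor_numbers is hypothetical, the input is the ls output copied from the post, and the sketch only compares counts, since it is an assumption (not something the thread confirms) that a node number corresponds to a given nvidia-smi index.

```python
import re


def gpu_minor_numbers(device_paths):
    """Return the sorted numbers of per-GPU nodes such as /dev/nvidia0.

    Control nodes like /dev/nvidiactl and /dev/nvidia-uvm are skipped
    because they do not end in a plain number.
    """
    pattern = re.compile(r"^/dev/nvidia(\d+)$")
    minors = []
    for path in device_paths:
        match = pattern.match(path)
        if match:
            minors.append(int(match.group(1)))
    return sorted(minors)


# Listing copied from the post; on a live system this would come from
# something like glob.glob("/dev/nvidia*").
listing = "/dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm"
minors = gpu_minor_numbers(listing.split())

# Number of GPU rows in the nvidia-smi table from the post (entered by hand).
gpus_listed_by_smi = 1

print("numbered device nodes:", minors)
print("nodes beyond what nvidia-smi lists:", len(minors) - gpus_listed_by_smi)
```

A nonzero difference only says that a device node exists for a GPU the driver did not bring up; it does not say why.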

Nope - still limping along with one GPU. Seems like your results are consistent with mine, so I guess the driver doesn’t support them both - which seems quite odd to me (not to mention disappointing).

Did you upgrade your drivers? I did not. I am on Ubuntu 14.04 and I had the older Maxwell chip. Then I installed the second card, a Titan X Pascal, without any software updates, and when I run nvidia-smi only one GPU shows up. If it’s an NVIDIA issue, then we have to either wait for them to fix it or get another computer. What would be the cheapest computer that can use a Titan X Pascal?

I started with a clean install so no drivers were updated - I just installed the latest 367.35 drivers.

Personally I am not interested in building and maintaining a second machine when clearly it should be working in one.

I don’t want multiple machines either, but it doesn’t look like NVIDIA is doing anything about this.

I observed the following in the log:

Aug 14 14:05:23 brainBot kernel: NVRM: RmInitAdapter failed! (0x25:0x40:1050)
Aug 14 14:05:23 brainBot kernel: NVRM: rm_init_adapter failed for device bearing minor number 1
Aug 14 14:05:51 brainBot kernel: NVRM: RmInitAdapter failed! (0x24:0x40:1035)
Aug 14 14:05:51 brainBot kernel: NVRM: rm_init_adapter failed for device bearing minor number 1
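For anyone scanning a longer kernel log for the same pattern, here is a small Python sketch that groups these NVRM failures by device minor number. The function name summarize_init_failures is hypothetical, and the pairing rule (each "RmInitAdapter failed!" line belongs to the "minor number" line that follows it) is an assumption based only on the four lines quoted above. The sketch does not interpret the status codes.

```python
import re

# Log lines copied from the post above.
LOG = """\
Aug 14 14:05:23 brainBot kernel: NVRM: RmInitAdapter failed! (0x25:0x40:1050)
Aug 14 14:05:23 brainBot kernel: NVRM: rm_init_adapter failed for device bearing minor number 1
Aug 14 14:05:51 brainBot kernel: NVRM: RmInitAdapter failed! (0x24:0x40:1035)
Aug 14 14:05:51 brainBot kernel: NVRM: rm_init_adapter failed for device bearing minor number 1
"""

CODE_RE = re.compile(r"NVRM: RmInitAdapter failed! \((\S+)\)")
MINOR_RE = re.compile(
    r"NVRM: rm_init_adapter failed for device bearing minor number (\d+)"
)


def summarize_init_failures(log_text):
    """Map each failing minor number to the status strings logged before it."""
    failures = {}
    pending_code = None
    for line in log_text.splitlines():
        code = CODE_RE.search(line)
        if code:
            # Remember the status string until the matching minor-number line.
            pending_code = code.group(1)
            continue
        minor = MINOR_RE.search(line)
        if minor:
            failures.setdefault(int(minor.group(1)), []).append(pending_code)
            pending_code = None
    return failures


print(summarize_init_failures(LOG))
```

On the quoted lines, both failures are attributed to minor number 1.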

Can you test with the 370.23 driver (ftp://download.nvidia.com/XFree86/Linux-x86_64/370.23/)? Did you observe the above errors as soon as you ran nvidia-smi?

I installed the recommended driver, restarted, and now it’s even worse:

nvidia-smi
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.4.0-36-generic
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Now I have 0 GPUs for computing. :(
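The modprobe error says that no nvidia module was found under /lib/modules for the running kernel. A minimal Python sketch for checking that directly is given here. The function name find_nvidia_modules is hypothetical, and matching on file names that start with "nvidia" and contain ".ko" is an assumption about how the module file is named. The sketch only reports whether such a file exists; it does not explain why it might be missing.

```python
import os
import platform


def find_nvidia_modules(kernel_release, modules_root="/lib/modules"):
    """Return paths of files named nvidia*.ko* in one kernel's module tree."""
    kernel_dir = os.path.join(modules_root, kernel_release)
    found = []
    # os.walk yields nothing if kernel_dir does not exist.
    for dirpath, _dirnames, filenames in os.walk(kernel_dir):
        for name in filenames:
            if name.startswith("nvidia") and ".ko" in name:
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    release = platform.release()  # e.g. 4.4.0-36-generic in the post above
    modules = find_nvidia_modules(release)
    if modules:
        print("NVIDIA module files for kernel", release)
        for path in modules:
            print("  " + path)
    else:
        print("No NVIDIA module file found for kernel", release)
```

An empty result for the running kernel would be consistent with the modprobe message quoted above.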

Tracking this issue under Bug 200233454.

Great, is there a public link to follow the progress of this?

Any update on this issue or estimated time for a solution?

3DTOPO, we are not able to reproduce this issue. It looks like the issue is specific to your system or motherboard. Do you have another system to test with? Also, does the affected GPU work in another system? [I’m assuming there in GPU hardware fault.]

3DTOPO, can you please test with the 375.10 driver?

Also, start the OS in text mode or runlevel 3 so that X/graphics will not start. Log in to the OS via the console or remotely over SSH. Then reproduce the issue with the command: nvidia-smi --debug=logfile

Please attach the logfile here so we can have a look.

Meanwhile, check whether the same issue occurs with another motherboard.