Hi,
We are using CUDA 9 with the 384.111 NVIDIA driver on Power9 machines, with no issues.
Specs:
Power9 (ppc64le)
OS: RHEL7.5
FW: OP910.24
GPUs: V100
kernel version: 4.14.0-49.el7a.ppc64le
When we tried to upgrade the environment to CUDA 9.2 with a newer driver, the cards failed to work.
We upgraded the FW (as mentioned on the CUDA 9.2 installation page for Power9 users) to version OP910.24.
We are currently upgrading the environment to CUDA 10.
After installing driver version 410.104, nvidia-smi outputs ‘Unknown Error’ in the memory field for all GPUs.
The installation documentation mentions that the supported kernel is 4.14, which matches the kernel we are running.
We also followed the ‘disabling udev’ fix mentioned here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#power9-setup
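In case it helps, below is a rough sketch of the checks we can run and post the output from. It is only an illustration: it assumes the udev override was copied to /etc/udev/rules.d/40-redhat.rules and that the persistence daemon is installed as the nvidia-persistenced service. The helper name run_if_present is made up for this sketch.

```shell
#!/bin/sh
# Collect basic driver diagnostics; a step is skipped if its tool is missing.

# run_if_present is a hypothetical helper name used only in this sketch.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        # Print the command output; report a non-zero exit status instead of aborting.
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "== $1: not found, skipped =="
    fi
}

run_if_present uname -r
run_if_present nvidia-smi
run_if_present systemctl is-active nvidia-persistenced

# Assumed location of the local copy of the Red Hat udev rules.
RULES=/etc/udev/rules.d/40-redhat.rules
if [ -r "$RULES" ]; then
    echo "== memory rules in $RULES =="
    grep -n 'SUBSYSTEM=="memory"' "$RULES" || echo "(no memory rule lines found)"
else
    echo "== $RULES: not present =="
fi
```

If any memory rule line is still listed and not commented out, the udev fix may not have been applied as intended on that machine.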
Could you please assist? Is there any up-to-date documentation for setting up new drivers on Power9 machines?
Please let us know about any known solutions or upcoming fixes.
Thanks!