IBM POWER 9 / PPC64 status

Dear all,

I will have access to an IBM POWER9 machine for a while. I don’t know the specs yet, neither the OS nor the GPU.
What is the status of NVIDIA drivers on ppc64?

regards

Assuming it is an AC922 system, it is supported.

CUDA toolkit installers are at the usual location: http://www.nvidia.com/getcuda

Refer to the linux install guide for instructions:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

If you only want to install a driver, you can use a CUDA installer for that, or you can get standalone driver installers at the usual location: http://www.nvidia.com/drivers

Thanks, my concern was about driver support.
I found drivers only for the Tesla P100/V100; I hope that is what we’ll have in the box.

If not, is there no TITAN V support available for ppc64?

Only the Tesla P100 is supported on POWER8.
Only the Tesla V100 is supported on POWER9.

This is documented in the Linux install guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

If you have an AC922, it has Tesla V100 in it.

Raptor Computing Systems sells the Talos II PowerAI Development System, based on a POWER9 CPU and an RTX 2070 GPU. How can I download an RTX driver for Linux on POWER9 from the NVIDIA Driver Download site?

Thanks.

Probably you should contact Raptor Computing Systems for support of their system.

Yeah, I asked them about it and they told me to contact NVIDIA. It’s our everyday experience in the world. :-(

NVIDIA doesn’t provide any display driver support on Power architectures. That means we don’t provide any support for display-out, OpenGL, and other “typical” graphics activities.

With respect to GPU computing, the documented support for Power9 architecture is contained in the linux install guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

“(****) Only the Tesla GV100 GPU is supported for CUDA 10.1 on POWER9.”

So I would suggest that anything other than a Tesla V100 GPU is not supported on a power9 architecture system for GPU computing.

If Raptor is selling a system with an RTX GPU in it, I presumed, perhaps mistakenly, that they would have some reason to do so, and some kind of statement about how to use it. That is why I suggested that you check with them.

NVIDIA doesn’t publish, test, or create drivers that are designed to be used with an RTX GPU on Power9 architecture.

If Raptor is unable to give you any direction as to what works on their system, or how to use their system, I don’t think you’ll find it here.

None of my statements above should be construed to mean that something does not work. I haven’t tested it. Instead, the above statements mean that we don’t do anything to test or provide support for RTX GPUs on Power9, and I wouldn’t expect anything like that to work. It might be that it does work, I cannot say. Even if it does work, it (RTX GPUs on Power9) is entirely unsupported by NVIDIA, at this time.

You’re welcome to try anything you wish, of course.

Thank you for your comments. If I install the CUDA toolkit along with the required OS, compiler, and toolchain, I don’t need any RTX display driver for GPU computing, correct? If so, I’ll try that combination for GPU computing. It’s also our everyday experience in the world. :-)

No, that’s not correct.

The CUDA toolkit installer includes a driver installer.

You cannot use a GPU for anything without a driver installed for it.

You cannot use an NVIDIA GPU for CUDA computing without an NVIDIA GPU driver installed.

There is no driver qualified, tested, or advertised by NVIDIA for use with an RTX GPU in a Power9 platform, currently. For any purpose.

I’m not sure how many different ways I can say this. I probably won’t be able to respond to further requests for clarification on this subject.

Feel free to attempt to install anything you wish in any setting you wish. It might work.

@raph38130 I’ve just installed the NVIDIA driver for ppc64le on a Talos dual-core Power9 system with an NVIDIA Tesla V100 16GB GPU.

The binary installer is much smaller than the ones for other architectures. That could be because the ppc64le download, unlike the x86_64 one, does not include 32-bit compatibility libraries, or because support for certain GPU architectures is not included in the ppc64le driver download.

A quick look at the installed libraries shows that the ones for EGL and Vulkan driver support are all present in the ppc64le driver:

cd /usr/lib/powerpc64le-linux-gnu
ls libnvidia*

libnvidia-cfg.so                     libnvidia-gtk2.so.418.67
libnvidia-cfg.so.1                   libnvidia-ml.so
libnvidia-cfg.so.418.67              libnvidia-ml.so.1
libnvidia-eglcore.so.418.67          libnvidia-ml.so.418.67
libnvidia-egl-wayland.so.1           libnvidia-opencl.so.1
libnvidia-egl-wayland.so.1.1.2       libnvidia-opencl.so.418.67
libnvidia-encode.so                  libnvidia-opticalflow.so
libnvidia-encode.so.1                libnvidia-opticalflow.so.1
libnvidia-encode.so.418.67           libnvidia-opticalflow.so.418.67
libnvidia-fatbinaryloader.so.418.67  libnvidia-ptxjitcompiler.so
libnvidia-glcore.so.418.67           libnvidia-ptxjitcompiler.so.1
libnvidia-glsi.so.418.67             libnvidia-ptxjitcompiler.so.418.67
libnvidia-glvkspirv.so.418.67        libnvidia-tls.so.418.67

The Titan V with 12GB HBM2 is also a GV100 part. I expect this driver to work with a Titan V, because both the Tesla V100 and the Titan V use a GV100 GPU.

In the past, support for other GPU platforms was all-inclusive within a single driver download.

You can’t know for certain whether a Quadro RTX 6000 or 8000 will work with the NVIDIA ppc64le driver unless you try it out, since no one else has access to this configuration or is willing to comment on compatibility at this point in time! ;-)

I’m planning to do this test sometime soon. I’ll let you know what I find, but it might take about a month before I get an RTX 8000 to test with.

Here are my step-by-step instructions for installing ubuntu-desktop and the proprietary nvidia-drivers on a new ubuntu-18.04 server installation.

### Step 01.00: Install additional packages.

In Ubuntu, the opal-prd (Processor Runtime Diagnostics) package that is required for runtime detection and handling of Power processor errors on systems that are running OpenPower firmware is not installed by default. Run the following command to install this package:
```bash
sudo apt-get install opal-prd
```

Install compilers and build tools:

sudo apt-get install localepurge
sudo apt-get install build-essential dkms pkg-config

### Step 02.00: Install graphics drivers.

### Step 02.01: Install the GNOME desktop.

We’re going to use tasksel to install the GNOME desktop. tasksel is a Debian- and Ubuntu-specific tool that installs multiple related packages as a coordinated task.

sudo apt-get install tasksel -y

Once the above command completes, run the following command to install the GNOME desktop:

sudo tasksel install ubuntu-desktop

When the process completes, reboot the server.

sudo reboot -i NOW

### Step 02.02: Blacklist the nouveau driver.

Ensure you are using only the NVIDIA proprietary driver by blacklisting nouveau, the open-source driver that ships with Ubuntu.

Create blacklist-nouveau.conf:

sudo nano /etc/modprobe.d/blacklist-nouveau.conf

Include the following:

blacklist nouveau
options nouveau modeset=0

Run the following command to regenerate the initramfs:

sudo update-initramfs -u

Reboot your system:

sudo reboot -i NOW
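
After this reboot it can be worth confirming that nouveau really stayed out of the kernel. The following is a minimal sketch, assuming the list of loaded modules is read from /proc/modules (module name in the first field). The helper name module_loaded is hypothetical, and it takes the file as an optional argument so it can be tried against a saved copy:

```bash
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME is listed as a loaded module.
# FILE defaults to /proc/modules; the first field of each line is the name.
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Demonstration on a scratch file in /proc/modules format (made-up content).
sample=$(mktemp)
printf '%s\n' 'nvidia 18000000 0 - Live 0x0' 'ast 65536 1 - Live 0x0' > "$sample"
if module_loaded nouveau "$sample"; then
    echo "nouveau is still loaded"
else
    echo "nouveau is not loaded"
fi
rm -f "$sample"
```

On the real system, calling module_loaded nouveau with no file argument checks the running kernel.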

### Step 02.03: Install NVIDIA proprietary graphics drivers.

List installed PCI devices:

sudo lspci

0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0001:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
0002:00:00.0 PCI bridge: IBM Device 04c1
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM Device 04c1
0030:01:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0031:00:00.0 PCI bridge: IBM Device 04c1
0032:00:00.0 PCI bridge: IBM Device 04c1
0033:00:00.0 PCI bridge: IBM Device 04c1

We can see that the NVIDIA V100 GPU is at PCI address 0030:01:00.0.
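
Picking the GPU out of a long lspci listing by eye is error-prone, so a small filter can help. This is only a sketch: the function name nvidia_pci_addresses is hypothetical, and it simply prints the first field of every line that mentions NVIDIA.

```bash
#!/bin/sh
# nvidia_pci_addresses: read lspci output on stdin and print the PCI
# address (first field) of every line that mentions NVIDIA.
nvidia_pci_addresses() {
    awk '/NVIDIA/ { print $1 }'
}

# Two sample lines copied from the listing above.
printf '%s\n' \
    '0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)' \
    '0030:01:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)' |
    nvidia_pci_addresses
```

On a live system the same filter can be fed from lspci -D, which always prints the PCI domain.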

Download the driver:

NVIDIA_DRIVER_VERSION='418.67'
NVIDIA_DRIVER_RELEASE_DATE='2019.5.7'
OS_DISTRO='ubuntu'
OS_VERSION='1804'
ARCH='ppc64le'

wget -q --show-progress --progress=bar:force:noscroll http://us.download.nvidia.com/tesla/$NVIDIA_DRIVER_VERSION/NVIDIA-Linux-$ARCH-$NVIDIA_DRIVER_VERSION.run -O /tmp/NVIDIA-Linux-$ARCH-$NVIDIA_DRIVER_VERSION.run

Run the installer, enabling DKMS support and overwriting the existing libglvnd files when it asks:

sudo bash /tmp/NVIDIA-Linux-$ARCH-$NVIDIA_DRIVER_VERSION.run

In order to configure headless 3D GPU acceleration, you’ll have to use VirtualGL with TurboVNC.

VirtualGL works fine with headless NVIDIA GPUs (Tesla), but there are a few additional steps that need to be performed in order to run a headless 3D X server on these GPUs. These steps should be performed after installing the NVIDIA proprietary driver, but before configuring VirtualGL.

Run nvidia-xconfig --query-gpu-info to obtain the bus ID of the GPU. Example:

Number of GPUs: 1

GPU #0:
  Name      : Tesla V100-PCIE-16GB
  UUID      : GPU-1620f7d6-0bfa-a63b-5f1c-dbbf045e79de
  PCI BusID : PCI:1@48:0:0

  Number of Display Devices: 0

Create an appropriate xorg.conf file for headless operation:

sudo nvidia-xconfig -a --allow-empty-initial-configuration --use-display-device=None \
--virtual=1920x1200 --busid PCI:1@48:0:0

Replace PCI:1@48:0:0 with the bus ID reported by nvidia-xconfig --query-gpu-info on your system. Leave out --use-display-device=None if the GPU is headless, i.e. if it has no display outputs.
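
The bus ID used here is the lspci address rewritten in the decimal form that xorg.conf expects, PCI:bus@domain:device:function, so the hexadecimal domain 0030 becomes 48. A minimal sketch of that conversion, with a hypothetical helper name, for cases where only the lspci address is at hand:

```bash
#!/bin/sh
# pci_to_xorg_busid ADDR: convert an lspci address of the form
# domain:bus:device.function (hexadecimal) into the decimal
# PCI:bus@domain:device:function form used for BusID in xorg.conf.
pci_to_xorg_busid() {
    addr=$1
    domain=${addr%%:*}      # text before the first colon
    rest=${addr#*:}         # bus:device.function
    bus=${rest%%:*}
    devfn=${rest#*:}        # device.function
    dev=${devfn%%.*}
    fn=${devfn#*.}
    printf 'PCI:%d@%d:%d:%d\n' "0x$bus" "0x$domain" "0x$dev" "0x$fn"
}

pci_to_xorg_busid 0030:01:00.0   # prints PCI:1@48:0:0
```

The value nvidia-xconfig --query-gpu-info reports remains the authoritative one; the conversion is just a cross-check.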

This will generate the following /etc/X11/xorg.conf file:

cat xorg.conf
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 418.67

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"

    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla V100-PCIE-16GB"
    BusID          "PCI:1@48:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "UseDisplayDevice" "None"
    SubSection     "Display"
        Virtual     1920 1200
        Depth       24
    EndSubSection
EndSection

### Step 02.04: Verify the NVIDIA driver installation.

Check installed libraries.

cd /usr/lib/powerpc64le-linux-gnu/

# nvidia libraries
ls -la libnvidia*

lrwxrwxrwx 1 root root       18 Aug 11 02:52 libnvidia-cfg.so -> libnvidia-cfg.so.1
lrwxrwxrwx 1 root root       23 Aug 11 02:52 libnvidia-cfg.so.1 -> libnvidia-cfg.so.418.67
-rwxr-xr-x 1 root root   207904 Aug 11 02:52 libnvidia-cfg.so.418.67
-rwxr-xr-x 1 root root 25914464 Aug 11 02:52 libnvidia-eglcore.so.418.67
lrwxrwxrwx 1 root root       30 Aug 11 02:52 libnvidia-egl-wayland.so.1 -> libnvidia-egl-wayland.so.1.1.2
-rwxr-xr-x 1 root root    44056 Aug 11 02:52 libnvidia-egl-wayland.so.1.1.2
lrwxrwxrwx 1 root root       21 Aug 11 02:52 libnvidia-encode.so -> libnvidia-encode.so.1
lrwxrwxrwx 1 root root       26 Aug 11 02:52 libnvidia-encode.so.1 -> libnvidia-encode.so.418.67
-rwxr-xr-x 1 root root   155400 Aug 11 02:52 libnvidia-encode.so.418.67
-rwxr-xr-x 1 root root   334904 Aug 11 02:52 libnvidia-fatbinaryloader.so.418.67
-rwxr-xr-x 1 root root 26397320 Aug 11 02:52 libnvidia-glcore.so.418.67
-rwxr-xr-x 1 root root   677904 Aug 11 02:52 libnvidia-glsi.so.418.67
-rwxr-xr-x 1 root root 14122992 Aug 11 02:52 libnvidia-glvkspirv.so.418.67
-rwxr-xr-x 1 root root  1574264 Aug 11 02:52 libnvidia-gtk2.so.418.67
lrwxrwxrwx 1 root root       17 Aug 11 02:52 libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx 1 root root       22 Aug 11 02:52 libnvidia-ml.so.1 -> libnvidia-ml.so.418.67
-rwxr-xr-x 1 root root  1577472 Aug 11 02:52 libnvidia-ml.so.418.67
lrwxrwxrwx 1 root root       26 Aug 11 02:52 libnvidia-opencl.so.1 -> libnvidia-opencl.so.418.67
-rwxr-xr-x 1 root root 29180824 Aug 11 02:52 libnvidia-opencl.so.418.67
lrwxrwxrwx 1 root root       26 Aug 11 02:52 libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
lrwxrwxrwx 1 root root       31 Aug 11 02:52 libnvidia-opticalflow.so.1 -> libnvidia-opticalflow.so.418.67
-rwxr-xr-x 1 root root    97824 Aug 11 02:52 libnvidia-opticalflow.so.418.67
lrwxrwxrwx 1 root root       29 Aug 11 02:52 libnvidia-ptxjitcompiler.so -> libnvidia-ptxjitcompiler.so.1
lrwxrwxrwx 1 root root       34 Aug 11 02:52 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.418.67
-rwxr-xr-x 1 root root  8039952 Aug 11 02:52 libnvidia-ptxjitcompiler.so.418.67
-rwxr-xr-x 1 root root     5264 Aug 11 02:52 libnvidia-tls.so.418.67

# egl libraries
ls -la libEGL*

lrwxrwxrwx 1 root root      20 Jul 18 11:44 libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
-rw-r--r-- 1 root root  398384 Jul 18 11:44 libEGL_mesa.so.0.0.0
lrwxrwxrwx 1 root root      23 Aug 11 02:52 libEGL_nvidia.so.0 -> libEGL_nvidia.so.418.67
-rwxr-xr-x 1 root root 1266168 Aug 11 02:52 libEGL_nvidia.so.418.67
lrwxrwxrwx 1 root root      11 Aug 11 02:52 libEGL.so -> libEGL.so.1
lrwxrwxrwx 1 root root      15 Aug 11 02:52 libEGL.so.1 -> libEGL.so.1.1.0
-rwxr-xr-x 1 root root   86944 Aug 11 02:52 libEGL.so.1.1.0

# glx libraries
ls -la libGLX*

lrwxrwxrwx 1 root root      23 Aug 11 02:52 libGLX_indirect.so.0 -> libGLX_nvidia.so.418.67
lrwxrwxrwx 1 root root      20 Jul 18 11:44 libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rw-r--r-- 1 root root  725688 Jul 18 11:44 libGLX_mesa.so.0.0.0
lrwxrwxrwx 1 root root      23 Aug 11 02:52 libGLX_nvidia.so.0 -> libGLX_nvidia.so.418.67
-rwxr-xr-x 1 root root 1661296 Aug 11 02:52 libGLX_nvidia.so.418.67
lrwxrwxrwx 1 root root      11 Aug 11 02:52 libGLX.so -> libGLX.so.0
-rwxr-xr-x 1 root root   80472 Aug 11 02:52 libGLX.so.0

### Step 02.05: Set up the NVIDIA OpenGL and Vulkan driver ICDs.

Ensure that the glvnd EGL vendor ICD file exists:

sudo nano /usr/share/glvnd/egl_vendor.d/10_nvidia.json

Ensure that it contains the following entries:

{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}

Create the Vulkan driver ICD file:

sudo mkdir -p /usr/share/vulkan/icd.d
sudo nano /usr/share/vulkan/icd.d/nvidia_icd.json

Add the following entries:

{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.1.99"
    }
}
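
An ICD manifest is only useful if the library it names is actually installed, so a quick cross-check may save some debugging. The sketch below uses plain sed rather than a real JSON parser and assumes the key and its value sit on one line, as in the two manifests above. Both function names are hypothetical.

```bash
#!/bin/sh
# icd_library_path FILE: print the "library_path" value from an ICD manifest.
# Plain text matching, not JSON parsing: key and value must share a line.
icd_library_path() {
    sed -n 's/.*"library_path"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}

# icd_library_present FILE LIBDIR: succeed if the named library is in LIBDIR.
icd_library_present() {
    lib=$(icd_library_path "$1")
    [ -n "$lib" ] && [ -e "$2/$lib" ]
}

# Demonstration against a scratch copy (temporary paths, not the real ones).
tmp=$(mktemp -d)
cat > "$tmp/nvidia_icd.json" <<'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.1.99"
    }
}
EOF
icd_library_path "$tmp/nvidia_icd.json"
icd_library_present "$tmp/nvidia_icd.json" "$tmp" || echo "library not found in $tmp"
rm -rf "$tmp"
```

On the system described here, the manifest would be /usr/share/vulkan/icd.d/nvidia_icd.json and the library directory /usr/lib/powerpc64le-linux-gnu.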

[TODO: Add steps for installing vulkan-sdk.]

Reboot the system:

sudo reboot -i NOW

Run nvidia-smi to check if the driver is installed properly and can detect the GPU:

nvidia-smi

Sun Aug 11 04:03:34 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000030:01:00.0 Off |                    0 |
| N/A   35C    P0    26W / 250W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
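
For scripted checks, the table above is awkward to parse; nvidia-smi can also emit CSV via its --query-gpu and --format options. The sketch below only shows how the driver version could be pulled out of such a line. The helper name is hypothetical, and the sample line is assembled from the values in the table above rather than captured from the machine.

```bash
#!/bin/sh
# driver_version_from_csv: read lines of "name, driver_version, memory.total"
# on stdin and print the second field with surrounding spaces removed.
driver_version_from_csv() {
    awk -F',' '{ gsub(/^ +| +$/, "", $2); print $2 }'
}

# Illustrative line only, built from the nvidia-smi table above.
sample='Tesla V100-PCIE-16GB, 418.67, 16130 MiB'
echo "$sample" | driver_version_from_csv
```

On real hardware the input would come from nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader.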

I think the current version of the nvidia-driver-418.67 for ppc64el doesn’t include the files for supporting ray-tracing on linux for vulkan.

The two missing files are:

  • libnvidia-cbl.so.$NVIDIA_DRIVER_VERSION
  • libnvidia-rtcore.so.$NVIDIA_DRIVER_VERSION

So, if you put a Quadro RTX 6000 or 8000 into a Power9 box, at the moment, ray-tracing using Vulkan will not work on linux.

You’ll have to wait for a newer release of the NVIDIA driver for ppc64el, e.g. nvidia-driver-430.40 or higher.
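
Whether a given driver release ships these two libraries can be checked directly once it is installed. A minimal sketch, assuming the libraries follow the same libNAME.so.VERSION naming as the others in the listing; the function name missing_rt_libs is hypothetical:

```bash
#!/bin/sh
# missing_rt_libs DIR VERSION: print each ray-tracing library that is
# absent from DIR for the given driver version (prints nothing if both exist).
missing_rt_libs() {
    dir=$1
    ver=$2
    for lib in libnvidia-cbl libnvidia-rtcore; do
        [ -e "$dir/$lib.so.$ver" ] || echo "$lib.so.$ver"
    done
}

# Demonstration in a scratch directory that contains only one of the two.
tmp=$(mktemp -d)
: > "$tmp/libnvidia-cbl.so.418.67"
missing_rt_libs "$tmp" 418.67
rm -rf "$tmp"
```

Pointing it at /usr/lib/powerpc64le-linux-gnu with the installed driver version gives the answer for the real system.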

You can find my setup guide for setting up a Raptor Talos II secure workstation with NVIDIA GPUs here:

I’m noticing a divergence between the x86-64 and ppc64le CUDA headers for the same version of the same package on Ubuntu 18.04, for something I’d expect to be cross-platform. I hit an error about a missing ‘cudaEGL.h’ when trying to build the cuda-samples on ppc64le, but not on x86-64.

$ uname -m
x86_64
$ docker run --rm nvidia/cuda:10.1-devel-ubuntu18.04 dpkg -L cuda-misc-headers-10-1 | grep -i gl
/usr/local/cuda-10.1/targets/x86_64-linux/include/thrust/system/cuda/detail/cub/agent/single_pass_scan_operators.cuh
/usr/local/cuda-10.1/targets/x86_64-linux/include/thrust/detail/config/global_workarounds.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/CL/cl_gl_ext.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/CL/cl_gl.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/CL/cl_egl.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cuda_egl_interop.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cudaGL.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cudaEGL.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cuda_gl_interop.h
$ uname -m
ppc64le
$ docker run --rm nvidia/cuda-ppc64le:10.1-devel-ubuntu18.04 dpkg -L cuda-misc-headers-10-1 | grep -i gl
/usr/local/cuda-10.1/targets/ppc64le-linux/include/cuda_egl_interop.h
/usr/local/cuda-10.1/targets/ppc64le-linux/include/cuda_gl_interop.h
/usr/local/cuda-10.1/targets/ppc64le-linux/include/cudaGL.h
/usr/local/cuda-10.1/targets/ppc64le-linux/include/thrust/system/cuda/detail/cub/agent/single_pass_scan_operators.cuh
/usr/local/cuda-10.1/targets/ppc64le-linux/include/thrust/detail/config/global_workarounds.h
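
The two listings can be compared mechanically instead of by eye. This is a sketch under the assumption that each dpkg -L listing has been saved to a text file; the function name headers_missing_from is hypothetical. It strips the directory part so the differing target paths do not matter, then prints the names present in the first list but not in the second.

```bash
#!/bin/sh
# headers_missing_from A B: print file names (directory part removed) that
# appear in listing A but not in listing B. LC_ALL=C keeps sort and comm
# in agreement about ordering.
headers_missing_from() {
    a=$(mktemp)
    b=$(mktemp)
    sed 's|.*/||' "$1" | LC_ALL=C sort -u > "$a"
    sed 's|.*/||' "$2" | LC_ALL=C sort -u > "$b"
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Demonstration with a few paths taken from the two listings above.
x86=$(mktemp)
ppc=$(mktemp)
printf '%s\n' \
    '/usr/local/cuda-10.1/targets/x86_64-linux/include/cuda_egl_interop.h' \
    '/usr/local/cuda-10.1/targets/x86_64-linux/include/cudaGL.h' \
    '/usr/local/cuda-10.1/targets/x86_64-linux/include/cudaEGL.h' > "$x86"
printf '%s\n' \
    '/usr/local/cuda-10.1/targets/ppc64le-linux/include/cuda_egl_interop.h' \
    '/usr/local/cuda-10.1/targets/ppc64le-linux/include/cudaGL.h' > "$ppc"
headers_missing_from "$x86" "$ppc"
rm -f "$x86" "$ppc"
```

Run against the full listings, this would also surface the CL/cl_gl.h, CL/cl_gl_ext.h, and CL/cl_egl.h headers that appear only in the x86_64 output.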