Dear all,
I will have access for a while to an IBM POWER9 machine. I don’t know the specs yet, nor the OS or GPU…
What is the status of NVIDIA drivers on ppc64?
regards
Assuming it is an AC922 system, it is supported.
CUDA toolkit installers are at the usual location: [url]http://www.nvidia.com/getcuda[/url]
Refer to the Linux install guide for instructions:
[url]https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html[/url]
If you only want to install a driver, you can use a CUDA installer for that, or you can get standalone driver installers at the usual location (Official Drivers | NVIDIA).
Thanks, my concern was about driver support…
I found only the Tesla P100/V100 listed; I hope that’s what we’ll have in the box.
If not, is there no TITAN V support available on ppc64?
Only the Tesla P100 is supported on POWER8.
Only the Tesla V100 is supported on POWER9.
This is documented in the Linux install guide:
[url]https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements[/url]
If you have an AC922, it has Tesla V100 in it.
Raptor Computing Systems sells the Talos II PowerAI Development System, based on a POWER9 CPU and an RTX 2070 GPU. How can I download RTX driver software for Linux running on POWER9 from the NVIDIA Driver Download site?
Thanks.
Probably you should contact Raptor Computing Systems for support of their system.
Yeah, I had asked them about it and they told me that I should contact NVIDIA. It’s our everyday experience in the world. :-(
NVIDIA doesn’t provide any display driver support on Power architectures. That includes the idea that we don’t provide any support for display-out, OpenGL, and other “typical” graphics activities.
With respect to GPU computing, the documented support for Power9 architecture is contained in the linux install guide:
[url]https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements[/url]
“(****) Only the Tesla GV100 GPU is supported for CUDA 10.1 on POWER9.”
So I would suggest that anything other than a Tesla V100 GPU is not supported on a power9 architecture system for GPU computing.
If Raptor is selling a system with an RTX GPU in it, I presumed, perhaps mistakenly, that they would have some reason to do so, and some kind of statement about how to use it. That is why I suggested that you check with them.
NVIDIA doesn’t publish, test, or create drivers that are designed to be used with a RTX GPU on Power9 architecture.
If Raptor is unable to give you any direction as to what works on their system, or how to use their system, I don’t think you’ll find it here.
None of my statements above should be construed to mean that something does not work. I haven’t tested it. Instead, the above statements mean that we don’t do anything to test or provide support for RTX GPUs on Power9, and I wouldn’t expect anything like that to work. It might be that it does work, I cannot say. Even if it does work, it (RTX GPUs on Power9) is entirely unsupported by NVIDIA, at this time.
You’re welcome to try anything you wish, of course.
Thank you for your comments. If I install the CUDA toolkit with the required OS, compiler, and toolchain, I don’t need any RTX display driver for GPU computing, correct? If so, I’ll try that combination for GPU computing. It’s also our everyday experience in the world. :-)
No, that’s not correct.
The CUDA toolkit installer includes a driver installer.
You cannot use a GPU for anything without a driver installed for it.
You cannot use a NVIDIA GPU for CUDA computing without a NVIDIA GPU driver installed.
There is currently no driver qualified, tested, or advertised by NVIDIA for use with an RTX GPU on a POWER9 platform, for any purpose.
I’m not sure how many different ways I can say this. I probably won’t be able to respond to further requests for clarification on this subject.
Feel free to attempt to install anything you wish in any setting you wish. It might work.
@raph38130 I’ve just installed the nvidia-drivers for ppc64le on a talos dual-core Power9 system with an NVIDIA Tesla V100 16GB GPU.
The binary installer is much smaller than the ones for other architectures. That could be because 32-bit compatibility libraries are not included in the ppc64le driver, unlike the x86_64 driver downloads, or because support for certain GPU architectures is not included in the NVIDIA ppc64le driver download.
A quick look at the installed libraries shows that the ones for EGL and Vulkan driver support are all present in the NVIDIA ppc64le driver:
```bash
cd /usr/lib/powerpc64le-linux-gnu
ls libnvidia*
libnvidia-cfg.so                     libnvidia-gtk2.so.418.67
libnvidia-cfg.so.1                   libnvidia-ml.so
libnvidia-cfg.so.418.67              libnvidia-ml.so.1
libnvidia-eglcore.so.418.67          libnvidia-ml.so.418.67
libnvidia-egl-wayland.so.1           libnvidia-opencl.so.1
libnvidia-egl-wayland.so.1.1.2       libnvidia-opencl.so.418.67
libnvidia-encode.so                  libnvidia-opticalflow.so
libnvidia-encode.so.1                libnvidia-opticalflow.so.1
libnvidia-encode.so.418.67           libnvidia-opticalflow.so.418.67
libnvidia-fatbinaryloader.so.418.67  libnvidia-ptxjitcompiler.so
libnvidia-glcore.so.418.67           libnvidia-ptxjitcompiler.so.1
libnvidia-glsi.so.418.67             libnvidia-ptxjitcompiler.so.418.67
libnvidia-glvkspirv.so.418.67        libnvidia-tls.so.418.67
```
The Titan V with 12GB of HBM2 is also a GV100 part. I expect this driver to work with the Titan V as well, because both the Tesla V100 and the Titan V use the GV100 GPU.
In the past, support for other GPU platforms was all included within a single driver download.
You can’t know for certain whether a Quadro RTX 6000 or 8000 will work with the NVIDIA ppc64le driver unless you try it out, since no one else has access to this configuration or is willing to comment on compatibility at this point! ;-)
I’m planning to do this test sometime soon. I’ll let you know what I find, but it might take about a month before I get an RTX 8000 to test with.
Here are my step-by-step instructions for installing ubuntu-desktop and the proprietary nvidia-drivers on a new ubuntu-18.04 server installation.
### Step 01.00: Install additional packages.
In Ubuntu, the opal-prd (Processor Runtime Diagnostics) package that is required for runtime detection and handling of Power processor errors on systems that are running OpenPower firmware is not installed by default. Run the following command to install this package:
```bash
sudo apt-get install opal-prd
```
Install compilers and build tools:
```bash
sudo apt-get install localepurge
sudo apt-get install build-essential dkms pkg-config
```
We’re going to use `tasksel` for the installation of the GNOME desktop. `tasksel` is an Ubuntu- and Debian-specific tool that helps install multiple related packages as a coordinated task.
```bash
sudo apt-get install tasksel -y
```
Once the above command completes, issue the following command to install the GNOME desktop:
```bash
sudo tasksel install ubuntu-desktop
```
When the process completes, reboot the server.
```bash
sudo reboot
```
Ensure you are using only the NVIDIA proprietary driver by blacklisting Nouveau, Ubuntu’s built-in open-source driver.
Create blacklist-nouveau.conf:
```bash
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
```
Include the following:
```
blacklist nouveau
options nouveau modeset=0
```
Enter the following command to regenerate the initramfs:
```bash
sudo update-initramfs -u
```
Reboot your system:
```bash
sudo reboot
```
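Before rebooting, it can help to sanity-check the file you just wrote. A minimal sketch, using a /tmp copy so it can run anywhere (on the real system, point `conf` at /etc/modprobe.d/blacklist-nouveau.conf instead):

```shell
# Write the same two directives to a scratch file and verify both are present.
conf=/tmp/blacklist-nouveau.conf   # real path: /etc/modprobe.d/blacklist-nouveau.conf
printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
grep -q '^blacklist nouveau$' "$conf" \
  && grep -q '^options nouveau modeset=0$' "$conf" \
  && echo "blacklist config OK"
# → blacklist config OK
```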
List installed PCI devices:
```bash
sudo lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0001:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961
0002:00:00.0 PCI bridge: IBM Device 04c1
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM Device 04c1
0030:01:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
0031:00:00.0 PCI bridge: IBM Device 04c1
0032:00:00.0 PCI bridge: IBM Device 04c1
0033:00:00.0 PCI bridge: IBM Device 04c1
```
We can see that the NVIDIA V100 GPU is connected to PCIe slot 0030:01:00.0.
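If you want to grab that slot address in a script, you can filter the lspci output. A minimal sketch using the NVIDIA line from the listing above (pipe real `lspci` output in instead):

```shell
# Extract the PCI address (first field) of the NVIDIA 3D controller.
sample='0030:01:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)'
printf '%s\n' "$sample" | awk '/NVIDIA/ {print $1}'
# → 0030:01:00.0
```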
Download the driver:
```bash
NVIDIA_DRIVER_VERSION='418.67'
NVIDIA_DRIVER_RELEASE_DATE='2019.5.7'
OS_DISTRO='ubuntu'
OS_VERSION='1804'
ARCH='ppc64le'

wget -q --show-progress --progress=bar:force:noscroll http://us.download.nvidia.com/tesla/$NVIDIA_DRIVER_VERSION/NVIDIA-Linux-$ARCH-$NVIDIA_DRIVER_VERSION.run -O /tmp/NVIDIA-Linux-$ARCH-$NVIDIA_DRIVER_VERSION.run
```
Install the driver, with dkms support, and overwrite the existing libglvnd files when prompted:
```bash
sudo bash /tmp/NVIDIA-Linux-$ARCH-$NVIDIA_DRIVER_VERSION.run
```
In order to configure headless 3D GPU acceleration, you’ll have to use VirtualGL with TurboVNC.
VirtualGL works fine with headless NVIDIA GPUs (Tesla), but there are a few additional steps that need to be performed in order to run a headless 3D X server on these GPUs. These steps should be performed after installing the NVIDIA proprietary driver, but before configuring VirtualGL.
Run `nvidia-xconfig --query-gpu-info` to obtain the bus ID of the GPU. Example:
```bash
nvidia-xconfig --query-gpu-info
Number of GPUs: 1

GPU #0:
  Name      : Tesla V100-PCIE-16GB
  UUID      : GPU-1620f7d6-0bfa-a63b-5f1c-dbbf045e79de
  PCI BusID : PCI:1@48:0:0

  Number of Display Devices: 0
```
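Note the format change: lspci reports the address in hex (`0030:01:00.0`), while the X BusID uses decimal fields in the form `PCI:bus@domain:device:function` (domain 0x30 = 48). A small sketch of the conversion:

```shell
# Convert an lspci hex address into the X server BusID form.
addr='0030:01:00.0'
IFS=':.' read -r dom bus dev fn <<EOF
$addr
EOF
printf 'PCI:%d@%d:%d:%d\n' "0x$bus" "0x$dom" "0x$dev" "0x$fn"
# → PCI:1@48:0:0
```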
Create an appropriate xorg.conf file for headless operation:
```bash
sudo nvidia-xconfig -a --allow-empty-initial-configuration --use-display-device=None \
    --virtual=1920x1200 --busid PCI:1@48:0:0
```
Replace the `--busid` value with the bus ID you obtained from `nvidia-xconfig --query-gpu-info` above. Leave out `--use-display-device=None` if the GPU is headless, i.e. if it has no display outputs.
This will generate the following /etc/X11/xorg.conf file:
```bash
cat /etc/X11/xorg.conf
```
```
# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig: version 418.67

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla V100-PCIE-16GB"
    BusID          "PCI:1@48:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    Option         "UseDisplayDevice" "None"
    SubSection     "Display"
        Virtual     1920 1200
        Depth       24
    EndSubSection
EndSection
```
Check the installed libraries:
```bash
cd /usr/lib/powerpc64le-linux-gnu/

# nvidia libraries
ls -la libnvidia*
lrwxrwxrwx 1 root root 18 Aug 11 02:52 libnvidia-cfg.so -> libnvidia-cfg.so.1
lrwxrwxrwx 1 root root 23 Aug 11 02:52 libnvidia-cfg.so.1 -> libnvidia-cfg.so.418.67
-rwxr-xr-x 1 root root 207904 Aug 11 02:52 libnvidia-cfg.so.418.67
-rwxr-xr-x 1 root root 25914464 Aug 11 02:52 libnvidia-eglcore.so.418.67
lrwxrwxrwx 1 root root 30 Aug 11 02:52 libnvidia-egl-wayland.so.1 -> libnvidia-egl-wayland.so.1.1.2
-rwxr-xr-x 1 root root 44056 Aug 11 02:52 libnvidia-egl-wayland.so.1.1.2
lrwxrwxrwx 1 root root 21 Aug 11 02:52 libnvidia-encode.so -> libnvidia-encode.so.1
lrwxrwxrwx 1 root root 26 Aug 11 02:52 libnvidia-encode.so.1 -> libnvidia-encode.so.418.67
-rwxr-xr-x 1 root root 155400 Aug 11 02:52 libnvidia-encode.so.418.67
-rwxr-xr-x 1 root root 334904 Aug 11 02:52 libnvidia-fatbinaryloader.so.418.67
-rwxr-xr-x 1 root root 26397320 Aug 11 02:52 libnvidia-glcore.so.418.67
-rwxr-xr-x 1 root root 677904 Aug 11 02:52 libnvidia-glsi.so.418.67
-rwxr-xr-x 1 root root 14122992 Aug 11 02:52 libnvidia-glvkspirv.so.418.67
-rwxr-xr-x 1 root root 1574264 Aug 11 02:52 libnvidia-gtk2.so.418.67
lrwxrwxrwx 1 root root 17 Aug 11 02:52 libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx 1 root root 22 Aug 11 02:52 libnvidia-ml.so.1 -> libnvidia-ml.so.418.67
-rwxr-xr-x 1 root root 1577472 Aug 11 02:52 libnvidia-ml.so.418.67
lrwxrwxrwx 1 root root 26 Aug 11 02:52 libnvidia-opencl.so.1 -> libnvidia-opencl.so.418.67
-rwxr-xr-x 1 root root 29180824 Aug 11 02:52 libnvidia-opencl.so.418.67
lrwxrwxrwx 1 root root 26 Aug 11 02:52 libnvidia-opticalflow.so -> libnvidia-opticalflow.so.1
lrwxrwxrwx 1 root root 31 Aug 11 02:52 libnvidia-opticalflow.so.1 -> libnvidia-opticalflow.so.418.67
-rwxr-xr-x 1 root root 97824 Aug 11 02:52 libnvidia-opticalflow.so.418.67
lrwxrwxrwx 1 root root 29 Aug 11 02:52 libnvidia-ptxjitcompiler.so -> libnvidia-ptxjitcompiler.so.1
lrwxrwxrwx 1 root root 34 Aug 11 02:52 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.418.67
-rwxr-xr-x 1 root root 8039952 Aug 11 02:52 libnvidia-ptxjitcompiler.so.418.67
-rwxr-xr-x 1 root root 5264 Aug 11 02:52 libnvidia-tls.so.418.67

# egl libraries
ls -la libEGL*
lrwxrwxrwx 1 root root 20 Jul 18 11:44 libEGL_mesa.so.0 -> libEGL_mesa.so.0.0.0
-rw-r--r-- 1 root root 398384 Jul 18 11:44 libEGL_mesa.so.0.0.0
lrwxrwxrwx 1 root root 23 Aug 11 02:52 libEGL_nvidia.so.0 -> libEGL_nvidia.so.418.67
-rwxr-xr-x 1 root root 1266168 Aug 11 02:52 libEGL_nvidia.so.418.67
lrwxrwxrwx 1 root root 11 Aug 11 02:52 libEGL.so -> libEGL.so.1
lrwxrwxrwx 1 root root 15 Aug 11 02:52 libEGL.so.1 -> libEGL.so.1.1.0
-rwxr-xr-x 1 root root 86944 Aug 11 02:52 libEGL.so.1.1.0

# glx libraries
ls -la libGLX*
lrwxrwxrwx 1 root root 23 Aug 11 02:52 libGLX_indirect.so.0 -> libGLX_nvidia.so.418.67
lrwxrwxrwx 1 root root 20 Jul 18 11:44 libGLX_mesa.so.0 -> libGLX_mesa.so.0.0.0
-rw-r--r-- 1 root root 725688 Jul 18 11:44 libGLX_mesa.so.0.0.0
lrwxrwxrwx 1 root root 23 Aug 11 02:52 libGLX_nvidia.so.0 -> libGLX_nvidia.so.418.67
-rwxr-xr-x 1 root root 1661296 Aug 11 02:52 libGLX_nvidia.so.418.67
lrwxrwxrwx 1 root root 11 Aug 11 02:52 libGLX.so -> libGLX.so.0
-rwxr-xr-x 1 root root 80472 Aug 11 02:52 libGLX.so.0
```
Ensure that the glvnd EGL ICD file exists:
```bash
sudo nano /usr/share/glvnd/egl_vendor.d/10_nvidia.json
```
Ensure that the following entries exist:
```json
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
```
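A typo in these ICD files can silently break driver discovery, so it’s worth running them through a JSON parser. A small sketch, written to /tmp so it can run anywhere (the real file lives at /usr/share/glvnd/egl_vendor.d/10_nvidia.json):

```shell
f=/tmp/10_nvidia.json   # real path: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
cat > "$f" <<'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
EOF
python3 -m json.tool "$f" > /dev/null && echo "valid JSON"
# → valid JSON
```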
Create the Vulkan driver ICD.
Create the file:
```bash
sudo mkdir -p /usr/share/vulkan/icd.d
sudo nano /usr/share/vulkan/icd.d/nvidia_icd.json
```
Add the following entries:
```json
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.1.99"
    }
}
```
[TODO: Add steps for installing vulkan-sdk.]
Reboot the system:
```bash
sudo reboot
```
Run `nvidia-smi` to check that the driver is installed properly and can detect the GPU:
```bash
nvidia-smi
Sun Aug 11 04:03:34 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000030:01:00.0 Off |                    0 |
| N/A   35C    P0    26W / 250W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
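If you need the versions in script-friendly form, you can grep them out of the banner line. This sketch uses the sample banner from the output above; pipe real `nvidia-smi` output in instead:

```shell
banner='| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |'
printf '%s\n' "$banner" | grep -oE 'Driver Version: [0-9.]+'
# → Driver Version: 418.67
printf '%s\n' "$banner" | grep -oE 'CUDA Version: [0-9.]+'
# → CUDA Version: 10.1
```

On a live system, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` gives the driver version directly, without any parsing.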
I think the current version of the NVIDIA driver (418.67) for ppc64el doesn’t include the files needed to support ray tracing with Vulkan on Linux.
The two missing files are:
So, if you put a Quadro RTX 6000 or 8000 into a POWER9 box right now, ray tracing using Vulkan will not work on Linux.
You’ll have to wait for a newer ppc64el release of the NVIDIA driver, e.g. nvidia-driver-430.40 or higher.
You can find my setup guide for setting up a Raptor Talos II secure workstation with NVIDIA GPUs here:
I’ve noticed a divergence between the x86-64 and ppc64le CUDA headers for the same version of the same package in Ubuntu 18.04, for something I’d expect to be cross-platform. I hit an error about a missing ‘cudaEGL.h’ when trying to build the cuda-samples on ppc64le, but not on x86-64.
```bash
$ uname -m
x86_64
$ docker run --rm nvidia/cuda:10.1-devel-ubuntu18.04 dpkg -L cuda-misc-headers-10-1 | grep -i gl
/usr/local/cuda-10.1/targets/x86_64-linux/include/thrust/system/cuda/detail/cub/agent/single_pass_scan_operators.cuh
/usr/local/cuda-10.1/targets/x86_64-linux/include/thrust/detail/config/global_workarounds.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/CL/cl_gl_ext.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/CL/cl_gl.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/CL/cl_egl.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cuda_egl_interop.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cudaGL.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cudaEGL.h
/usr/local/cuda-10.1/targets/x86_64-linux/include/cuda_gl_interop.h
```
```bash
$ uname -m
ppc64le
$ docker run --rm nvidia/cuda-ppc64le:10.1-devel-ubuntu18.04 dpkg -L cuda-misc-headers-10-1 | grep -i gl
/usr/local/cuda-10.1/targets/ppc64le-linux/include/cuda_egl_interop.h
/usr/local/cuda-10.1/targets/ppc64le-linux/include/cuda_gl_interop.h
/usr/local/cuda-10.1/targets/ppc64le-linux/include/cudaGL.h
/usr/local/cuda-10.1/targets/ppc64le-linux/include/thrust/system/cuda/detail/cub/agent/single_pass_scan_operators.cuh
/usr/local/cuda-10.1/targets/ppc64le-linux/include/thrust/detail/config/global_workarounds.h
```
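To pin down exactly which headers diverge, you can `comm` the two listings. This sketch hard-codes the GL/EGL entries from the two outputs above (with the thrust hits dropped); on a live system you would feed the `dpkg -L … | grep -i gl` output in instead:

```shell
# Headers present in the x86_64 package but missing from the ppc64le one.
LC_ALL=C sort > /tmp/x86.txt <<'EOF'
CL/cl_egl.h
CL/cl_gl.h
CL/cl_gl_ext.h
cudaEGL.h
cudaGL.h
cuda_egl_interop.h
cuda_gl_interop.h
EOF
LC_ALL=C sort > /tmp/ppc.txt <<'EOF'
cudaGL.h
cuda_egl_interop.h
cuda_gl_interop.h
EOF
LC_ALL=C comm -23 /tmp/x86.txt /tmp/ppc.txt
# → CL/cl_egl.h
# → CL/cl_gl.h
# → CL/cl_gl_ext.h
# → cudaEGL.h
```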