We have been given an Nvidia driver which supports ARM Neoverse N1 SDP (NVIDIA-Linux-aarch64-450.59.run).
Unfortunately, we seem to be running into compatibility issues when using Nvidia driver version 450.59 with CUDA 10.2. Interestingly, NVIDIA-SMI reports CUDA 11.0, although CUDA 11.0 has never been installed.
The issues are illegal memory accesses, which occur both in our own code and in Nvidia's official sample code.
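A possible reading of the NVIDIA-SMI output, stated as an assumption rather than something verified on this machine: the "CUDA Version" field in the nvidia-smi header normally shows the highest CUDA version the installed driver supports, not the toolkit that is installed. The CUDA runtime calls cudaDriverGetVersion and cudaRuntimeGetVersion return these versions as integers encoded as 1000 * major + 10 * minor. The sketch below only decodes and compares such integers; the helper names are hypothetical, and the values 11000 and 10020 are illustrative stand-ins for "driver supports 11.0" and "toolkit is 10.2".

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def driver_covers_runtime(driver_encoded: int, runtime_encoded: int) -> bool:
    """True when the driver's supported CUDA version is at least the runtime's version."""
    return driver_encoded >= runtime_encoded


# Illustrative values only: a driver reporting 11.0 and a 10.2 toolkit.
driver_version = decode_cuda_version(11000)
runtime_version = decode_cuda_version(10020)
print("driver supports CUDA %d.%d" % driver_version)
print("runtime is CUDA %d.%d" % runtime_version)
print("driver covers runtime:", driver_covers_runtime(11000, 10020))
```

If this reading holds here, the 11.0 shown by NVIDIA-SMI would not by itself indicate a broken installation; it would only say that the driver is new enough for the 10.2 runtime.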
For example, the matrixMul CUDA sample fails with the following output:
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1
Computing result using CUDA Kernel...
CUDA error at matrixMul.cu:185 code=700(cudaErrorIllegalAddress) "cudaEventCreate(&start)"
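When collecting such failures from several samples, it can help to pull the fields out of the error line automatically. The sketch below is a minimal parser, assuming the line keeps the format shown above; the function name is hypothetical, and the pattern accepts both straight and typographic quotes because forum software often converts them. One caveat worth keeping in mind: CUDA API calls may return errors from earlier asynchronous work, so the call named in the message (here cudaEventCreate) is not necessarily the operation that performed the illegal access.

```python
import re
from typing import Dict, Optional, Union

# Matches lines such as:
# CUDA error at matrixMul.cu:185 code=700(cudaErrorIllegalAddress) "cudaEventCreate(&start)"
ERROR_LINE = re.compile(
    r"CUDA error at (?P<file>\S+):(?P<line>\d+) "
    r"code=(?P<code>\d+)\((?P<name>\w+)\) "
    r"[\"\u201c](?P<call>.*?)[\"\u201d]"
)


def parse_cuda_error(text: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the file, line, numeric code, error name and failing call, or None."""
    match = ERROR_LINE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "file": fields["file"],
        "line": int(fields["line"]),
        "code": int(fields["code"]),
        "name": fields["name"],
        "call": fields["call"],
    }


sample = 'CUDA error at matrixMul.cu:185 code=700(cudaErrorIllegalAddress) "cudaEventCreate(&start)"'
print(parse_cuda_error(sample))
```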
The issue can be reproduced with the following steps:
- Create a brand new system image from scratch for N1SDP
- Install CUDA 10.2 from https://developer.nvidia.com/cuda-toolkit/arm
- Install the 450.59 driver (NVIDIA-Linux-aarch64-450.59.run)
- Try building and running any CUDA example (e.g. matrixMul)
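To make reports from the steps above easier to compare, the environment can be recorded right after the last step. The following is a minimal sketch, assuming nvidia-smi, nvcc and uname are on PATH where available; the helper names are hypothetical, and any tool that is missing or fails is simply recorded as None instead of raising an error.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Return the command's stdout, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_environment() -> Dict[str, Optional[str]]:
    """Gather driver, toolkit and platform details for a bug report."""
    return {
        "driver_and_gpu": run_if_available(
            ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"]
        ),
        "nvcc": run_if_available(["nvcc", "--version"]),
        "kernel": run_if_available(["uname", "-r"]),
        "arch": run_if_available(["uname", "-m"]),
    }


for key, value in collect_environment().items():
    print("%s: %s" % (key, value))
```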
To be able to use the GPUs, I think we would need versions of all the relevant packages (e.g. CUDA, libcuDNN, libnvinfer and TensorRT) that are compatible with the 450.59 driver on the ARM architecture.
Bence Káposzta - AImotive