I’m trying to expose memory from torch to OpenACC via the CUDA array interface, which is why I’m compiling both frameworks in the same code. I’m using nvc++, since I couldn’t even get it to compile with alternatives like gcc.
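For context, a minimal sketch of the kind of interop being attempted: hand a CUDA tensor’s device pointer to an OpenACC region via the “deviceptr” clause. This assumes a libtorch build with CUDA support and compilation with nvc++ using the -acc and -cuda flags; it is an illustration of the approach, not the exact code from this thread.

```cpp
// Sketch (assumes libtorch with CUDA support; compile with: nvc++ -acc -cuda)
#include <torch/torch.h>

int main() {
    // Allocate directly on the GPU so data_ptr() returns a device address.
    torch::Tensor t = torch::eye(4, torch::kCUDA);
    float* d = t.data_ptr<float>();
    int64_t n = t.numel();

    // deviceptr() tells the OpenACC runtime that d is already a device
    // address, so it performs no allocation or copy of its own.
    #pragma acc parallel loop deviceptr(d)
    for (int64_t i = 0; i < n; ++i)
        d[i] *= 2.0f;

    return 0;
}
```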
Let’s try adding the “-cuda” flag to your link options so the CUDA libraries from the NVHPC SDK are added.
Though in looking at the libraries being added, I see things like “libc10_cuda.so”, “/usr/local/cuda/lib64/stubs/libcuda.so”, and “/usr/local/cuda/lib64/libcudart.so”. Not sure if that means libtorch is compiled to work with CUDA 10.x or if the “10” means something else. Also, it is linking with the locally installed CUDA version.
Do you know what version of CUDA is installed in “/usr/local/cuda”?
One possibility is a mismatch in the CUDA version, since NVHPC 22.3 will use CUDA 11.6 by default. If you installed the multi-CUDA package of NVHPC, you can try adding the flag “-gpu=cuda10.2” so CUDA 10.2 will be used instead. Or, if you can, configure your CMake to use the CUDA 11.6 that ships with the compilers: “/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/cuda/11.6”
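A sketch of how that CMake configuration might look, assuming the default NVHPC install location under /opt/nvidia (the variable names are the standard hints for CMake’s FindCUDAToolkit and the older FindCUDA module; adjust the path to your install):

```cmake
# Point CMake at the CUDA 11.6 bundled with NVHPC 22.3
# (for projects using the modern FindCUDAToolkit module)
set(CUDAToolkit_ROOT /opt/nvidia/hpc_sdk/Linux_x86_64/22.3/cuda/11.6)

# For projects still using the older FindCUDA module:
set(CUDA_TOOLKIT_ROOT_DIR /opt/nvidia/hpc_sdk/Linux_x86_64/22.3/cuda/11.6)
```

The same hints can be passed on the command line instead, e.g. “cmake -DCUDAToolkit_ROOT=/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/cuda/11.6 …”.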
The local CUDA installation comes from this image: nvcr.io/nvidia/pytorch:22.04-py3
(I’m using a custom Docker image, if that helps… I can share it; but there is little to no modification of NVIDIA’s base image, just some libs that were missing to compile OpenACC, like gcc-offload-nvptx, and NVHPC itself.)
I tried using this CUDA 11.6, but the build failed with: make: *** No rule to make target 'CUDA_curand_LIBRARY-NOTFOUND', needed by 'main'. Stop.
Checking both file trees, I saw some libraries missing in the CUDA that came with the compiler. For example, for cuBLAS there are no headers or .so files. Is it possible that they are somewhere else, or are they really missing?
Ok, so then it’s probably not a CUDA version mismatch issue.
“Is it possible that it is somewhere else, or is it really missing?”
Since we include more math libraries in the HPC SDK than those included in the CUDA SDK, these get moved to the “/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/math_libs/<cuda_ver>” directory.
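Because cuRAND and cuBLAS live under math_libs rather than the cuda directory, the CUDA_curand_LIBRARY-NOTFOUND error can likely be worked around by pointing CMake at them explicitly. A sketch, assuming the default NVHPC 22.3 layout (verify the exact paths on your system):

```cmake
# Tell CMake where NVHPC keeps the math libraries
# (they are not under .../cuda/11.6 like in a standalone CUDA toolkit)
set(CUDA_curand_LIBRARY
    /opt/nvidia/hpc_sdk/Linux_x86_64/22.3/math_libs/11.6/lib64/libcurand.so)
set(CUDA_cublas_LIBRARY
    /opt/nvidia/hpc_sdk/Linux_x86_64/22.3/math_libs/11.6/lib64/libcublas.so)
```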
One thing to find out is how libtorch determines which version to use. My next theory is that it’s attempting to open a CUDA context, but since one has already been opened by the OpenACC runtime, it’s failing and then falling back to the CPU version. No idea if this is the case or not, but if so, it seems like a poor design choice since the same issue would occur if you were using CUDA directly.
To test this theory, move the torch::eye call to the beginning of your program, before any OpenACC directives. Or if libtorch has an initialization call, add that early in the program. The OpenACC runtime will detect if a CUDA context has already been created and use it rather than creating its own.
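A sketch of that ordering test, assuming a CUDA-enabled libtorch build; acc_init() from openacc.h is used to make the point where the OpenACC runtime attaches explicit:

```cpp
// Ordering test: touch libtorch's CUDA path first, then start OpenACC.
#include <torch/torch.h>
#include <openacc.h>

int main() {
    // 1. Force libtorch to create its CUDA context before OpenACC runs.
    torch::Tensor t = torch::eye(3, torch::kCUDA);

    // 2. The OpenACC runtime should now pick up the existing context
    //    instead of creating a second one.
    acc_init(acc_device_nvidia);

    // ... OpenACC compute regions and tensor interop follow ...
    return 0;
}
```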