Libtorch with OpenACC doesn’t work

I’ve tried a simple piece of C++ code to test libtorch:

#include <torch/torch.h>
#include <iostream>

int main() {
  // Build a 3x3 identity matrix as float directly on the GPU
  torch::Tensor out = torch::eye(3, torch::device(at::kCUDA).dtype(at::kFloat));
  std::cout << out << std::endl;
  return 0;
}

After some tinkering, it worked ok:

1 0 0
0 1 0
0 0 1
[ CUDAFloatType{3,3} ]

However, after compiling it together with OpenACC, it doesn’t work anymore…

#include <torch/torch.h>
#include <iostream>

#define SIZE 10000

float a[SIZE][SIZE];
float b[SIZE][SIZE];
float c[SIZE][SIZE];

int main() {
  int i, j;
  // Fill the input matrices on the host
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      a[i][j] = (float)i + j;
      b[i][j] = (float)i - j;
      c[i][j] = 0.0f;
    }
  }
  // Offload the matrix addition with OpenACC
  #pragma acc kernels copyin(a,b) copy(c)
  for (i = 0; i < SIZE; ++i) {
    for (j = 0; j < SIZE; ++j) {
      c[i][j] = a[i][j] + b[i][j];
    }
  }
  // Same libtorch test as before
  torch::Tensor out = torch::eye(3, torch::device(at::kCUDA).dtype(at::kFloat));
  std::cout << out << std::endl;
  return 0;
}

Now it only evaluates on the CPU.

1 0 0
0 1 0
0 0 1
[ CPUFloatType{3,3} ]

My CMakeLists.txt is as minimal as I could make it:

cmake_minimum_required(VERSION 3.10)
set (LANGUAGES "CXX")
project(main LANGUAGES ${LANGUAGES})
add_definitions("-DENABLE_SSE")
find_package(Torch REQUIRED)
find_package(OpenACC REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenACC_CXX_FLAGS} ${TORCH_CXX_FLAGS} ")
add_executable(main main.cpp)
target_link_libraries(main "${TORCH_LIBRARIES}")
target_link_libraries(main "${TORCH_CUDA_LIBRARIES}")
target_link_libraries(main ${OpenACC_CXX_OPTIONS})
set_property(TARGET main PROPERTY CXX_STANDARD 14)

I’m trying to expose memory from Torch to OpenACC via the CUDA array interface, which is why I’m compiling both frameworks into the same code. I’m using nvc++, since I couldn’t even get the code to compile with alternatives like gcc.
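For context, here is a sketch of the interop I’m ultimately after (untested; it assumes the tensor is contiguous and already resident on the GPU, so that data_ptr<float>() yields a device pointer):

#include <torch/torch.h>

// Hand a CUDA tensor's device pointer straight to an OpenACC region,
// with no host copy in between.
void scale_in_place(torch::Tensor t, float factor) {
  float* p = t.data_ptr<float>();  // device pointer for a CUDA tensor
  int64_t n = t.numel();
  // deviceptr tells OpenACC that p is already a device address, so the
  // runtime must not allocate or copy anything for it.
  #pragma acc parallel loop deviceptr(p)
  for (int64_t i = 0; i < n; ++i) {
    p[i] *= factor;
  }
}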

Hi Marcelo.olsi,

Sorry, but I’m not familiar with libtorch, so I don’t know what would cause it to fall back to using the host.

Can you run the make command in verbose mode (i.e. “make VERBOSE=1”) and post the compilation and link lines for this source file? I’d like to see what compiler options are being used.

My only guess is that since libtorch most likely uses CUDA under the hood, you need to compile and link with the “-cuda” flag.

-Mat

Hi Mat!

Here is what you asked for:


/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/compilers/bin/nvc++ -DENABLE_SSE -DUSE_C10D_GLOO -DUSE_C10D_MPI -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -isystem /opt/conda/lib/python3.8/site-packages/torch/include -isystem /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -acc -D_GLIBCXX_USE_CXX11_ABI=1 -D_GLIBCXX_USE_CXX11_ABI=1 --c++14 --gnu_extensions -MD -MT CMakeFiles/main.dir/main.cpp.o -MF CMakeFiles/main.dir/main.cpp.o.d -o CMakeFiles/main.dir/main.cpp.o -c /codes/teste01/main.cpp


/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/compilers/bin/nvc++ -acc -D_GLIBCXX_USE_CXX11_ABI=1 CMakeFiles/main.dir/main.cpp.o -o main -Wl,-rpath,/opt/conda/lib/python3.8/site-packages/torch/lib:/usr/local/cuda/lib64 /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch.so /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/libnvrtc.so /usr/local/cuda/lib64/libnvToolsExt.so /usr/local/cuda/lib64/libcudart.so /opt/conda/lib/python3.8/site-packages/torch/lib/libc10_cuda.so /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/libnvrtc.so /usr/local/cuda/lib64/libnvToolsExt.so /usr/local/cuda/lib64/libcudart.so /opt/conda/lib/python3.8/site-packages/torch/lib/libc10_cuda.so -acc -Wl,--no-as-needed,"/opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so" -Wl,--as-needed -Wl,--no-as-needed,"/opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so" -Wl,--as-needed /opt/conda/lib/python3.8/site-packages/torch/lib/libc10_cuda.so /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so /usr/local/cuda/lib64/libcufft.so /usr/local/cuda/lib64/libcurand.so /usr/local/cuda/lib64/libcublas.so /usr/lib/x86_64-linux-gnu/libcudnn.so -Wl,--no-as-needed,"/opt/conda/lib/python3.8/site-packages/torch/lib/libtorch.so" -Wl,--as-needed /usr/local/cuda/lib64/libnvToolsExt.so /usr/local/cuda/lib64/libcudart.so


Thanks.

Let’s try adding the “-cuda” flag to your link options so that the CUDA libraries from the NVHPC SDK are added.

Though in looking at the libraries being added, I see things like “libc10_cuda.so”, “/usr/local/cuda/lib64/stubs/libcuda.so”, and “/usr/local/cuda/lib64/libcudart.so”. I’m not sure if that means libtorch is compiled to work with CUDA 10.x or if the “10” means something else. Also, it is linking with the locally installed CUDA version.

Do you know what version of CUDA is installed in “/usr/local/cuda”?

One possibility is a mismatch in the CUDA version, since NVHPC 22.3 will use CUDA 11.6 by default. If you installed the multi-CUDA package of NVHPC, you can try adding the flag “-gpu=cuda10.2” so CUDA 10.2 is used instead. Or you can configure your cmake to use the CUDA 11.6 that ships with the compilers: “/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/cuda/11.6”

Again, these are pure guesses, but worth a try.

-Mat

The local CUDA installation comes from this image: nvcr.io/nvidia/pytorch:22.04-py3
(I’m using a custom Docker image, if that helps… I can share it, but there is little to no modification to the Nvidia base image: just some libs that were missing to compile OpenACC, like gcc-offload-nvptx, and NVHPC itself.)

I.e., I’m using CUDA 11.6.
That “c10” is a lib internal to Torch. Kinda misleading, I agree… but you can check the discussion here: Document what is C10 · Issue #14850 · pytorch/pytorch · GitHub

I tried using this CUDA 11.6, but the compiler returned
make[2]: *** No rule to make target 'CUDA_curand_LIBRARY-NOTFOUND', needed by 'main'. Stop.
Checking both file trees, I saw some libraries missing from the CUDA that comes with the compiler. Like cublas: there are no headers or .so lib files. Is it possible that it is somewhere else, or is it really missing?

Ok, so then it’s probably not a CUDA version mismatch issue.

Is it possible that it is somewhere else, or is it really missing?

Since we include more math libraries in the HPC SDK than are shipped in the CUDA SDK, these get moved to the “/opt/nvidia/hpc_sdk/Linux_x86_64/22.3/math_libs/<cuda_ver>” directory.

One thing to find out is how libtorch determines which version to use. My next theory is that it’s attempting to open a CUDA context, but since one has already been opened by the OpenACC runtime, it fails and then falls back to the CPU version. No idea if this is the case or not, but if so, it seems like a poor design choice, since the same issue would occur if you were using CUDA directly.

To test this theory, move the torch::eye call to the beginning of your program, before any OpenACC directives. Or, if libtorch has an initialization call, add that early in the program. The OpenACC runtime will detect if a CUDA context has already been created and will use it rather than creating its own.
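Something like this (just a sketch to illustrate the ordering, using the same arrays as your program):

int main() {
  // ... host initialization of a, b, c as before ...

  // Touch the GPU through libtorch first so it creates the CUDA context
  torch::Tensor out = torch::eye(3, torch::device(at::kCUDA).dtype(at::kFloat));

  // Then run the OpenACC region; the OpenACC runtime should attach to the
  // existing context instead of creating its own
  #pragma acc kernels copyin(a,b) copy(c)
  for (int i = 0; i < SIZE; ++i) {
    for (int j = 0; j < SIZE; ++j) {
      c[i][j] = a[i][j] + b[i][j];
    }
  }

  std::cout << out << std::endl;
  return 0;
}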

FYI, another engineer let me know that Torch has a “torch.cuda.init()” call. Try adding this before the OpenACC code.
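“torch.cuda.init()” is the Python API, so from C++ my guess is that any small CUDA call through libtorch before the first OpenACC construct would have the same effect, for example (untested):

#include <torch/torch.h>

int main() {
  // Force libtorch to create its CUDA context up front (a guess at a
  // C++ counterpart of torch.cuda.init(); any small CUDA allocation
  // should trigger the same initialization)
  if (torch::cuda::is_available()) {
    torch::zeros({1}, torch::device(at::kCUDA));
  }

  // ... OpenACC code and the libtorch test follow ...
  return 0;
}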