nvc++-compiled program can't find CUDA-capable device

I’m trying to get the HPC SDK compilers working on the Windows Subsystem for Linux (WSL). I’m running WSL with Ubuntu 20.04. I’ve installed CUDA Toolkit 11.7 (not the latest version, but the HPC SDK seems to support 11.7 at the latest), and I’m able to use the nvcc compiler to compile and run basic CUDA programs; this all works fine. This indicates to me that there is no problem with the CUDA driver or the CUDA Toolkit installation.
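For instance, a minimal device query along the following lines builds with nvcc and runs as expected (a sketch for illustration, not the exact program I tested):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
	// Ask the CUDA runtime how many devices it can see.
	int count = 0;
	cudaError_t err = cudaGetDeviceCount(&count);
	if (err != cudaSuccess) {
		std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
		return 1;
	}
	std::printf("Found %d CUDA device(s)\n", count);
	return 0;
}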

Installing the matching version of the HPC SDK also goes fine. I can even compile programs using nvc++ -stdpar=gpu -gpu=cc86 program.cpp; the problem lies in the first call to an STL algorithm that uses the std::execution::par or std::execution::par_unseq execution policy. Take, for example, the following very basic sorting program:

#include <iostream>
#include <execution>
#include <algorithm>
#include <vector>

int main()
{
	std::vector<int> v = { 3, 2, 1 };

	std::sort(std::execution::par, v.begin(), v.end()); // or std::execution::par_unseq

	return 0;
}
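
For reference, I compile and run it like this (the source file name sort.cpp is just for illustration):

% nvc++ -stdpar=gpu -gpu=cc86 sort.cpp -o sort
% ./sort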

This program compiles fine, but at runtime it throws the following error:

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
Aborted

That is, the program cannot find the GPU and make a proper kernel call. This is very confusing to me, especially because a simple CUDA file built with the same compiler flags compiles and runs just fine. Any help is appreciated!

Hi sanderkorteweg,

I don’t have a WSL system I can use to test your code, but running on a Linux system here, the code runs fine. Hence, I suspect that the issue is specific to your system.

What type of device do you have and which compiler version are you using?

What is the output from the commands “/usr/lib/wsl/lib/nvidia-smi” and “nvaccelinfo”?

Are you able to compile and run a simple Thrust example? (We use Thrust under the hood.) For example:

% cat thrust.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main(void)
{
  // generate 32M random numbers on the host
  thrust::host_vector<int> h_vec(32 << 20);
  thrust::generate(h_vec.begin(), h_vec.end(), rand);
  // transfer data to the device
  thrust::device_vector<int> d_vec = h_vec;
  // sort data on the device
  thrust::sort(d_vec.begin(), d_vec.end());
  // transfer data back to the host
  thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
  return 0;
}
% nvcc thrust.cu -gencode arch=compute_86,code=sm_86 -std=c++17 ; ./a.out
%

-Mat

Thanks for your response, Mat. nvidia-smi output:

Tue Nov 15 21:27:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 522.06       CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:09:00.0  On |                  N/A |
|  0%   37C    P8    14W / 170W |    348MiB / 12288MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

which indicates to me that the GPU is found. I’m a little worried about the CUDA version, because the HPC SDK seems to only be available for CUDA Toolkit 11.7; could that be an issue? Indeed, running nvaccelinfo could not find a CUDA-capable device, so I guess there is something wrong with my installation of the HPC SDK. I will try again and let you know. Thanks for now!

This could mean that the runtime can’t find the CUDA driver library, libcuda.so. I doubt it’s a specific issue with the 11.8 driver (an 11.8 driver can run binaries built with older CUDA versions), but maybe it put libcuda.so in an unexpected location?

Search for where libcuda.so is installed and then set the environment variable LD_LIBRARY_PATH to include this directory.
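
For example (the exact path is an assumption; on WSL the driver libraries typically live under /usr/lib/wsl/lib, the same directory as the nvidia-smi binary above):

% find / -name 'libcuda.so*' 2>/dev/null
% export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH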

No idea if it will fix it, but worth a try.

-Mat

Thanks, Mat! That indeed solved it: pointing LD_LIBRARY_PATH at the directory containing libcuda.so fixed the error.
