nvlink errors using dynamic parallelism with CUDA 9.1 on Tesla V100/Ubuntu 18.04

I’m diving into CUDA at the moment and was trying to use dynamic parallelism on my remote machine, which is running Ubuntu 18.04 LTS with 4 Tesla V100 GPUs.

My code looks as follows (“slightly” modified):

#define LSIZE 5
#define RSIZE 5
#define LENGTH 5

// ...

__global__ void hammingDistance(const bool* left, const size_t size_l, const bool* right, const size_t size_r, int* out)
{
	if (size_l != size_r) {
		*out = -1;
		return;
	}

	for (int i = 0; i < size_l; ++i) {
		*out += left[i] ^ right[i];
	}
}

__global__ void executeMatching(bool** leftDescriptorSet, bool** rightDescriptorSet)
{
	for (size_t iLeft = 0; iLeft < LSIZE; ++iLeft) {
		bool* lDesc = leftDescriptorSet[iLeft];

		for (size_t iRight = 0; iRight < RSIZE; ++iRight) {
			bool* rDesc = rightDescriptorSet[iRight];
            
			int* sum = new int(0);
            
			hammingDistance<<<1, 1>>>(lDesc, LSIZE, rDesc, RSIZE, sum);

			// ...
		}
	}
}

// ...

int main() {
	// ...

	// example data
	bool *dev_aSetPtr, *dev_bSetPtr;
	cudaMallocManaged(&dev_aSetPtr, LSIZE * sizeof(bool));
	cudaMallocManaged(&dev_bSetPtr, RSIZE * sizeof(bool));

	// ...

	executeMatching<<<1, 1>>>(&dev_aSetPtr, &dev_bSetPtr);

	// ...
}
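Separately from the link error, the host code above passes &dev_aSetPtr, the address of a host variable, to the kernel as a bool**, and the parent kernel then reads LSIZE pointers through it. On a typical x86 system, device code cannot dereference a host stack address, so this would most likely fail at run time once the program links. Below is a minimal self-contained sketch (not the poster's full code), assuming managed memory is acceptable and keeping the question's one-thread launches: the pointer arrays are allocated with cudaMallocManaged, each child grid writes its own result slot, and the host waits with cudaDeviceSynchronize. The helper runMatching, the example data, and the file name in the build comment are hypothetical.

```cuda
// Sketch only. Build (file and binary names are hypothetical):
//   nvcc -gencode arch=compute_70,code=sm_70 -rdc=true dp_sketch.cu -o dp_sketch -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

#define LSIZE 5
#define RSIZE 5
#define LENGTH 5

// Child kernel: Hamming distance between two boolean descriptors.
__global__ void hammingDistance(const bool* left, size_t size_l,
                                const bool* right, size_t size_r, int* out)
{
    if (size_l != size_r) { *out = -1; return; }
    int d = 0;
    for (size_t i = 0; i < size_l; ++i) d += left[i] ^ right[i];
    *out = d;  // write once instead of accumulating into *out
}

// Parent kernel: launches one child grid per (left, right) pair.
__global__ void executeMatching(bool** leftSet, bool** rightSet, int* results)
{
    for (int iL = 0; iL < LSIZE; ++iL)
        for (int iR = 0; iR < RSIZE; ++iR)
            hammingDistance<<<1, 1>>>(leftSet[iL], LENGTH, rightSet[iR], LENGTH,
                                      &results[iL * RSIZE + iR]);
}

// Hypothetical helper: fills hostResults[LSIZE * RSIZE]; returns 0 on success.
int runMatching(int* hostResults)
{
    bool **leftSet, **rightSet;
    int* results;
    // The pointer arrays themselves must live in device-accessible memory.
    cudaMallocManaged(&leftSet, LSIZE * sizeof(bool*));
    cudaMallocManaged(&rightSet, RSIZE * sizeof(bool*));
    cudaMallocManaged(&results, LSIZE * RSIZE * sizeof(int));
    for (int i = 0; i < LSIZE; ++i) cudaMallocManaged(&leftSet[i], LENGTH * sizeof(bool));
    for (int i = 0; i < RSIZE; ++i) cudaMallocManaged(&rightSet[i], LENGTH * sizeof(bool));

    // Hypothetical data: left descriptor i has its first i bits set; right ones are all false.
    for (int i = 0; i < LSIZE; ++i)
        for (int k = 0; k < LENGTH; ++k) leftSet[i][k] = (k < i);
    for (int i = 0; i < RSIZE; ++i)
        for (int k = 0; k < LENGTH; ++k) rightSet[i][k] = false;

    executeMatching<<<1, 1>>>(leftSet, rightSet, results);
    cudaError_t err = cudaGetLastError();  // launch configuration errors
    // A parent grid only completes after all of its child grids, so this waits for both.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess)
        for (int j = 0; j < LSIZE * RSIZE; ++j) hostResults[j] = results[j];

    for (int i = 0; i < LSIZE; ++i) cudaFree(leftSet[i]);
    for (int i = 0; i < RSIZE; ++i) cudaFree(rightSet[i]);
    cudaFree(leftSet);
    cudaFree(rightSet);
    cudaFree(results);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int results[LSIZE * RSIZE];
    if (runMatching(results) != 0) { std::printf("CUDA error\n"); return 1; }
    // With the data above, left descriptor i differs from every right one in i bits.
    for (int iL = 0; iL < LSIZE; ++iL)
        std::printf("left %d vs right 0: %d\n", iL, results[iL * RSIZE]);
    return 0;
}
```

Launching 25 single-thread child grids only mirrors the structure of the question; a flat kernel with one thread per (left, right) pair would normally do the same work without dynamic parallelism or cudadevrt.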

When compiling using

/usr/bin/nvcc /home/tibor/cuda_hm/hamming_matcher.cu -o /home/tibor/cuda_hm/hamming_matcher -gencode arch=compute_70,code=sm_70 -rdc=true

I keep getting these errors:

nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in '/tmp/tmpxft_00006ca8_00000000-10_hamming_matcher.o'
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in '/tmp/tmpxft_00006ca8_00000000-10_hamming_matcher.o'
The terminal process terminated with exit code: 255

Is there something wrong with my CUDA Toolkit installation?

Add -lcudadevrt to the end of your compile command line.

Any of the dynamic parallelism CUDA sample projects also has a Makefile that shows what is needed.

Unfortunately, that didn’t change anything.

I tried the same code on another machine with Windows 10, VS 2019 and CUDA 10.2, linking cudadevrt.lib into the project, and it worked like a charm. Unfortunately, I can’t get it set up in the Linux dev environment.

Possibly a mismatched or corrupted Linux install.

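I shudder every time I see people using /usr/bin/nvcc, since on Ubuntu that is usually the compiler from the distribution’s nvidia-cuda-toolkit package rather than from NVIDIA’s own installer. I prefer to use only /usr/local/cuda/bin/nvcc.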
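One way to look for the kind of mismatch described above is to build a tiny program with each candidate nvcc and compare what it reports. This sketch uses only documented pieces: __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__ are macros defined by nvcc, CUDART_VERSION comes from the toolkit headers, and cudaRuntimeGetVersion and cudaDriverGetVersion report the linked runtime and the installed driver. The helper compilerVersion and the file name are hypothetical; running which nvcc and nvcc --version answers the compiler part from the shell.

```cuda
// Sketch: build once with each compiler and compare the output, e.g.
//   /usr/bin/nvcc version_check.cu -o version_check
//   /usr/local/cuda/bin/nvcc version_check.cu -o version_check
// (file and binary names are hypothetical)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: nvcc version encoded like CUDART_VERSION (9.1 -> 9010, 10.0 -> 10000).
int compilerVersion()
{
    return __CUDACC_VER_MAJOR__ * 1000 + __CUDACC_VER_MINOR__ * 10;
}

int main()
{
    int runtimeVersion = 0, driverVersion = 0;
    cudaRuntimeGetVersion(&runtimeVersion);  // runtime the program was linked against
    cudaDriverGetVersion(&driverVersion);    // 0 if no driver is installed
    std::printf("nvcc:    %d\n", compilerVersion());
    std::printf("headers: %d\n", CUDART_VERSION);
    std::printf("runtime: %d\n", runtimeVersion);
    std::printf("driver:  %d\n", driverVersion);
    // The driver must support at least the runtime version the program uses.
    if (driverVersion < runtimeVersion)
        std::printf("warning: driver is older than the CUDA runtime\n");
    // nvcc and its headers should come from the same toolkit.
    if (compilerVersion() != CUDART_VERSION)
        std::printf("warning: nvcc and CUDA headers come from different toolkits\n");
    return 0;
}
```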

Well, regardless of the path used, the solution was to use the CUDA 10.0 compiler, which made it work.