Cannot get CUDA kernels to run in Visual Studio

Hello-
I’m starting to learn CUDA programming using C++ in Visual Studio. I’ve been going through NVIDIA’s Deep Learning Institute and am at the very beginning of the accelerated computing course. I downloaded the latest version of Nsight Visual Studio Edition. I wrote a program that adds two arrays: array 1 is filled with 1’s and array 2 is filled with 2’s. I add the two arrays using an add function that is supposed to invoke a CUDA kernel, but the kernel is not getting invoked. If I run the program normally, it compiles and completes without the kernel ever having been invoked, so the two arrays don’t get added. If I step into the program to debug, I get the error below. Has anyone had experience with this who can help me out?

The call to invoke the kernel is “add<<<1, 1>>>(N, x, y);”

Thanks in advance
Daniel G

#include <iostream>
#include <math.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>

// Kernel to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}

int main(void) {
    int N = 1 << 20;

    float *x, *y;

    // Allocate unified memory - accessible from the CPU or GPU
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));

    // Initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run kernel on 1M elements on the GPU
    add<<<1, 1>>>(N, x, y);

    // Wait for the GPU to finish before accessing the data
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i] - 3.0f));
    std::cout << "Max error: " << maxError << std::endl;
    std::cout << y[0] << std::endl;  // just checking what y[0] is

    // Free memory
    cudaFree(x);
    cudaFree(y);

    return 0;
}

This is the error I get if I step into the program; it stops at the kernel invocation:

tmpxft_00003d08_00000000-7_kernel.cudafe1.stub.c not found

Hi, @daniel.guerand

As you said:

  • If I run the program normally, it will compile and complete without the kernel ever having been invoked, so the two arrays don’t get added up.

If that is the case, there must be something wrong in the code that is preventing the kernel from actually executing.

Please refer to the vectorAdd sample (cuda-samples/Samples/0_Introduction/vectorAdd/vectorAdd.cu at master · NVIDIA/cuda-samples · GitHub) to do a simple add.
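Independent of that, a quick way to surface a silently failing launch is to query the CUDA runtime’s error state right after the <<<…>>> call. This is just a sketch of the standard cudaGetLastError / cudaDeviceSynchronize pattern, meant to replace the launch and synchronize lines in your program (N, x, y are the variables from your code):

```cuda
// After the kernel launch, ask the CUDA runtime whether anything went wrong.
// cudaGetLastError() reports launch/configuration problems (e.g. a kernel the
// driver cannot load); cudaDeviceSynchronize() returns errors that occur
// while the kernel is actually running.
add<<<1, 1>>>(N, x, y);

cudaError_t launchErr = cudaGetLastError();
if (launchErr != cudaSuccess)
    std::cerr << "Launch failed: " << cudaGetErrorString(launchErr) << std::endl;

cudaError_t syncErr = cudaDeviceSynchronize();
if (syncErr != cudaSuccess)
    std::cerr << "Execution failed: " << cudaGetErrorString(syncErr) << std::endl;
```

Without these checks a failed launch is invisible: the host code simply continues and the arrays are never added.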

For any other programming problems, please ask in the “CUDA Programming and Performance” forum directly.

Hi Veraj-

Thanks for the response. I tried running the CUDA code sample you linked. I get the error “addKernel launch failed: the provided PTX was compiled with an unsupported toolchain.” It seems to fail when the CUDA kernels are invoked, the same as in my previous program.

I looked into that error and found that other people have had the same issue. The recommendation is to install the latest GPU driver. I have the latest driver installed along with CUDA Toolkit 12.4. Can you give me any help with this error?

Thanks
Daniel

How did you compile the sample, and which driver/GPU are you using?
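To help rule out a driver/toolkit mismatch (the usual cause of the “unsupported toolchain” PTX error), you can also print the versions the runtime sees. A small sketch using the standard cudaDriverGetVersion / cudaRuntimeGetVersion calls:

```cuda
#include <iostream>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;

    // Highest CUDA version supported by the installed display driver
    cudaDriverGetVersion(&driverVersion);
    // Version of the CUDA runtime this program was built against
    cudaRuntimeGetVersion(&runtimeVersion);

    // Versions are encoded as 1000*major + 10*minor, e.g. 12040 -> 12.4
    std::cout << "Driver supports CUDA: " << driverVersion / 1000 << "."
              << (driverVersion % 100) / 10 << std::endl;
    std::cout << "Runtime (toolkit):    " << runtimeVersion / 1000 << "."
              << (runtimeVersion % 100) / 10 << std::endl;

    // If the driver version is lower than the runtime version, the driver
    // cannot load PTX produced by the newer toolkit - update the driver.
    return 0;
}
```

If the driver reports an older CUDA version than the toolkit you built with, updating the driver should clear the error.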

Hi Veraj-

I was going through the installation instructions for the CUDA toolkit and it mentioned that the graphics card driver should be installed through the toolkit. So, I reinstalled the toolkit and let it install the graphics card driver. Everything now works. I appreciate you helping me out.

Daniel

Good to know. Enjoy CUDA!

This topic was automatically closed after 46 hours. New replies are no longer allowed.