Hello-
I’m starting to learn CUDA programming using C++ in Visual Studio. I’ve been going through NVIDIA’s Deep Learning Institute and am at the very beginning of the accelerated computing course. I downloaded the latest version of Nsight Visual Studio Edition. I wrote a program that adds two arrays: array 1 is filled with 1’s and array 2 is filled with 2’s. I add the two arrays using an add function that is supposed to invoke a CUDA kernel. The kernel is not getting invoked. If I run the program normally, it compiles and completes without the kernel ever having been invoked, so the two arrays don’t get added. If I step into the program to debug, I get the error below. Has anyone had experience with this who can help me out?
The call to invoke the kernel is “add<<<1, 1>>>(N, x, y);”
Thanks in advance
Daniel G
#include <iostream>
#include <math.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
// Kernel to add the elements of two arrays (runs on the GPU)
__global__
void add(int n, float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = x[i] + y[i];
}
int main(void) {
    int N = 1 << 20;
    float *x, *y;

    // Allocate unified memory - accessible from the CPU or GPU
    cudaMallocManaged(&x, N * sizeof(float));
    cudaMallocManaged(&y, N * sizeof(float));

    // Initialize x and y arrays on the host
    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    // Run kernel on 1M elements on the GPU
    add<<<1, 1>>>(N, x, y);

    // Wait for the GPU to finish before accessing the data
    cudaDeviceSynchronize();

    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmax(maxError, fabs(y[i] - 3.0f));
    std::cout << "Max error: " << maxError << std::endl;
    std::cout << y[0] << std::endl; // just checking what y[0] is

    // Free memory
    cudaFree(x);
    cudaFree(y);

    return 0;
}
This is the error I get if I step into the program in the debugger; it stops at the kernel invocation:
tmpxft_00003d08_00000000-7_kernel.cudafe1.stub.c not found
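In case it helps anyone diagnose this, here is a sketch of the error-checking I could add right after the launch, assuming the standard CUDA runtime API calls (`cudaGetLastError`, `cudaGetErrorString`, and the error code returned by `cudaDeviceSynchronize`) behave as documented. A failed launch is otherwise silent, which would explain the arrays never getting added:

```cpp
// Snippet to drop in right after the kernel launch in main()
add<<<1, 1>>>(N, x, y);

// cudaGetLastError reports errors from the launch itself
// (e.g. no device, invalid configuration, kernel never compiled).
cudaError_t launchErr = cudaGetLastError();
if (launchErr != cudaSuccess)
    std::cerr << "Kernel launch failed: "
              << cudaGetErrorString(launchErr) << std::endl;

// cudaDeviceSynchronize returns errors that occur while the
// kernel is actually executing on the GPU.
cudaError_t syncErr = cudaDeviceSynchronize();
if (syncErr != cudaSuccess)
    std::cerr << "Kernel execution failed: "
              << cudaGetErrorString(syncErr) << std::endl;
```

If either message prints, the error string should narrow down why the kernel isn’t running.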