I've just started learning CUDA and have written a simple program that creates a 2D array of int, allocates memory on the device, and then copies the array onto the device. Eventually I want to expand this into a graph-searching algorithm. However, when I use an array with 1,000 vertices (indices) in each dimension, it simply crashes. As far as I can tell, it's populating the array on the host that causes the crash.
Call me a noob, but I thought an array of this size was perfectly acceptable?
Here's my code:
__global__ void myKernel(int* deviceArrayPtr, int pitch)

int* deviceArrayPtr;
size_t devicePitch, hostPitch, width, height;
int hostArray[1000][1000];
width = 1000;
height = 1000;

for(int i = 0; i < 1000; i++)
    for(int j = 0; j < 1000; j++)
        hostArray[i][j] = 20; //20 is an arbitrary number

//Allocates memory on the device
cudaMallocPitch((void**)&deviceArrayPtr, &devicePitch, width * sizeof(int), height);
hostPitch = devicePitch;

//Copies hostArray onto the pre-allocated device memory
cudaMemcpy2D(deviceArrayPtr, devicePitch, &hostArray, hostPitch, width * sizeof(int), height, cudaMemcpyHostToDevice);

myKernel <<< 100, 512 >>> (deviceArrayPtr, devicePitch);
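One way to test the host-side theory is a small sketch like the one below. It rests on assumptions I haven't confirmed: that the host array lives on the stack, and that the default stack limit (often somewhere between 1 MB and 8 MB, depending on platform) is too small for 1000 x 1000 ints. The helper names `footprintBytes`, `makeHostArray` and `hostPitchBytes` are hypothetical and not part of CUDA; they only show the size arithmetic and a heap-backed alternative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes needed for a width x height array of int
// stored contiguously (about 4 MB for 1000 x 1000 with a 4-byte int).
std::size_t footprintBytes(std::size_t width, std::size_t height) {
    return width * height * sizeof(int);
}

// Hypothetical helper: builds the host array on the heap instead of the
// stack, stored row-major as one flat block, with every element = value.
std::vector<int> makeHostArray(std::size_t width, std::size_t height, int value) {
    return std::vector<int>(width * height, value);
}

// Hypothetical helper: pitch (bytes per row) of a tightly packed host array.
// Rows on the host have no padding, so this is just width * sizeof(int).
std::size_t hostPitchBytes(std::size_t width) {
    return width * sizeof(int);
}
```

If this is the cause, the flat array's `data()` pointer and `hostPitchBytes(width)` would go in as the source pointer and source pitch of `cudaMemcpy2D`, since the source pitch describes the host layout and would not normally equal `devicePitch`.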
Anyone have any ideas about this?