Simple 2D Array Program Crashing

Odc · October 15, 2009, 2:53pm

I’ve just started to learn CUDA and have written a simple program that creates a 2D array of int, allocates memory on the device and then copies the array onto the device. Eventually I want to expand this into a graph searching algorithm. However, when using an array with 1,000 vertices (indices) it simply crashes. As far as I can tell, it’s populating the array on the host that causes the crash.

Call me a noob but I thought that an array of this size was perfectly acceptable?

Here’s my code anyway:

[codebox]#include <stdio.h>
#include <cuda.h>

__global__ void myKernel(int* deviceArrayPtr, int pitch)
{
}

int main()
{
    int* deviceArrayPtr;
    size_t devicePitch, hostPitch, width, height;
    int hostArray[1000][1000];

    width = 1000;
    height = 1000;

    for(int i = 0; i < 1000; i ++)
        for(int j = 0; j < 1000; j ++)
            hostArray[i][j] = 20;           //20 is an arbitrary number

    //Allocates memory on the device
    cudaMallocPitch((void**)&deviceArrayPtr, &devicePitch, width * sizeof(int), height);
    hostPitch = devicePitch;

    //Copies hostArray onto the pre-allocated device memory
    cudaMemcpy2D(deviceArrayPtr, devicePitch, &hostArray, hostPitch, width * sizeof(int), height, cudaMemcpyHostToDevice);

    myKernel <<< 100, 512 >>> (deviceArrayPtr, devicePitch);
}[/codebox]

Anyone have any ideas about this?

Thibaud · October 15, 2009, 3:15pm

Hi!

I ran into the same problem two hours ago, though not with such big arrays!

The only solution I found was to use dimensions that are a multiple of 2 and to choose a suitable grid size and number of threads per block!

I tried this on your program. I did not check whether it prints the right values, but there is no more “segmentation fault”!

Here is the code modification I made:

[codebox]#include <stdio.h>
#include <cuda.h>

__global__ void myKernel(int* deviceArrayPtr, int pitch)
{
}

int main()
{
    int* deviceArrayPtr;
    size_t devicePitch, hostPitch, width, height;
    int hostArray[1024][1024];

    width = 1024;
    height = 1024;

    for(int i = 0; i < 1024; i ++)
        for(int j = 0; j < 1024; j ++)
            hostArray[i][j] = 20; //20 is an arbitrary number

    //Allocates memory on the device
    cudaMallocPitch((void**)&deviceArrayPtr, &devicePitch, width * sizeof(int), height);
    hostPitch = devicePitch;

    //Copies hostArray onto the pre-allocated device memory
    cudaMemcpy2D(deviceArrayPtr, devicePitch, &hostArray, hostPitch, width * sizeof(int), height, cudaMemcpyHostToDevice);

    dim3 threadPerBlock(16,16);
    dim3 dimGrid(width/threadPerBlock.x , height/threadPerBlock.y);

    myKernel <<< dimGrid, threadPerBlock >>> (deviceArrayPtr, devicePitch);
}[/codebox]
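A note on the grid size computed above: `width/threadPerBlock.x` is an integer division, so it only covers the whole array when the width is an exact multiple of the block size. With 1024 and 16 it gives 64 blocks, which is exact. With the original 1000 it would give 62 blocks, covering only 992 columns. A minimal sketch of the usual round-up, where `blocksFor` is a hypothetical helper name and not part of the CUDA API:

```cpp
#include <cstddef>

// Hypothetical helper (not a CUDA API function): number of blocks needed
// to cover n elements when each block handles threadsPerBlock of them.
// Adding (threadsPerBlock - 1) before dividing rounds the result up.
inline unsigned int blocksFor(std::size_t n, unsigned int threadsPerBlock)
{
    return static_cast<unsigned int>((n + threadsPerBlock - 1) / threadsPerBlock);
}
```

It could then be used as `dim3 dimGrid(blocksFor(width, threadPerBlock.x), blocksFor(height, threadPerBlock.y));`. Since the last block in each dimension may reach past the array, the kernel would also need a bounds check on its row and column index before touching memory.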

Odc · October 15, 2009, 4:03pm

Thanks for the quick reply :).

I have just tried your modifications but I’m still getting the same problem. When the console opens, it stops responding straight away, with no error codes.
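One likely cause, assuming the program is built with a default stack size: `int hostArray[1000][1000]` is a local variable, so its 1000 × 1000 × `sizeof(int)` bytes (about 4 MB with 4-byte ints) live on the stack of `main`. Default stacks are often smaller than that (for example, 1 MB by default with the Microsoft linker), which would explain a crash while the array is being filled and before any CUDA call runs. The 1024 × 1024 version is slightly larger, so it would not help in that case. Below is a minimal host-side sketch that keeps the data on the heap instead. `makeHostArray` and `hostPitchBytes` are hypothetical helper names introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a width x height table on the heap, stored
// row by row in one contiguous buffer, with every element set to value.
// Element (row, col) lives at index row * width + col.
std::vector<int> makeHostArray(std::size_t width, std::size_t height, int value)
{
    return std::vector<int>(width * height, value);
}

// Hypothetical helper: bytes between the starts of two consecutive rows
// of the tightly packed host buffer. This is the host-side pitch; it is
// generally NOT equal to the pitch returned by cudaMallocPitch, which may
// pad each device row for alignment.
inline std::size_t hostPitchBytes(std::size_t width)
{
    return width * sizeof(int);
}
```

With this, `std::vector<int> hostArray = makeHostArray(width, height, 20);` replaces the local array, and the copy would pass `hostArray.data()` as the source pointer and `hostPitchBytes(width)` as the source pitch of `cudaMemcpy2D`, rather than reusing `devicePitch` for the host side. Checking the `cudaError_t` returned by each CUDA call would also show whether anything fails once the host array is no longer the problem.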