Memory management

This isn’t really a question about CUDA, but rather a memory management issue I ran into while programming CUDA. I’m a novice in C programming and, as far as I can tell, this error originates from faulty memory management. I’m used to Java programming, so this is quite new to me.

This is a function in my CUDA program and the idea is to run it several times in a for loop. It runs like it should the first time, but the second time it hangs on the line indicated by the comment in the code.

[codebox]int* calculateNewArray(int *oldMatrix, int oldSize, int oldRowLength, int newSize, int newRowLength, int newNumOfRows, int extraRows){
	printf("Test3\n");

	int *newMatrix_h, *newMatrix_d, *oldMatrix_d, *newMatrixReturned;
	size_t newMSize = newSize*sizeof(int);
	size_t oldMSize = oldSize*sizeof(int);

	printf("Test4\n");

	//Allocating space on the host for the result of the addition step
	newMatrix_h = (int *)malloc(newMSize);		//<--------------------------The program hangs on this line the second time the function is run

	printf("Test5\n");

	newMatrixReturned = (int *)malloc(newMSize);

	//Allocating space on the device for the old and the new Matrix
	cudaMalloc((void **) &newMatrix_d, newMSize);
	cudaMalloc((void **) &oldMatrix_d, oldMSize);

	//Initializing array on host
	for(int i=0; i<newSize; i++){
		newMatrix_h[i]=(int)0;
	}

	//Copying the old and the new Matrix to the device
	cudaMemcpy(newMatrix_d, newMatrix_h, sizeof(int)*oldMSize, cudaMemcpyHostToDevice);
	cudaMemcpy(oldMatrix_d, oldMatrix, sizeof(int)*oldMSize, cudaMemcpyHostToDevice);

	//Computing execution configuration
	int blockSize = 100;
	int nBlocks = 1;

	//Running the kernel which adds together the rows of the old matrix
	addMatrix <<< nBlocks, blockSize >>> (oldMatrix_d, oldRowLength, newMatrix_d, newSize, newRowLength, newNumOfRows, extraRows);

	printf("Test6\n");

	//Copying the result back to the host
	cudaMemcpy(newMatrixReturned, newMatrix_d, sizeof(int)*newMSize, cudaMemcpyDeviceToHost);

	printf("Test7\n");

	return newMatrixReturned;
}[/codebox]

What should I do to make this function run several times without failing?

[quote name='orjanb314' date='Oct 31 2008, 02:13 PM' post='458515']
This is a function in my CUDA program and the idea is to run it several times in a for loop. It runs like it should the first time, but the second time it hangs on the line indicated by the comment in the code.
[/quote]

Although I don’t see why this should hang your code, it would probably be a good idea to free newMatrix_h before leaving the function.

I have tried adding “free(newMatrix_h);” at the end of the function, but strangely that results in an unhandled Win32 exception.

Are you using the driver from your card’s driver CD, or the one from nvidia.com/cuda?
Like theMatrix, I can’t see an error either. Please upload your kernel and tell us something about your system (OS, architecture, gcc version) :) Perhaps we can find the problem outside your code.

An unhandled Win32 exception when freeing memory sounds a bit like you corrupted the heap by doing memory copies that were larger than their destination. But I am not a Windows user, so this is just a wild guess.

cudaMemcpy(newMatrixReturned, newMatrix_d, sizeof(int)*newMSize, cudaMemcpyDeviceToHost);

Should be

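cudaMemcpy(newMatrixReturned, newMatrix_d, newMSize, cudaMemcpyDeviceToHost);

newMSize is already a byte count (newSize*sizeof(int)), so multiplying it by sizeof(int) again copies sizeof(int) times as many bytes as newMatrixReturned can hold and writes past the end of that buffer. The two host-to-device copies have the same problem: the first should copy newMSize bytes and the second oldMSize bytes.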
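A defensive habit that catches this kind of overflow early: pass the destination’s capacity in bytes alongside every copy and refuse any copy that does not fit. The sketch below is plain host C under that assumption. `checked_copy` is a hypothetical helper, not part of CUDA or the C standard library, and `memcpy` stands in for `cudaMemcpy`; the same size comparison could be made before each `cudaMemcpy` call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `bytes` bytes from src to dst only if they
   fit in dst, whose capacity is dstCapacity bytes. Returns 0 on success,
   or -1 (copying nothing) if the copy would run past the end of dst.
   Example with the thread's sizes: a buffer of newMSize bytes accepts a
   copy of newMSize bytes but rejects sizeof(int)*newMSize bytes. */
static int checked_copy(void *dst, size_t dstCapacity,
                        const void *src, size_t bytes)
{
    assert(dst != NULL && src != NULL);
    if (bytes > dstCapacity) {
        fprintf(stderr, "refusing to copy %lu bytes into a %lu-byte buffer\n",
                (unsigned long)bytes, (unsigned long)dstCapacity);
        return -1;
    }
    memcpy(dst, src, bytes);
    return 0;
}
```

Applied to the final copy in calculateNewArray, the capacity of newMatrixReturned is newMSize bytes, so a request for sizeof(int)*newMSize bytes would be reported and rejected instead of silently overwriting the heap.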

Also, whenever you malloc or cudaMalloc something, you must free() or cudaFree() it respectively. So you need three such calls at the end (free(newMatrix_h), cudaFree(newMatrix_d) and cudaFree(oldMatrix_d)), and remember to free newMatrixReturned after you’re done using it (normally you would allocate it in the caller rather than in this function, and free it there too).
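A minimal host-only sketch of that ownership pattern, assuming the result size is known before the loop: the caller allocates the result buffer once, the worker function frees everything it allocates itself, and the caller frees the result when it is done. `fill_new_array` and `run_iterations` are hypothetical names; the row-summing loop merely stands in for the addMatrix kernel plus its cudaMemcpy calls, and the device buffers (which would need matching cudaFree calls) are left out so the code compiles without CUDA.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for calculateNewArray. The caller owns `out`
   (room for outCount ints); this function only fills it. The scratch
   buffer it mallocs is freed before returning. Returns 0 on success. */
static int fill_new_array(const int *old, size_t oldCount,
                          int *out, size_t outCount)
{
    assert(outCount > 0);
    size_t outBytes = outCount * sizeof(int);  /* byte count, computed once */
    int *scratch = malloc(outBytes);
    if (scratch == NULL)
        return -1;

    memset(scratch, 0, outBytes);              /* like the zeroing loop */
    /* Stand-in for the kernel: sum rows of length outCount from `old`. */
    for (size_t i = 0; i < oldCount; i++)
        scratch[i % outCount] += old[i];

    memcpy(out, scratch, outBytes);            /* exactly outBytes, no more */
    free(scratch);                             /* every malloc gets a free */
    return 0;
}

/* Hypothetical caller: allocate the result once, reuse it on every
   iteration, free it once afterwards. Returns the sum of the result,
   or -1 on failure. */
static long run_iterations(int iterations)
{
    int old[6] = {1, 2, 3, 4, 5, 6};
    size_t outCount = 3;
    int *result = calloc(outCount, sizeof(int));  /* zeroed, owned here */
    if (result == NULL)
        return -1;

    for (int it = 0; it < iterations; it++) {
        if (fill_new_array(old, 6, result, outCount) != 0) {
            free(result);
            return -1;
        }
    }

    long sum = 0;
    for (size_t i = 0; i < outCount; i++)
        sum += result[i];
    free(result);                              /* freed by its owner */
    return sum;
}
```

Allocating the result in the caller also means the worker no longer hands back a fresh malloc’d pointer on every iteration that someone has to remember to free.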

Thanks for the answers. The first thing I tried was to allocate more memory than needed, to be on the safe side. The program no longer hangs, so now I just need to figure out how to allocate and free memory properly.