Dynamic allocation/deallocation of global memory

Hi all…

I'm new to CUDA programming, and I have a problem (obviously :)).

My task is to implement ant colony optimisation for detecting edges in images. To be clear, I'm not here to ask someone to do my 'homework'. I don't know the dimensions of the images in advance - the size of the array in which I would store their values (floats) isn't known until runtime, and it could be 256x256, 512x512, or something else (let's say the whole image will take 1 MB or more). So, as far as I know, it only fits in global device memory, which is fine for me - my kernels will have easy access to the values stored in that array.
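Concretely, the runtime-sized allocation I have in mind looks something like this (a sketch; width/height are placeholders for values that would be read from the image file):

```cuda
// Sketch: the buffer size is only known at runtime, so a plain
// cudaMalloc into global device memory seems like the natural fit.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int width = 512, height = 512;   // in practice: read from the image header
    size_t bytes = (size_t)width * (size_t)height * sizeof(float);

    float *dImage = NULL;
    cudaError_t err = cudaMalloc((void **)&dImage, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... kernels would read/write dImage here ...
    return 0;
}
```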

My first problem was how to allocate global memory dynamically, which I solved, e.g.:

__device__ float *onDevice;   // device-side global pointer, visible to all kernels

__global__ void alloc(float *ptr) { onDevice = ptr; }   // store the pointer on the device

__global__ void myKernel() { onDevice[threadIdx.x] += 0.3f; }
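(If I understand the runtime API right, the pointer could also be set with cudaMemcpyToSymbol instead of the one-thread alloc kernel - a sketch under that assumption:)

```cuda
// Sketch (assumption): cudaMemcpyToSymbol can copy the pointer value
// into the __device__ symbol, replacing the alloc<<<1,1>>> kernel.
#include <cuda_runtime.h>

__device__ float *onDevice;

__global__ void myKernel() { onDevice[threadIdx.x] += 0.3f; }

int main() {
    float *dMem = NULL;
    cudaMalloc((void **)&dMem, 64 * sizeof(float));
    cudaMemset(dMem, 0, 64 * sizeof(float));

    // Copy the pointer value (not the data it points to) to the symbol.
    cudaMemcpyToSymbol(onDevice, &dMem, sizeof(float *));

    myKernel<<<1, 64>>>();
    cudaDeviceSynchronize();
    return 0;
}
```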

int main(){

	float *dMem;

	cudaMalloc((void**)&dMem, 64 * sizeof(float)); //allocate memory on device

	cudaMemset(dMem, 0, 64 * sizeof(float));

	alloc<<<1, 1>>>(dMem);

	myKernel<<<1, 64>>>();

	float tmp[64];

	cudaMemcpy(tmp, dMem, 64 * sizeof(float), cudaMemcpyDeviceToHost);

	//print all 64 floats to stdout

	//cudaFree()???

}

First question: is this approach good practice?

Next, and more important: how do I free the allocated memory?

If I run with cudaMemset(dMem, 0, 64 * sizeof(float)), the initial values in tmp (after invoking the kernel) are 0.3 (clearly, 0 + 0.3 = 0.3). Then I remove that line, compile, and run again… the values are now 0.6, on the next run 0.9…, even after restarting the computer. Each time I run the program, I call cudaMalloc(). It should allocate a new memory block somewhere in device memory (tell me if I'm wrong). So why do I get this result? It looks as if the memory is allocated and assigned only once, and after that, each time I run the program, the kernel goes through that array and increments every value again. How do I deallocate that memory? I'm really confused by this, please help :)

Thanks all

edit - I searched the forum and Google… I didn't find an answer…


You can use
cutilSafeCall( cudaMalloc…

cutilSafeCall( cudaFree(
repeatedly, but it has a high overhead.
It's better to call cudaMalloc only once, when your program starts.
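Roughly what I mean, as a sketch (placeholder names; the plain runtime calls could equally be wrapped in cutilSafeCall for error checking):

```cuda
// Sketch: allocate device memory once at startup, reuse it across
// many kernel launches, and free it once before the program exits.
#include <cuda_runtime.h>

static float *dMem = NULL;          // lives for the whole run

void initDevice(size_t n) {
    cudaMalloc((void **)&dMem, n * sizeof(float));   // once, at startup
    cudaMemset(dMem, 0, n * sizeof(float));
}

void shutdownDevice() {
    cudaFree(dMem);                 // once, at exit
    dMem = NULL;
}

int main() {
    initDevice(64);
    // ... launch kernels many times, all reusing dMem ...
    shutdownDevice();
    return 0;
}
```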

Bill
ps: problem with the forum - retyping
