Hi all…
I'm new to CUDA programming, and I have a problem (obviously :)).
My task is to implement ant colony optimisation for detecting edges in images. To be clear, I'm not here to ask someone to do my 'homework'. I don't know the dimensions of the images in advance: the size of the array in which I would store their values (floats) isn't known until runtime, and it could be 256x256, 512x512, or something else (say the whole image takes 1 MB or more). As far as I know, an array that big only fits in global device memory, which is fine for me, since my kernels will have easy access to the values stored in that array.
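For context, here is roughly what I mean by a runtime-sized allocation (width and height are just placeholder values; in my real program they would come from the loaded image file):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // In the real program these come from the image header; 512x512 is just an example.
    int width = 512, height = 512;
    size_t bytes = (size_t)width * height * sizeof(float);

    float *dImage = NULL;
    cudaError_t err = cudaMalloc((void **)&dImage, bytes); // allocate in global device memory
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("allocated %zu bytes of global memory\n", bytes);

    cudaFree(dImage); // release the block when done
    return 0;
}
```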
My first problem was how to allocate global memory dynamically, which I solved, e.g.:
__device__ float *onDevice;

// store the device pointer in a __device__ variable so kernels can reach it
__global__ void alloc(float *ptr) { onDevice = ptr; }

__global__ void myKernel() { onDevice[threadIdx.x] += 0.3f; }

int main() {
    float *dMem;
    cudaMalloc((void **)&dMem, 64 * sizeof(float)); // allocate memory on device
    cudaMemset(dMem, 0, 64 * sizeof(float));        // zero it out
    alloc<<<1, 1>>>(dMem);
    myKernel<<<1, 64>>>();
    float tmp[64];
    cudaMemcpy(tmp, dMem, 64 * sizeof(float), cudaMemcpyDeviceToHost);
    // print all 64 floats to stdout
    // cudaFree()???
    return 0;
}
My first question is: is this approach good practice?
The next, and more important, question is: how do I free the allocated memory?
If I run with cudaMemset(dMem, 0, 64 * sizeof(float)), the values in tmp (after the kernel runs) are 0.3 (clearly, 0 + 0.3 = 0.3). Then I remove that line, compile, and run again… the values are now 0.6, on the next run 0.9, and so on, even after restarting the computer. Each time I run the program I call cudaMalloc(), which should allocate a new memory block somewhere in device memory (tell me if I'm wrong). So why do I get this result? It looks as if the memory is allocated and assigned only once, and on each run the kernel walks over that same array and increments every value again. How do I deallocate that memory? I'm really confused by this, please help :)
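To show what I mean, here is the alloc/use/free cycle I assume is supposed to happen; cudaFree is the part I'm unsure about, so please correct me if this is wrong:

```cuda
// What I assume the full lifecycle should look like. Correct me if cudaFree
// is not the right way to release memory obtained with cudaMalloc.
float *dMem = NULL;
cudaMalloc((void **)&dMem, 64 * sizeof(float)); // should give a fresh block each run?
cudaMemset(dMem, 0, 64 * sizeof(float));        // without this, the contents are undefined
// ... launch kernels, cudaMemcpy the results back to the host ...
cudaFree(dMem);                                 // is this all the cleanup that is needed?
dMem = NULL;                                    // avoid accidentally reusing a stale pointer
```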
Thanks, all.
Edit: I searched the forum and Google, and I didn't find an answer…