Just a question regarding cudaFreeArray(): does it behave differently from cudaFree(), and if so, why?
My understanding of how it works with global memory is as follows:
CUDA_SAFE_CALL( cudaMalloc( (void**)&d_working, size ) );
CUDA_SAFE_CALL( cudaMemcpy( d_working, d_data, size, cudaMemcpyDeviceToDevice ) );
kernel<<< grid, threads >>>(d_working, d_data, width, height);
CUDA_SAFE_CALL( cudaFree( d_working ) );
Here, ‘d_data’ is the device memory we wish to process, and ‘d_working’ is a copy of the original to work from.
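The pattern above, filled out as a self-contained sketch. CUDA_SAFE_CALL came from the SDK's cutil helpers rather than the runtime, so this version checks return codes directly. The kernel body (a 3-tap horizontal average, which reads neighbouring pixels and therefore needs a separate working copy) and the name `filter_in_place` are hypothetical stand-ins for the original kernel. The explicit `cudaDeviceSynchronize()` means the free does not depend on whether `cudaFree()` itself waits.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: 3-tap horizontal average with clamped edges. Each
// output pixel reads its neighbours from `src`, so `src` must not be the
// buffer being written.
__global__ void kernel(const float* src, float* dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    const float* row = src + y * width;
    int xl = (x > 0) ? x - 1 : x;
    int xr = (x < width - 1) ? x + 1 : x;
    dst[y * width + x] = (row[xl] + row[x] + row[xr]) / 3.0f;
}

// Filters d_data in place via a temporary device-to-device copy.
cudaError_t filter_in_place(float* d_data, int width, int height)
{
    size_t size = sizeof(float) * width * height;
    float* d_working = nullptr;
    cudaError_t err = cudaMalloc((void**)&d_working, size);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_working, d_data, size, cudaMemcpyDeviceToDevice);
    if (err == cudaSuccess) {
        dim3 threads(16, 16);
        dim3 grid((width + threads.x - 1) / threads.x,
                  (height + threads.y - 1) / threads.y);
        kernel<<<grid, threads>>>(d_working, d_data, width, height);
        err = cudaGetLastError();  // reports launch-configuration errors
    }

    // Wait for the kernel explicitly instead of relying on cudaFree() to do it;
    // this also surfaces errors raised while the kernel was running.
    cudaError_t sync_err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = sync_err;

    cudaError_t free_err = cudaFree(d_working);
    return (err != cudaSuccess) ? err : free_err;
}
```

The working copy is freed on every path once it has been allocated, so an error in the copy or the launch does not leak it.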
We launch the kernel; kernel launches are asynchronous, so control returns to the host while processing continues in the background on the GPU.
Next comes the bit I’m unsure about: we call cudaFree() on d_working, which may still be in use by the GPU. Since the above code works for me, it seems reasonable that cudaFree() either synchronizes (waits until the kernel is done) or queues the request to free the memory.
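The second possibility, an explicitly queued free, is what the stream-ordered allocator added in CUDA 11.2 provides: `cudaFreeAsync()` enqueues the release on a stream, so it takes effect only after work launched earlier on that stream has completed. A sketch, assuming CUDA 11.2 or later and a device that reports `cudaDevAttrMemoryPoolsSupported`; `filter_in_place_async` and the kernel body are hypothetical. This API postdates the calls in the question, does not explain their behaviour, and covers linear memory only, not CUDA arrays.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: 3-tap horizontal average with clamped edges.
__global__ void kernel(const float* src, float* dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    const float* row = src + y * width;
    int xl = (x > 0) ? x - 1 : x;
    int xr = (x < width - 1) ? x + 1 : x;
    dst[y * width + x] = (row[xl] + row[x] + row[xr]) / 3.0f;
}

// Every step is enqueued on `stream`; the host waits for none of it.
// The caller must synchronize `stream` before reading d_data on the host.
cudaError_t filter_in_place_async(float* d_data, int width, int height,
                                  cudaStream_t stream)
{
    size_t size = sizeof(float) * width * height;
    float* d_working = nullptr;
    cudaError_t err = cudaMallocAsync((void**)&d_working, size, stream);
    if (err != cudaSuccess) return err;

    err = cudaMemcpyAsync(d_working, d_data, size, cudaMemcpyDeviceToDevice, stream);
    if (err == cudaSuccess) {
        dim3 threads(16, 16);
        dim3 grid((width + threads.x - 1) / threads.x,
                  (height + threads.y - 1) / threads.y);
        kernel<<<grid, threads, 0, stream>>>(d_working, d_data, width, height);
        err = cudaGetLastError();
    }

    // Stream-ordered free: the host does not block here, and the memory goes
    // back to the pool only after the preceding work on `stream` has finished.
    cudaError_t free_err = cudaFreeAsync(d_working, stream);
    return (err != cudaSuccess) ? err : free_err;
}
```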
Next, I tried to implement the same thing while taking advantage of texture caching and linear interpolation:
CUDA_SAFE_CALL( cudaMallocArray(&d_working_array, &desc, width, height));
CUDA_SAFE_CALL( cudaMemcpyToArray(d_working_array, 0, 0, d_data, size, cudaMemcpyDeviceToDevice));
CUDA_SAFE_CALL( cudaBindTextureToArray( tex, d_working_array, desc ) );
kernel<<< grid, threads >>>(d_data, width, height);
CUDA_SAFE_CALL( cudaFreeArray( d_working_array ) );
Strangely, everything worked fine in the ‘Debug’ build (I’m not referring to emulation mode here), but in the ‘Release’ build my system froze (including the mouse, etc.) for a couple of seconds before CUDA was disabled completely; on one occasion it caused a BSOD.
After hours of head-scratching, I finally found that adding a call to cudaThreadSynchronize() just before freeing the array fixed the problem.
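A sketch of the texture path with that synchronization in place, written against the current runtime API: texture references and `cudaBindTextureToArray()` were removed in CUDA 12, so it uses a texture object, `cudaMemcpy2DToArray()` instead of the deprecated `cudaMemcpyToArray()`, and `cudaDeviceSynchronize()`, which replaced `cudaThreadSynchronize()`. The kernel (which samples half a texel to the right, so linear filtering averages each pixel with its right-hand neighbour) and the name `shift_half_texel` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: samples half a texel to the right of each pixel centre,
// so linear filtering returns the average of texels x and x+1 (clamped at the
// right edge). Reads come only from the texture; writes go to d_out.
__global__ void kernel(cudaTextureObject_t tex, float* d_out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    d_out[y * width + x] = tex2D<float>(tex, x + 1.0f, y + 0.5f);
}

cudaError_t shift_half_texel(float* d_data, int width, int height)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t d_working_array = nullptr;
    cudaError_t err = cudaMallocArray(&d_working_array, &desc, width, height);
    if (err != cudaSuccess) return err;

    // Copy the linear image into the array (source pitch == row width in bytes).
    size_t row_bytes = sizeof(float) * width;
    err = cudaMemcpy2DToArray(d_working_array, 0, 0, d_data, row_bytes,
                              row_bytes, height, cudaMemcpyDeviceToDevice);

    cudaTextureObject_t tex = 0;
    bool have_tex = false;
    if (err == cudaSuccess) {
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = d_working_array;

        cudaTextureDesc td = {};
        td.addressMode[0] = cudaAddressModeClamp;
        td.addressMode[1] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModeLinear;   // hardware linear interpolation
        td.readMode = cudaReadModeElementType;
        td.normalizedCoords = 0;                // coordinates are in texels
        err = cudaCreateTextureObject(&tex, &res, &td, nullptr);
        have_tex = (err == cudaSuccess);
    }
    if (err == cudaSuccess) {
        dim3 threads(16, 16);
        dim3 grid((width + threads.x - 1) / threads.x,
                  (height + threads.y - 1) / threads.y);
        kernel<<<grid, threads>>>(tex, d_data, width, height);
        err = cudaGetLastError();
    }

    // Make sure the kernel has stopped sampling the array before the texture
    // object and the array are released.
    cudaError_t sync_err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = sync_err;

    if (have_tex) cudaDestroyTextureObject(tex);
    cudaError_t free_err = cudaFreeArray(d_working_array);
    return (err != cudaSuccess) ? err : free_err;
}
```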
So the real question is: why does cudaFree() appear to wait for the kernel to finish, whereas cudaFreeArray() brings down the system (but only in a Release build!)?
The BSOD suggests this might be a bug. Or is something else going on here that I’m unaware of?
Thanks for bearing with me!
Edit: I reproduced the problem in a simpler scenario, and there the cudaFreeArray() code fails in both Debug and Release builds. I’m not sure why this would be; perhaps it is timing-related?
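Whether or not timing is the trigger, the invariant is the same: an array must not be freed while a kernel may still be sampling it. If a device-wide synchronize is broader than needed, the wait can be narrowed to the stream the kernel was launched on. A minimal sketch; `free_array_after_stream` is a hypothetical helper, not a runtime API, and it assumes the kernel reading the array was launched on `stream`.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: waits only for work already enqueued on `stream`
// (for example the kernel sampling `array`) rather than for the whole
// device, then frees the array.
cudaError_t free_array_after_stream(cudaArray_t array, cudaStream_t stream)
{
    cudaError_t err = cudaStreamSynchronize(stream);
    // If the wait failed, the kernel may still be using the array (or the
    // context is in an error state); leave it allocated rather than free
    // memory the GPU might be reading.
    if (err != cudaSuccess) return err;
    return cudaFreeArray(array);
}
```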