I found that after allocating global memory, the values of the elements are not always zero.
Is that normal?
And if it is normal, how can I clear the memory faster than by writing zero to every element one by one?
Well … it's always like that. On the CPU as well, setting all the allocated memory to a certain value takes time. And if you are going to copy data into the allocated space, there is no point in setting an initial value.
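In CUDA you can use cudaMemset to fill a memory region very quickly. Note that it sets every byte to the given value, so it is well suited to zeroing but not to filling with an arbitrary float or int.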
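For reference, a minimal sketch of the cudaMemset approach. The buffer size and variable names are made up for illustration, and error handling is reduced to early returns.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t n = 1 << 20;  // number of floats (arbitrary size, for illustration only)
    float *d_buf = nullptr;

    // cudaMalloc does not initialize the memory it returns.
    if (cudaMalloc((void **)&d_buf, n * sizeof(float)) != cudaSuccess) return 1;

    // cudaMemset fills byte by byte; an all-zero byte pattern reads back as 0.0f.
    if (cudaMemset(d_buf, 0, n * sizeof(float)) != cudaSuccess) return 1;

    // Copy one element back to the host to check the result.
    float h_first = -1.0f;
    if (cudaMemcpy(&h_first, d_buf, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    printf("first element: %f\n", h_first);

    cudaFree(d_buf);
    return 0;
}
```

The third argument is a size in bytes, not an element count, which is an easy thing to get wrong.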
I can’t remember what version of CUDA I was using at the time but I actually found cudaMemset to be quite slow. I found it was quicker to write a simple kernel to do it.
This can be very true. There is a thread here where someone wrote a simple kernel to do device-to-device memory copies, and it was also faster than using cudaMemcpy. Of course you need to do it smartly, using the big chunks that the GPU can write at once (128 bytes, I think).
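A rough sketch of what such a zeroing kernel could look like. The kernel name, launch configuration and buffer size are assumptions for illustration, not taken from the thread mentioned above. Each thread writes 16 bytes at a time (uint4), so neighbouring threads write to neighbouring addresses. It assumes the buffer size is a multiple of 16 bytes; pointers returned by cudaMalloc are already sufficiently aligned. Whether this beats cudaMemset depends on the CUDA version and the hardware, so it is worth timing both.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: zero a buffer using one 16-byte store per thread per iteration.
// Assumes buf is 16-byte aligned and holds n_vec uint4 elements.
__global__ void zero_kernel(uint4 *buf, size_t n_vec) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    // Grid-stride loop: consecutive threads touch consecutive 16-byte chunks.
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n_vec; i += stride) {
        buf[i] = make_uint4(0u, 0u, 0u, 0u);
    }
}

int main() {
    const size_t n_floats = 1 << 20;  // arbitrary size; must be a multiple of 4 floats here
    const size_t bytes = n_floats * sizeof(float);
    float *d_buf = nullptr;
    if (cudaMalloc((void **)&d_buf, bytes) != cudaSuccess) return 1;

    const int threads = 256;  // launch configuration chosen arbitrarily
    const int blocks = 128;
    zero_kernel<<<blocks, threads>>>(reinterpret_cast<uint4 *>(d_buf), bytes / sizeof(uint4));
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;

    // Copy the last element back to the host to check the result.
    float h_last = -1.0f;
    if (cudaMemcpy(&h_last, d_buf + n_floats - 1, sizeof(float), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    printf("last element: %f\n", h_last);

    cudaFree(d_buf);
    return 0;
}
```

If the buffer size is not a multiple of 16 bytes, the remaining tail would need separate handling, for example a scalar loop or a small cudaMemset.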
I got it, thanks for your replies.