malloc and cudaMalloc: confusion over how each initializes memory

We know that ‘malloc’ allocates a memory block but does not initialize its contents.
Does ‘cudaMalloc’ initialize the memory block as well as allocating it?

I ask because of an image-rotation problem.
When I run the CUDA version on the GPU (using cudaMalloc), which copies pixel values from the original image (a) to the rotated image (b), the pixel locations in b that are never assigned a value show some color in the final rotated image. So some value is present there, even though none was written.
In the C version (using malloc), by contrast, those unused pixel locations show up black, although I expected them to contain garbage values.
Please clarify the confusion.
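To see where leftover values like these come from, here is a minimal sketch of a forward-mapping rotation. It is an assumption about how the copy from a to b works, not the poster's code; `rotate_forward`, `count_unwritten_pixels`, the 8-bit grayscale layout and the 16×16 block size are all hypothetical. Each thread moves one source pixel, so a destination pixel that no source pixel lands on is never written and keeps whatever bytes the allocation already held. The helper pre-fills b with a marker byte only so that those holes can be counted.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stdlib.h>

// Hypothetical forward-mapping rotation: one thread per SOURCE pixel.
// Destination pixels that no source pixel maps onto are never written.
__global__ void rotate_forward(const unsigned char *a, unsigned char *b,
                               int width, int height, float angle)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cx = 0.5f * width, cy = 0.5f * height;
    float c = cosf(angle), s = sinf(angle);

    // Rotate (x, y) about the image centre and round to the nearest pixel.
    int xd = (int)floorf(c * (x - cx) - s * (y - cy) + cx + 0.5f);
    int yd = (int)floorf(s * (x - cx) + c * (y - cy) + cy + 0.5f);

    if (xd >= 0 && xd < width && yd >= 0 && yd < height)
        b[yd * width + xd] = a[y * width + x];   // all other b pixels stay untouched
}

// Hypothetical host helper: fills b with a marker byte, rotates a source image
// of all 1s, and returns how many pixels of b were never written.
int count_unwritten_pixels(int width, int height, float angle)
{
    const unsigned char MARKER = 0xAB;
    size_t bytes = (size_t)width * height;
    unsigned char *d_a, *d_b;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMemset(d_a, 1, bytes);        // every source pixel has value 1
    cudaMemset(d_b, MARKER, bytes);   // mark every destination pixel as unwritten

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    rotate_forward<<<grid, block>>>(d_a, d_b, width, height, angle);

    unsigned char *h_b = (unsigned char *)malloc(bytes);
    cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    int holes = 0;
    for (size_t i = 0; i < bytes; ++i)
        if (h_b[i] == MARKER) ++holes;

    free(h_b);
    cudaFree(d_a);
    cudaFree(d_b);
    return holes;
}
```

With an angle of 0 every pixel of b is written; with a 45° rotation the corners of b stay untouched, and rounding typically leaves some interior holes as well.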

No, it doesn’t.
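The same holds for malloc on the host: the contents of the returned block are indeterminate. A fresh block often happens to read as zero (for example, when the allocator takes new pages from the operating system, which typically hands them out zero-filled), which may explain the black pixels in the C version, but nothing guarantees it. The sketch below uses calloc, which does guarantee an all-zero block; `alloc_black_image_host` is a hypothetical helper assuming 8-bit grayscale pixels. The CUDA runtime has no calloc-style counterpart to cudaMalloc, so device buffers have to be cleared explicitly.

```cuda
#include <stdlib.h>

// Hypothetical host-side helper. malloc would return indeterminate bytes;
// calloc returns a block whose bytes are all zero, i.e. a black 8-bit image.
unsigned char *alloc_black_image_host(int width, int height)
{
    return (unsigned char *)calloc((size_t)width * height, sizeof(unsigned char));
}
```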

Then, Sir, do I need to launch another kernel to initialize the memory block?

That would nearly double the current processing time.

Is there any other way out?

Use cudaMemset if you’re OK with filling it with a particular byte value.
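A minimal sketch of that suggestion, assuming 8-bit grayscale pixels; `alloc_black_image_device` is a hypothetical helper name. `cudaMemset(devPtr, value, count)` sets `count` bytes to `value`, so any 8-bit value works here, while for wider pixel types (float, int) only byte patterns such as 0 give a meaningful result.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: allocate a device image and clear it to black (0)
// before a rotation kernel that writes only some of its pixels.
cudaError_t alloc_black_image_device(unsigned char **d_img, int width, int height)
{
    size_t bytes = (size_t)width * height;   // assumed: 1 byte per pixel
    cudaError_t err = cudaMalloc((void **)d_img, bytes);
    if (err != cudaSuccess) return err;
    return cudaMemset(*d_img, 0, bytes);     // fills bytes, not pixels
}
```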

Sir, I’ll try this on the GPU and report back.

Thanks.

Which will, of course, launch another kernel ;)
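If the aim is to avoid a separate initialization step altogether, one option (a swapped-in technique, not something suggested in this thread) is inverse mapping: launch one thread per destination pixel, rotate its coordinates backwards to find the source pixel, and write black when that falls outside a. Every pixel of b is then written exactly once, so no cudaMemset is needed. `rotate_inverse` and `count_unwritten_pixels_inverse` are hypothetical names, and the 8-bit grayscale layout is an assumption.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stdlib.h>

// Hypothetical inverse-mapping rotation: one thread per DESTINATION pixel.
__global__ void rotate_inverse(const unsigned char *a, unsigned char *b,
                               int width, int height, float angle)
{
    int xd = blockIdx.x * blockDim.x + threadIdx.x;
    int yd = blockIdx.y * blockDim.y + threadIdx.y;
    if (xd >= width || yd >= height) return;

    float cx = 0.5f * width, cy = 0.5f * height;
    float c = cosf(angle), s = sinf(angle);

    // Rotate (xd, yd) by -angle about the centre to find its source pixel.
    int xs = (int)floorf( c * (xd - cx) + s * (yd - cy) + cx + 0.5f);
    int ys = (int)floorf(-s * (xd - cx) + c * (yd - cy) + cy + 0.5f);

    // Each destination pixel gets either a source sample or black (0),
    // so b does not need to be cleared beforehand.
    bool inside = xs >= 0 && xs < width && ys >= 0 && ys < height;
    b[yd * width + xd] = inside ? a[ys * width + xs] : 0;
}

// Hypothetical host helper: pre-fills b with a marker byte (only to detect
// unwritten pixels) and counts how many still hold it after the kernel.
int count_unwritten_pixels_inverse(int width, int height, float angle)
{
    const unsigned char MARKER = 0xAB;
    size_t bytes = (size_t)width * height;
    unsigned char *d_a, *d_b;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMemset(d_a, 1, bytes);
    cudaMemset(d_b, MARKER, bytes);

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    rotate_inverse<<<grid, block>>>(d_a, d_b, width, height, angle);

    unsigned char *h_b = (unsigned char *)malloc(bytes);
    cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);
    int holes = 0;
    for (size_t i = 0; i < bytes; ++i)
        if (h_b[i] == MARKER) ++holes;

    free(h_b);
    cudaFree(d_a);
    cudaFree(d_b);
    return holes;
}
```

A side benefit of inverse mapping is that it also avoids the interior holes that rounding produces in a forward-mapped rotation.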

cudaMemset() did the trick.

But, Sir, does cudaMemset launch another kernel?

I ask because the previous and the new timings are the same.

It does launch a kernel, and your results only suggest that you aren’t measuring the execution times correctly in the first place.
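One way to check whether cudaMemset adds measurable time is to time it and the kernel separately with CUDA events. Events are recorded in stream order alongside the work, so the elapsed time reflects GPU execution rather than the host-side launch. In this sketch `invert_pixels` is a hypothetical stand-in for the rotation kernel and `time_memset_and_kernel` is an illustrative helper.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the rotation kernel; any kernel works for timing.
__global__ void invert_pixels(unsigned char *img, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}

// Times cudaMemset and the kernel separately using events recorded in the
// default stream, between and after the two operations.
void time_memset_and_kernel(int n, float *ms_memset, float *ms_kernel)
{
    unsigned char *d_img;
    cudaMalloc((void **)&d_img, n);

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemset(d_img, 0, n);
    cudaEventRecord(t1);
    invert_pixels<<<(n + 255) / 256, 256>>>(d_img, n);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);            // wait until the kernel has finished

    cudaEventElapsedTime(ms_memset, t0, t1);
    cudaEventElapsedTime(ms_kernel, t1, t2);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaFree(d_img);
}
```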

Sir, should we time only the kernel, or also the memory copy operations (along with cudaMemset)?

At the moment I am timing only the kernel.

I suspect you are only measuring the kernel launch time, not the kernel execution time.
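The difference between launch time and execution time shows up clearly with a host-side timer. A kernel launch returns control to the host before the kernel finishes, so stopping the timer right after the launch line measures only the launch; synchronizing before stopping it includes the execution. `busy_kernel` and `host_timings` are hypothetical names for this sketch.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Hypothetical workload that takes a noticeable amount of GPU time.
__global__ void busy_kernel(float *x, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < iters; ++k) v = v * 0.999f + 0.001f;
        x[i] = v;
    }
}

// Measures the same launch two ways: up to the return of the launch call,
// and up to the end of cudaDeviceSynchronize().
void host_timings(int n, int iters, double *ms_launch_only, double *ms_with_sync)
{
    using Clock = std::chrono::steady_clock;
    float *d_x;
    cudaMalloc((void **)&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));
    cudaDeviceSynchronize();             // start from an idle GPU

    auto t0 = Clock::now();
    busy_kernel<<<(n + 255) / 256, 256>>>(d_x, n, iters);
    auto t1 = Clock::now();              // launch returned; kernel may still be running
    cudaDeviceSynchronize();             // block until the kernel has finished
    auto t2 = Clock::now();

    *ms_launch_only = std::chrono::duration<double, std::milli>(t1 - t0).count();
    *ms_with_sync   = std::chrono::duration<double, std::milli>(t2 - t0).count();
    cudaFree(d_x);
}
```

For a non-trivial kernel the launch-only figure is typically far smaller than the synchronized one; if a measured "kernel time" does not change when more work is added, the timer is probably missing a synchronization.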

No, Sir, it’s the execution time of the kernel.