Does cudaMemset internally call a kernel?

CudaDevOps · April 1, 2023, 10:08am

I wanted to ask whether cudaMemset(…) initialises an array on the device by launching a kernel and filling it in parallel, or whether it uses the PCIe bus to copy elements individually to the device array.

Although the latter would make little sense, I could not find the specifics documented anywhere.

If anyone knows how cudaMemset(…) is implemented, please shed some light on this.

Thank you.

Robert_Crovella · April 3, 2023, 7:51pm

Yes, typically it calls a kernel. This is because it is far more efficient (i.e. faster) to do it this way than to copy the entire buffer over the PCIe bus. A kernel can access device memory at speeds typically in excess of 100 GB/s. PCIe bandwidth varies by generation but is currently in the range of roughly 6 GB/s to 25 GB/s (or 50 GB/s for Gen5, I guess).

This isn’t documented anywhere that I know of; it’s an implementation detail. You can inspect the behavior yourself with a GPU profiler.
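
As a rough complement to a profiler, the fill rate can also be estimated from host code. The following is only a minimal sketch, not a description of how cudaMemset is implemented: it times one cudaMemset call with CUDA events and converts the result to GB/s. The helper names `effective_gbps` and `time_memset_ms` are made up for this illustration, and error handling is deliberately minimal.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count and an elapsed time in
// milliseconds to GB/s (1 GB = 1e9 bytes).
inline double effective_gbps(size_t bytes, float ms) {
    return (static_cast<double>(bytes) / 1e9) / (static_cast<double>(ms) / 1e3);
}

// Hypothetical helper: time a single cudaMemset over `bytes` of device
// memory using CUDA events. Returns elapsed milliseconds, or a negative
// value if the allocation or the timed memset fails.
inline float time_memset_ms(size_t bytes) {
    void* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up call so one-time initialization cost is not measured.
    cudaMemset(d_buf, 0, bytes);
    cudaDeviceSynchronize();

    // Timed call, bracketed by events recorded in the default stream.
    cudaEventRecord(start);
    cudaError_t err = cudaMemset(d_buf, 0, bytes);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (err == cudaSuccess) cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    return ms;
}
```

Calling `time_memset_ms` from `main` with a buffer of a few hundred megabytes and passing the result to `effective_gbps` gives a figure to compare against the PCIe numbers above. A rate well beyond what the bus can deliver would be consistent with the fill happening on the device, but it does not by itself show that a kernel was launched; the profiler timeline is what shows that.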

