Setting float arrays to a value

From my understanding, setting all elements of a float array to an arbitrary value can’t be done with cudaMemset, since it only writes the same byte value across the buffer. So the question becomes: what is the best way to set all of these elements to one number? Say I need an array to be reset every time I iterate through a loop, and it needs to be set to something like 1 or 0 or whatever number. Does it make sense to write and launch a simple kernel just for this? Or, if the value is the same every time (say 1.0f), would it be faster to keep a pre-filled array in GPU global memory and just cudaMemcpy it device-to-device, since the transfer is very fast there?

Example:

// call only once before iteration begins
initVector<<<dimGrid, dimBlock>>>(gpuZero, 0, nColumns);

for (int i = 0; i < nColumns; i++) {
    cudaMemcpy(gpuTemp, gpuZero, nColumns * sizeof(float), cudaMemcpyDeviceToDevice);
    cudaMemcpy(gpuVecW, gpuZero, nColumns * sizeof(float), cudaMemcpyDeviceToDevice);

    // do a bunch of computations
}

vs.

for (int i = 0; i < nColumns; i++) {
    initVector<<<dimGrid, dimBlock>>>(gpuTemp, 0, nColumns);
    initVector<<<dimGrid, dimBlock>>>(gpuVecW, 0, nColumns);

    // do a bunch of computations
}

I would think the memcpy would be faster, but I don’t know. Also, are there other ways that I’m not thinking about?

You can use cudaMemcpy() for the device-to-device copy if you have enough spare GPU memory to allocate a third buffer holding the zeroes. You can also use cudaMemset() if you only need to initialize to 0.0f (an all-zero byte pattern happens to be 0.0f in IEEE floating point):

cudaMemset((void*) d_buffer, 0, numFloats*sizeof(float));

but it’s really slow. A small GPU kernel will do the initialization much faster. I wrote a small test program and measured the equivalent memory bandwidth of cudaMemset() and of a GPU kernel that fills a float array with a constant value. On my GTX 280 using CUDA 1.1:

buffer size    num       cudaMemset    GPU kernel
   (MB)       floats       (GB/s)        (GB/s)
    0.1         25k          4.0           6.5
    0.3         75k          6.1          17.2
    1.0        250k          7.5          37.1
    3.0        750k          8.0          55.9
   10.0        2.5M          8.2          67.1
   30.0        7.5M          8.3          69.2
  100.0         25M          8.3          65.1

For large buffers, the GPU initialization kernel achieves about 60% of the device-to-device bandwidth reported by bandwidthTest (116 GB/s).
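For reference, such a fill kernel might look like the sketch below. This is a minimal sketch, not the exact code used for the measurements above; the names (fillKernel, fillFloatArray, blockSize) and the 256-thread block size are my own choices.

__global__ void fillKernel(float *buf, float value, int n)
{
    // One thread per element; guard against threads past the end.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        buf[idx] = value;
}

// Host-side wrapper: round the grid size up so every element is covered.
void fillFloatArray(float *d_buf, float value, int n)
{
    int blockSize = 256;
    int gridSize  = (n + blockSize - 1) / blockSize;
    fillKernel<<<gridSize, blockSize>>>(d_buf, value, n);
}

Because the kernel takes the value as a parameter, the same code works for 0.0f, 1.0f, or any other constant.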

Awesome. I ended up using a kernel when I had to, and I stored a vector with all elements set to 0 since I needed it frequently and had space for it; I then just cudaMemcpy’d the zero vector to wherever I needed it. My kernel was basically vector[Idx] = value, where Idx is the usual unique index computed from the thread and block IDs and value is whatever I wanted the array initialized to. I didn’t realize cudaMemset was that slow; I avoided it mostly because I occasionally had to initialize to values other than 0, so I figured I might as well use the same kernel for my zero vectors too.