Can I set floats to zero with cudaMemset?

Hi there,

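I am setting an array of floats or doubles to zero using:

```cpp
double* dev_ptr = nullptr;
size_t bytes = array_size * sizeof(double);
cudaMalloc(&dev_ptr, bytes);    // allocate device memory
cudaMemset(dev_ptr, 0, bytes);  // set every byte to 0x00
```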

This works fine. But as far as I understand, the same approach is not guaranteed to work with the C version (memset), because the C standard does not guarantee that a floating-point value whose bits are all zero equals zero (this is a matter of IEEE 754 compliance, discussed e.g. here:)
There is a guide on IEEE 754 compliance and CUDA, but I couldn't find anything in it about how zero is represented. I just want to be sure that what I am doing gives the expected result (floats set to zero) on any CUDA GPU, since code portability is important here.

I know similar questions have been asked before, but none of them was about setting floats to zero (only about setting floats to non-zero values, or ints to zero).

TL;DR Yes, you can.

Floating-point arithmetic on the GPU adheres to IEEE-754, so a pattern of all 0s represents a floating-point zero. Furthermore, all host platforms supported by CUDA use IEEE-754 floating point arithmetic, and a pattern of all 0s represents a floating-point zero there, too.
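As a quick check of this, here is a minimal host-side C++ sketch (the helper names are made up for illustration). It uses host `memset`, which writes the same byte pattern that `cudaMemset` writes into device memory, and the `static_assert`s confirm that the host compiler uses IEEE-754 `float` and `double`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// The check is only meaningful if the host uses IEEE-754 binary32/binary64.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE-754");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE-754");

// Value of a float whose four bytes were all set to 0x00 (what memset/cudaMemset do).
float float_from_zero_bytes() {
    float f;
    std::memset(&f, 0, sizeof f);
    return f;
}

// Same for a double (eight bytes of 0x00).
double double_from_zero_bytes() {
    double d;
    std::memset(&d, 0, sizeof d);
    return d;
}

// Raw bit pattern of a float, read via memcpy to avoid aliasing problems.
std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Checks both directions: zero bytes decode to 0.0, and +0.0 encodes to zero bytes.
void check_zero_patterns() {
    assert(float_from_zero_bytes() == 0.0f);
    assert(double_from_zero_bytes() == 0.0);
    assert(float_bits(0.0f) == 0u);
}
```

Note that only +0.0 is all-zero bits: -0.0f has just the sign bit set (0x80000000), although it still compares equal to 0.0f.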

Even if one goes back to pre-IEEE-754 floating-point arithmetic, like that of the DEC VAX, Turbo Pascal, or Microsoft Basic, a pattern of all zeros represents a floating-point zero. In some corners of the internet, language lawyers rush to point out what the C++ standard does not guarantee. In practice, in 40 years of programming I cannot recall any platform where a pattern of all zeros does not (or did not) represent a floating-point zero. Such environments may exist somewhere, but they are not relevant here.

Side remark: In CUDA programming you can also use memset (or cudaMemset) with a byte value of 0xff to initialize floating-point data with NaNs, which can sometimes help in debugging.
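A small host-side sketch of the NaN trick (the helper names are hypothetical). The 0xff byte pattern is the same one that `cudaMemset(dev_ptr, 0xff, bytes)` would write on the device; plain `memset` is used here so the check runs on the host.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Fill a buffer with the byte 0xff. For IEEE-754 float and double, all-ones bits
// mean an all-ones exponent field with a nonzero fraction, i.e. a NaN.
void fill_with_ff_bytes(void* p, std::size_t bytes) {
    if (bytes != 0) std::memset(p, 0xff, bytes);
}

// Number of NaN entries, e.g. elements a kernel was expected to overwrite but did not.
std::size_t count_nans(const float* p, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (std::isnan(p[i])) ++count;
    }
    return count;
}

// Sanity check that the 0xff pattern really decodes to NaN for both types.
void check_ff_is_nan() {
    float f;
    double d;
    fill_with_ff_bytes(&f, sizeof f);
    fill_with_ff_bytes(&d, sizeof d);
    assert(std::isnan(f));
    assert(std::isnan(d));
}
```

For example, after filling eight floats with 0xff and overwriting the first five with real values, `count_nans` reports 3, pointing at the entries that were never written.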

Generally speaking, outside the realm of debugging or test programs, I do not consider initializing floating-point data (or any other data, for that matter) with memset a recommended practice.

Hi njuffa,

thanks for the clarification on the representation of 0s!

So what would be a situation where you use NaNs for debugging? Initialize with NaNs, run a kernel and test if there are still NaNs where there shouldn’t be any?

Are you saying I shouldn't be using cudaMemset at all? My use case is: I simulate a network through time. At each time step, I have to run a bunch of kernels. Before one of these kernels, I need to reset a global memory array (float/double) to all 0s, and I use cudaMemset for that. I could write my own kernel or use thrust::fill_n instead, but cudaMemset seemed more straightforward.

I am saying that I find bulk initialization (using memset or any other way) to be a suspicious design pattern, as it is hardly ever necessary. It can also be wasteful from a performance perspective, since it is performed as a separate step: the whole array is written to memory once, only to be overwritten later.
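To illustrate what avoiding the separate step can look like, here is a host-side C++ sketch of a hypothetical network update (the function names and the dense weight layout are assumptions, not the poster's code). The loop over `i` stands in for one CUDA thread per output element: the first version mirrors cudaMemset followed by a kernel that accumulates with `+=`, while the second keeps the running sum in a local variable and writes each element once, so no reset is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical update: next[i] = sum_j w[i*n + j] * state[j].
// Version A: reset the output in a separate pass, then accumulate into it.
void update_with_separate_reset(const std::vector<double>& w,
                                const std::vector<double>& state,
                                std::vector<double>& next) {
    const std::size_t n = state.size();
    assert(w.size() == n * n && next.size() == n);
    std::fill(next.begin(), next.end(), 0.0);   // stand-in for cudaMemset
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            next[i] += w[i * n + j] * state[j];
}

// Version B: initialization folded into the computation; each output element
// starts from a local zero and is stored exactly once.
void update_fused(const std::vector<double>& w,
                  const std::vector<double>& state,
                  std::vector<double>& next) {
    const std::size_t n = state.size();
    assert(w.size() == n * n && next.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        double acc = 0.0;                       // in a kernel this lives in a register
        for (std::size_t j = 0; j < n; ++j)
            acc += w[i * n + j] * state[j];
        next[i] = acc;                          // single write, prior contents irrelevant
    }
}
```

Whether this is possible depends on the kernel: if several threads accumulate into the same element (e.g. with atomicAdd), a reset before the kernel may still be the simplest option.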

The benefit of NaNs is that they are propagated through almost all floating-point operations, so they are sometimes useful as a quick check that a piece of code hasn’t picked up any data that it wasn’t supposed to pick up.
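A short host-side C++ sketch of this propagation and of the exceptions behind "almost all" (the names are hypothetical; the NaN behavior shown is that of IEEE-754 and the standard C/C++ math library). A NaN survives even multiplication by zero, but `fmax`, `pow(x, 0)`, and comparisons can make it disappear.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// A quiet NaN standing in for "data that should never have been read".
const double kPoison = std::numeric_limits<double>::quiet_NaN();

// Ordinary arithmetic propagates NaN: a result that touched poisoned input is NaN,
// even if the poisoned term is multiplied by a zero weight.
double weighted_sum(double a, double b, double wa, double wb) {
    return wa * a + wb * b;
}

// Cases where NaN does not propagate, so a poisoned input can go unnoticed.
void check_nan_exceptions() {
    assert(std::fmax(kPoison, 1.0) == 1.0);          // fmax/fmin return the non-NaN operand
    assert(std::pow(kPoison, 0.0) == 1.0);           // pow(x, 0) is 1 for any x, even NaN
    assert(!(kPoison > 0.0) && !(kPoison <= 0.0));   // comparisons with NaN are false, so a
                                                     // branch can silently drop the NaN
}
```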