Trouble with converting floats to ints in CUDA.

What I’m trying to do is this:

I have an array of floating point values. These values need to be “continuous” (hence I can’t store integers instead), but I need to use them to index into another array.

In standard C I would do something like this:

(int) floor(value + 0.5f);

This converts 1.56 to 2, 1.04 to 1, and so on, allowing me to use the results to index into an array.

In CUDA, however, the same approach (using floorf) doesn’t seem to give the same results as standard C. Looking through the programming guide, I’ve also tried __float2int_rn (and _rz) as well as __float_as_int. None of them produced matching results either.

Any suggestions on how to mirror the functionality would be greatly appreciated!


Just to provide some further information, both results match up if I drop the addition of 0.5f in the floor function call:

(int) floor(value);
(int) floorf(value);

However, as soon as I add in + 0.5f to both the C and CUDA code, it breaks. I’m not sure if I’m missing something in CUDA, or something fundamental to floating point numbers here…

You do realize that the little snippet of host code you posted is truncating a double precision value to an integer, not a single precision value? That might make a considerable difference depending on how (value + 0.5f) compares to (value + 0.5). Also, by casting the result of the floor operation to an integer, there is the possibility that the single precision result floor(value + 0.5f) might not be exactly representable as a single precision value, and the cast might not yield the correct result. The CUDA float2int functions are IEEE compliant and should be correct. Can you post some actual results?

Sorry, I missed the f in my little C code snippet.

I will post up some results tomorrow. Thanks for the comments :)

Alright, with a fresh mind this morning I took another look.

The problem doesn’t appear to be with the conversion from float to int. After writing a quick test, the GPU results always match up with the CPU results regardless of whether I use floorf(val + 0.5f) or __float2int_rn(val).

The problem seems to lie in my code where I index into an array using these converted values. It doesn’t appear to be a problem with the way I’m calculating an index at a high level, as floorf(val) correctly indexes to the same location on the GPU as floor(val) does on the CPU.

However, even though floorf(val + 0.5f) (or __float2int_rn(val + 0.5f)) resolves to the same integer as floor(val + 0.5f) in my aforementioned tests, I seem to have issues in the indexing regardless.

In any case, this is something I’ve gotta dig into and work out. I just thought I’d post to say I worked out the float-to-int conversion problem (which, well, never existed!) since you were kind enough to post some help to begin with.