Working on Floats as Integers: Tips Needed

Hi, I am looking for a straightforward way to work directly on the byte-level representation of floats, in order to perform some simple integer arithmetic operations on them. If a is a floating-point number, then I would perform some integer arithmetic on a1=int(a), where a1 is represented by the same 32 bits as a. In my case, each thread would load one float a, perform some floating-point operations in the first phase, and then switch to the integer view and perform some integer arithmetic operations. What would be the best way to do something like that?

Further, I’ve read that CUDA’s float is not fully IEEE compliant. How much does this affect arithmetic operations, and can it be ensured that the following sequence reproduces exactly the same value of B on the host:
float A, B
float C=B-A
Transfer A to PC
Transfer C to PC
HOST PC(Intel): float B=A+C

You cannot simply change a float into an integer. The bit representation is very different.
See Float representation
Therefore, casting from float to int requires a bit representation change:
Casting in C/C++
If you would like to perform int bit operations, you need to store the value in a separate variable.

Hm, yes, I know, but I don’t think that would do the job. Let me try to explain it a bit better: I would like to treat the bits of a floating-point number as though they represented e.g. an unsigned int, and then perform some integer operations on them.

The fact that the integer value I’m working on is not what I would get when casting float to int isn’t important for what I need to do.

So basically I would like to load a set of 32-bit floats from global memory into shared memory, perform some floating-point arithmetic on them, and get a 32-bit float result c. Once I have the 32-bit float c, I would like to ignore its real value and work further on it as if its 32 bits represented an integer instead.

Ah, is this what you need: float as bits

To access the bits of a float as an integer, cast a pointer to the float to an int pointer and dereference it:

float x;

int i=*( (int*) &x );

Don’t worry about IEEE compliance. In general, (B-A)+A is not guaranteed to equal B anyway, no matter whether you use double or float as the datatype, and no matter whether you compute on your GPU, your CPU, or anything else with finite precision. If you want to know how precisely CUDA can calculate, see Appendix A.2 and B.1 in the CUDA Programming Guide.
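To make the rounding point concrete, here is a minimal host-side C sketch. The function name reconstruct_b is made up for illustration, and the volatile qualifiers are only there to force each intermediate result to be rounded to single precision, which is an assumption about how one would want to compare against a float computed elsewhere.

```c
/* Hypothetical helper: computes C = B - A, then rebuilds B as A + C,
 * rounding to single precision after each step. */
static inline float reconstruct_b(float a, float b)
{
    volatile float c = b - a;  /* volatile forces a store, i.e. rounding to float */
    volatile float r = a + c;  /* the value the host would call "B" */
    return r;
}
```

With A = 1.0f and B = 3.0f every intermediate value is exactly representable, so B comes back unchanged. With A = 1.0e8f and B = 1.0f the difference 1 - 100000000 has to be rounded to the nearest float, which is -100000000, so A + C ends up as 0 rather than 1.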

Look up __float_as_int and __int_as_float in the CUDA Programming Guide. They are nicer looking than ugly pointer tricks, and they don’t break strict aliasing rules (this may not be a problem in nvcc on the GPU, but I know of one code using the pointer trick that computes incorrectly when compiled with gcc with optimizations on).
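For code that also has to run on the host, one way to get the same effect without the pointer cast is to copy the bytes with memcpy. This is only a sketch: float_bits and bits_to_float are names invented here, and it assumes float is a 32-bit IEEE 754 single, which the assert checks only for size. Inside a kernel, __float_as_int(x) and __int_as_float(i) do the equivalent job.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw 32 bits of x as an unsigned integer. */
static inline uint32_t float_bits(float x)
{
    uint32_t u;
    assert(sizeof u == sizeof x);  /* assumption: float is 32 bits wide */
    memcpy(&u, &x, sizeof u);      /* byte copy, no aliasing violation */
    return u;
}

/* Hypothetical helper: the inverse, reinterprets 32 bits as a float. */
static inline float bits_to_float(uint32_t u)
{
    float x;
    assert(sizeof u == sizeof x);
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Compilers typically turn the fixed-size memcpy into a plain register move, so this should not cost anything compared to the pointer trick.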

With a bit of notational inconvenience, all you need to do is define a union holding either an integer or a float. Something like this

typedef union {
    int i;    /* integer view of the 32 bits */
    float f;  /* floating-point view of the same 32 bits */
} mixed;

will do the trick. If the variable mx is declared like this,

mixed mx;

the nvcc compiler should be clever enough to keep mx in a register, no matter whether it is accessed as mx.i or as mx.f.
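As a rough illustration of the "float phase, then integer phase" pattern with such a union, here is a small C sketch. The union mixed_bits (using an unsigned member so that masking is well behaved) and the function scaled_abs are made up for this example. Reading a union member other than the one last written is allowed in C; in C++ it is formally undefined, although common compilers support it, so treat that part as an assumption about your toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of the union above with an unsigned integer view. */
typedef union {
    uint32_t u;  /* raw bits */
    float    f;  /* floating-point view of the same bits */
} mixed_bits;

/* Hypothetical example: multiply in floating point, then clear the
 * sign bit with an integer mask, which yields the absolute value. */
static inline float scaled_abs(float a, float scale)
{
    mixed_bits m;
    assert(sizeof m.u == sizeof m.f);  /* assumption: float is 32 bits wide */
    m.f = a * scale;                   /* floating-point phase */
    m.u &= 0x7FFFFFFFu;                /* integer phase: bit 31 is the sign */
    return m.f;
}
```

For instance, scaled_abs(-1.25f, 2.0f) computes -2.5f in the first phase and returns 2.5f after the mask.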

Thanks a lot for your tips! I’ll try it out now.

Do you maybe know which method could be used for writing a variable number of bits, e.g. 9, to global memory?

What do you mean by writing 9 bits to memory? Would that be writing one whole byte and updating just bit 0 in the next byte, but leaving the rest of the bits unchanged? I’m not aware of any memory system that lets you do this in one operation. You would have to read the partial byte, update it, and write it back to global memory. Can you queue up bits and write them in batches which are a multiple of 8?

You should definitely try to queue up at least 4 bytes before writing to global memory (even more if possible, using shared memory so you can coalesce the write). But why the need to write only 9 bits? Do you need to write packed values for a codec or something similar?
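The queuing idea can be sketched as a small bit accumulator that only emits whole 32-bit words. This is host-side C for illustration only: bit_writer, bw_put and bw_flush are invented names, the MSB-first bit order is an arbitrary assumption, and in a kernel the accumulator would presumably live in a per-thread register with the words going to shared or global memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit queue: collects bits and emits full 32-bit words. */
typedef struct {
    uint32_t *out;      /* destination word buffer (caller provides space) */
    size_t    n_words;  /* number of words written so far */
    uint64_t  acc;      /* pending bits, held in the low n_bits positions */
    unsigned  n_bits;   /* count of pending bits, below 32 between calls */
} bit_writer;

/* Appends the low `width` bits of `value`, MSB-first (an assumption). */
static inline void bw_put(bit_writer *bw, uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint64_t mask = ((uint64_t)1 << width) - 1;
    bw->acc = (bw->acc << width) | (value & mask);
    bw->n_bits += width;
    if (bw->n_bits >= 32) {                    /* a full word is ready */
        bw->n_bits -= 32;
        bw->out[bw->n_words++] = (uint32_t)(bw->acc >> bw->n_bits);
        bw->acc &= ((uint64_t)1 << bw->n_bits) - 1;  /* keep leftovers only */
    }
}

/* Writes any leftover bits as a final word, zero-padded on the right. */
static inline void bw_flush(bit_writer *bw)
{
    if (bw->n_bits > 0) {
        bw->out[bw->n_words++] = (uint32_t)(bw->acc << (32 - bw->n_bits));
        bw->acc = 0;
        bw->n_bits = 0;
    }
}
```

Packing the four 9-bit values 0x1FF, 0x000, 0x155 and 0x0AA this way gives 36 bits: one full word, plus a second word holding the last 4 bits after bw_flush.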

Yes, right :) I’m working on packing the bits into integers for writing them to global memory right now. However, the shared memory usage in this setup seems to be quite high, so I’m looking for ways to decrease it at the expense of performance.