I’m currently working on a small, simple CUDA interval library (for a bachelor’s thesis) and need some help with the rounding modes.
The Programming Guide states:
and later on in the appendix:
My question is: how can I “statically” set the rounding mode to “round-towards-zero”?
I know there are C intrinsics for all four rounding modes, but being intrinsics/functions they are very slow.
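For context, this is roughly how I would use those intrinsics for an interval addition. It is only a minimal sketch: the `Interval` struct and the `interval_add`, `add_kernel` and `run_interval_add` names are made up for illustration and are not part of any library.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical interval type for illustration: the closed interval [lo, hi].
struct Interval {
    float lo;
    float hi;
};

// Outward-rounded addition: the lower bound is rounded towards -infinity,
// the upper bound towards +infinity, so the result encloses the exact sum.
__device__ Interval interval_add(Interval a, Interval b)
{
    Interval r;
    r.lo = __fadd_rd(a.lo, b.lo);
    r.hi = __fadd_ru(a.hi, b.hi);
    return r;
}

__global__ void add_kernel(Interval a, Interval b, Interval *out)
{
    *out = interval_add(a, b);
}

// Host helper that runs one addition on the device.
// Error checking is omitted to keep the sketch short.
Interval run_interval_add(Interval a, Interval b)
{
    Interval *d_out = nullptr;
    Interval h_out = {0.0f, 0.0f};
    cudaMalloc(&d_out, sizeof(Interval));
    add_kernel<<<1, 1>>>(a, b, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(Interval), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    Interval a = {0.1f, 0.2f};
    Interval b = {0.3f, 0.4f};
    Interval r = run_interval_add(a, b);
    printf("[%.9g, %.9g]\n", r.lo, r.hi);
    return 0;
}
```

Every operation names its rounding mode explicitly here, which is exactly what I would like to avoid if a mode could be fixed once for a whole block of code.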
And a little bit off-topic:
I tried to multiply large float numbers with __fmul_rz,
e.g. __fmul_rz(2x10^32, 2x10^32).
I expected the result to be +infinity; however, the result is the number just “below infinity”,
i.e. the largest finite float: 3.4028235x10^38
It seems as if __fmul_rz would “round down” infinity. I’m not sure if this is my fault or not, because I am using the 3.0 SDK Debug Emulator (which is deprecated).
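In case someone wants to try this on real hardware instead of the emulator, here is a minimal sketch of the test. The `mul_rz_kernel` and `run_fmul_rz` names are made up for illustration; only `__fmul_rz` and the runtime calls come from CUDA itself.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Multiplies two floats on the device with round-towards-zero.
__global__ void mul_rz_kernel(float a, float b, float *out)
{
    *out = __fmul_rz(a, b);
}

// Host helper that runs one multiplication on the device.
// Error checking is omitted to keep the sketch short.
float run_fmul_rz(float a, float b)
{
    float *d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    mul_rz_kernel<<<1, 1>>>(a, b, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main()
{
    // The exact product 4x10^64 is far outside the float range.
    float r = run_fmul_rz(2e32f, 2e32f);
    printf("result = %.9g, isinf = %d\n", r, std::isinf(r) ? 1 : 0);
    return 0;
}
```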
Thanks for any help!