It looks like cvt.sat.f64.f64, in addition to clamping a double to the range [0, 1] as it is supposed to, also mistakenly binarizes its argument (just look at the pixel colors below). For example:
For -1 we get 0;
For -0.5 we get 0;
For 0 we get 0;
For 0.25 we get 0; (instead of 0.25)
For 0.75 we get 1; (instead of 0.75)
For 1 we get 1;
For 1.5 we get 1;
I have an RTX 4070, OptiX 8.0.0, CUDA 12.4.1 and Windows 10.
It looks like the problem is connected to the JIT compilation performed by the OptiX function optixModuleCreate. I asked a question on the general CUDA forum. People there wrote a simple piece of code with cvt.sat.f64.f64 and it worked as it is supposed to.
The equivalent code using setp.lt.f64, setp.gt.f64 and selp.f64 works perfectly fine. I don’t have a clue what’s going on, since cvt.sat.f64.f64 should, theoretically speaking, be equivalent to that compare-and-select code, and it shouldn’t perform the “binarization”.
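For reference, here is a minimal host-side C++ sketch of what the compare-and-select sequence computes, i.e. the result one would expect from a saturating conversion. The function name saturate_reference is hypothetical, and mapping NaN to +0 is an assumption based on my reading of the PTX ISA description of .sat; this is not the code from either forum thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical reference for the expected clamp to [0, 1].
// It mirrors the compare-and-select idea: two comparisons, two selects.
inline double saturate_reference(double x) {
    // Assumption: .sat flushes NaN to +0, per the PTX ISA description.
    if (std::isnan(x)) return 0.0;
    const bool below = x < 0.0;        // role of setp.lt.f64
    const bool above = x > 1.0;        // role of setp.gt.f64
    double r = below ? 0.0 : x;        // role of the first select
    r = above ? 1.0 : r;               // role of the second select
    return r;
}

// Checks the values listed in the post against the expected results.
inline void check_expected_values() {
    assert(saturate_reference(-1.0) == 0.0);
    assert(saturate_reference(-0.5) == 0.0);
    assert(saturate_reference(0.0) == 0.0);
    assert(saturate_reference(0.25) == 0.25);  // the OptiX path returned 0
    assert(saturate_reference(0.75) == 0.75);  // the OptiX path returned 1
    assert(saturate_reference(1.0) == 1.0);
    assert(saturate_reference(1.5) == 1.0);
}
```

Only the two in-range inputs, 0.25 and 0.75, differ from what was observed in the OptiX module, which is why the output looks binarized rather than clamped.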
I tried increasing the number of registers in the module compile options as well as the optimization level. Nothing works. Is it possible that optixModuleCreate wrongly translates cvt.sat.f64.f64 into SASS code? Now the most interesting part: cvt.sat.f64.f64 works perfectly fine in “pure” CUDA. It looks like the PTX → SASS translation done by optixModuleCreate is messing it up. Check out the discussion on the topic on the general forum:
Thank you for your answer. As for OptiX IR, I first learned about it from your post. As far as my specs are concerned, I have an RTX 4070, OptiX 8.0.0, CUDA 12.4.1, VS 2019 Enterprise, Windows 10 Professional and the 560.96 NVIDIA GPU driver. Since I am in the middle of some scientific computation, I haven’t considered upgrading my CUDA drivers so far, because I am afraid everything I am working on will get messed up. But upgrading the GPU driver may be a very good idea, since OptiX is embedded in the driver.
Thank you once more for your help. It’s good to know it’s not my code’s fault. I will surely consider switching to OptiX IR, but since I am new to OptiX in general, it will take me some time to learn this stuff. Right now I compile my shaders in a somewhat primitive manner: I set a *.ptx file as the output of the compilation, load the compiled *.ptx file manually using fread, and finally pass the NUL-terminated string containing the code as an argument to optixModuleCreate.
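The loading step described above could look roughly like the sketch below. It is only an illustration under assumptions: the helper name load_ptx_file is hypothetical, error handling is reduced to returning an empty buffer, and the call to optixModuleCreate itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads a whole *.ptx file with fread and returns it
// as a NUL-terminated buffer. An empty vector signals failure.
inline std::vector<char> load_ptx_file(const char* path) {
    std::FILE* file = std::fopen(path, "rb");
    if (file == nullptr) return {};

    // Determine the file size by seeking to the end.
    if (std::fseek(file, 0, SEEK_END) != 0) { std::fclose(file); return {}; }
    const long size = std::ftell(file);
    if (size < 0) { std::fclose(file); return {}; }
    std::rewind(file);

    // One extra byte for the terminating NUL.
    std::vector<char> buffer(static_cast<std::size_t>(size) + 1);
    const std::size_t read =
        std::fread(buffer.data(), 1, static_cast<std::size_t>(size), file);
    std::fclose(file);

    // Terminate after the bytes actually read and drop any unused tail.
    buffer[read] = '\0';
    buffer.resize(read + 1);
    assert(buffer.back() == '\0');
    return buffer;
}
```

With a buffer like this, buffer.data() would be the program text and buffer.size() - 1 its length without the terminator; how exactly these are handed to optixModuleCreate depends on the OptiX version, so check the signature in your SDK headers.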
Just a small update to close out this thread: the bug was identified and fixed a few weeks ago. The fix is scheduled to roll out with the r580 drivers, which are at least a couple of months out. Thanks for reporting it!