It looks like cvt.sat.f64.f64, in addition to clamping a double to the range [0, 1] as it is supposed to do, ALSO MISTAKENLY BINARIZES THE ARGUMENT. For example:
For -1 we get 0;
For -0.5 we get 0;
For 0 we get 0;
For 0.25 we get 0; (instead of 0.25)
For 0.75 we get 1; (instead of 0.75)
For 1 we get 1;
For 1.5 we get 1;
I built the following on Windows 10 with CUDA 12.3 using nvcc -arch=sm_75 -o cvt_sat_f64_f64.exe cvt_sat_f64_f64.cu
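#include <stdio.h>
#include <stdlib.h>

// Applies cvt.sat.f64.f64 to a in place via inline PTX ("d" = .f64 register)
// and stores the saturated value to global memory.
__global__ void kernel (double a, double *r)
{
    asm ("cvt.sat.f64.f64 %0,%0;\n\t" : "+d"(a));
    *r = a;
}

int main (void)
{
    double a, r, *r_d = 0;
    cudaMalloc ((void**)&r_d, sizeof (r_d[0]));
    // Sweep [-1.2, 1.2] in steps of 0.1; rounding error accumulates in a,
    // so the printed inputs are not exact multiples of 0.1.
    for (a = -1.2; a <= 1.2; a += 0.1) {
        kernel<<<1,1>>>(a, r_d);
        // cudaMemcpy waits for the kernel to finish before copying back.
        cudaMemcpy (&r, r_d, sizeof r, cudaMemcpyDeviceToHost);
        printf ("a = % 23.16e sat(a)=% 23.16e\n", a, r);
    }
    cudaFree (r_d);
    return (EXIT_SUCCESS);
}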
The output of the above program is as expected:
a = -1.2000000000000000e+00 sat(a)= 0.0000000000000000e+00
a = -1.0999999999999999e+00 sat(a)= 0.0000000000000000e+00
a = -9.9999999999999989e-01 sat(a)= 0.0000000000000000e+00
a = -8.9999999999999991e-01 sat(a)= 0.0000000000000000e+00
a = -7.9999999999999993e-01 sat(a)= 0.0000000000000000e+00
a = -6.9999999999999996e-01 sat(a)= 0.0000000000000000e+00
a = -5.9999999999999998e-01 sat(a)= 0.0000000000000000e+00
a = -5.0000000000000000e-01 sat(a)= 0.0000000000000000e+00
a = -4.0000000000000002e-01 sat(a)= 0.0000000000000000e+00
a = -3.0000000000000004e-01 sat(a)= 0.0000000000000000e+00
a = -2.0000000000000004e-01 sat(a)= 0.0000000000000000e+00
a = -1.0000000000000003e-01 sat(a)= 0.0000000000000000e+00
a = -2.7755575615628914e-17 sat(a)= 0.0000000000000000e+00
a = 9.9999999999999978e-02 sat(a)= 9.9999999999999978e-02
a = 1.9999999999999998e-01 sat(a)= 1.9999999999999998e-01
a = 2.9999999999999999e-01 sat(a)= 2.9999999999999999e-01
a = 4.0000000000000002e-01 sat(a)= 4.0000000000000002e-01
a = 5.0000000000000000e-01 sat(a)= 5.0000000000000000e-01
a = 5.9999999999999998e-01 sat(a)= 5.9999999999999998e-01
a = 6.9999999999999996e-01 sat(a)= 6.9999999999999996e-01
a = 7.9999999999999993e-01 sat(a)= 7.9999999999999993e-01
a = 8.9999999999999991e-01 sat(a)= 8.9999999999999991e-01
a = 9.9999999999999989e-01 sat(a)= 9.9999999999999989e-01
a = 1.0999999999999999e+00 sat(a)= 1.0000000000000000e+00
a = 1.2000000000000000e+00 sat(a)= 1.0000000000000000e+00
This is as close as I can get to your platform configuration at this time. If you post code that others can build and run, we may get more data points. With the holidays imminent, though, I would expect forum participation to be extremely low.
…and that works perfectly fine. I don't have a clue what's going on, since cvt.sat.f64.f64 should, in theory, be equivalent to the code using setp.lt.f64, setp.gt.f64 and selp.f64, and it shouldn't perform the "binarization".
In your environment, when does PTX get translated to machine code (SASS), and what software component performs this translation? Obviously, when I run nvcc, it invokes the offline version of ptxas to compile PTX into SASS, and I can then inspect the resulting SASS with cuobjdump --dump-sass.
Thanks for showing the compilation step. Unfortunately, I don't know what it actually does or how things could go wrong. Is there any chance this JIT-compilation step is misconfigured in some way?
The people in the OptiX sub-forum might have more insights, given that this is not a pure CUDA scenario.
Could you extract a minimal example (ideally a one-liner) that calls fmin and/or fmax with a specific argument and gives a wrong result, so that it can be reasoned about by hand?
R = MIN_R(MAX_R(((REAL_R)R), ((REAL_R)0)), ((REAL_R)1));
G = MIN_R(MAX_R(((REAL_R)G), ((REAL_R)0)), ((REAL_R)1));
B = MIN_R(MAX_R(((REAL_R)B), ((REAL_R)0)), ((REAL_R)1));
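// Probe: the branch taken reveals what fmax(0.25, 0.0) actually returns.
if (fmax(0.25, 0.0) == 0.0)
    R = G = B = 1.0;    // white: result was binarized to 0
else if (fmax(0.25, 0.0) == 0.25)
    R = G = B = 0.0;    // black: correct result
else
    R = G = B = 0.5;    // gray: some other value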
0.25 has an exact representation in float and double, so a comparison with no tolerance (epsilon) should work.
According to your post, fmax(0.25, 0.0) would then return 0, so the result would be white.
If not (e.g. black), narrow down where the difference lies, for example by switching to fmin(fmax(…)) with literal numbers, and so on.