Strange global memory behaviour


How can two global memory writes have completely different timings? I wrote a simple raytracer in CUDA. If I do all the raytracing computations and assign the final calculated color value to a global pixel array, it takes about 1.5 s. If I instead assign a constant color value to the global array, it takes only about 10 ms. That seems strange to me. Is CUDA optimizing the code in some way?

Thanks in advance

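The compiler is probably smart enough to see that you're not using the intermediate values you calculated, so it removes those computations entirely (dead-code elimination). In the constant-color case you are then most likely timing only the memory write, not the raytracing itself.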
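If you want to check this yourself, here is a minimal sketch of the effect. It is not your raytracer: expensiveShade, writeConstant, writeComputed, timeKernelMs and runComparison are hypothetical names made up for the illustration, and the sinf loop is only a stand-in for the per-pixel work. Both kernels call the same function, but only one of them stores the result.

```cuda
#include <cuda_runtime.h>

// Stand-in for the per-pixel raytracing work (hypothetical, not the poster's code).
__device__ float expensiveShade(int idx, int iterations)
{
    float c = idx * 1e-6f;
    for (int i = 0; i < iterations; ++i)
        c = sinf(c) * 0.5f + 0.25f;
    return c;
}

// Computes a value but stores a constant. Because the computed value is never
// used, an optimizing compiler is free to drop the call altogether.
__global__ void writeConstant(float *pixels, int n, int iterations)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float c = expensiveShade(idx, iterations);  // result unused
    (void)c;
    pixels[idx] = 0.5f;
}

// Stores the computed value, so the work has to be carried out.
__global__ void writeComputed(float *pixels, int n, int iterations)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    pixels[idx] = expensiveShade(idx, iterations);
}

// Times one kernel launch with CUDA events. Returns milliseconds, or -1 on error.
float timeKernelMs(void (*kernel)(float *, int, int),
                   float *dPixels, int n, int iterations)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time setup cost is not included in the timing.
    kernel<<<blocks, threads>>>(dPixels, n, iterations);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    kernel<<<blocks, threads>>>(dPixels, n, iterations);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Runs both kernels on the same buffer and reports their timings.
bool runComparison(int n, int iterations, float *msConstant, float *msComputed)
{
    float *dPixels = nullptr;
    if (cudaMalloc(&dPixels, n * sizeof(float)) != cudaSuccess) return false;

    *msConstant = timeKernelMs(writeConstant, dPixels, n, iterations);
    *msComputed = timeKernelMs(writeComputed, dPixels, n, iterations);

    cudaFree(dPixels);
    return *msConstant >= 0.0f && *msComputed >= 0.0f;
}
```

Call runComparison(1 << 20, 2000, &a, &b) from your main and print the two values. If the compiler removes the unused call, writeConstant should come out much faster than writeComputed, but the actual numbers depend on your GPU and compiler flags. To see what was really generated, you can dump the machine code of the compiled binary with cuobjdump -sass and check whether the loop is still there.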