How can two writes to global memory have completely different timings? I wrote a simple raytracer in CUDA. If I do all the raytracing computations and assign the final computed color value to a global pixel array, the kernel takes about 1.5 s. If I instead assign a constant color value to the same global array, it takes only about 10 ms. That seems strange to me. Is the CUDA compiler optimizing the code in some way?
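To make the comparison concrete, here is a minimal sketch of the two variants I am timing. It is not my actual raytracer: `tracePixel`, `renderTraced`, `renderConstant` and `timeKernel` are hypothetical names, the loop inside `tracePixel` is just a stand-in for the real per-pixel work, and I am assuming the constant variant still calls the tracing function but discards its result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-pixel raytrace work (assumption:
// the real tracer is far more complex than this loop).
__device__ float3 tracePixel(int x, int y)
{
    float3 c = make_float3(0.0f, 0.0f, 0.0f);
    for (int i = 0; i < 1000; ++i) {
        c.x += sinf(x * 0.001f + i);
        c.y += cosf(y * 0.001f + i);
        c.z += sinf((x + y) * 0.001f + i);
    }
    return c;
}

// Variant 1: the computed color is written to global memory.
__global__ void renderTraced(float3* pixels, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    pixels[y * width + x] = tracePixel(x, y);
}

// Variant 2: the tracing function is still called, but its result is
// never stored; a constant color is written instead.
__global__ void renderConstant(float3* pixels, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    float3 unused = tracePixel(x, y);
    (void)unused;
    pixels[y * width + x] = make_float3(1.0f, 0.0f, 0.0f);
}

// Times one kernel launch with CUDA events and returns milliseconds.
float timeKernel(bool traced, int width, int height)
{
    float3* d_pixels = nullptr;
    cudaMalloc(&d_pixels, sizeof(float3) * width * height);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    if (traced)
        renderTraced<<<grid, block>>>(d_pixels, width, height);
    else
        renderConstant<<<grid, block>>>(d_pixels, width, height);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_pixels);
    return ms;
}

int main()
{
    const int width = 1024, height = 768;
    printf("traced:   %.3f ms\n", timeKernel(true, width, height));
    printf("constant: %.3f ms\n", timeKernel(false, width, height));
    return 0;
}
```

In both kernels the store to `pixels` is the same kind of write; the only difference is whether the stored value depends on the result of `tracePixel`.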
Thanks in advance