Hi,
I have a simple piece of code and I want to prevent the compiler from unrolling its two loops, so I added #pragma unroll 1. The running time is 8 seconds. Then I unrolled the inner loop
by using #pragma unroll. Again I get 8 seconds. If I unroll both loops I get 8 seconds again. If I remove the pragmas altogether I get the same time once more.
Can anyone tell me why I can't prevent the compiler from unrolling? I am using CUDA 4.0 with compute capability 2.0. The card is a GeForce GTX 470.
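
For reference, here is a minimal sketch of the kind of setup described above. It is not the original code: the kernel name nestedSum, the helper runNestedSum, the loop bounds, and the arithmetic are all hypothetical and only show where the pragmas go.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread sums a rows x cols array with two nested loops.
__global__ void nestedSum(const float *in, float *out, int rows, int cols)
{
    float acc = 0.0f;

    // Request that the outer loop is not unrolled.
    #pragma unroll 1
    for (int i = 0; i < rows; ++i) {
        // Request that the inner loop is not unrolled.
        // Replacing this line with a bare "#pragma unroll" requests full
        // unrolling instead, which the compiler can only do when the trip
        // count is known at compile time.
        #pragma unroll 1
        for (int j = 0; j < cols; ++j) {
            acc += in[i * cols + j];
        }
    }
    out[0] = acc;
}

// Hypothetical host helper: fills the input with ones, runs the kernel with a
// single thread, and returns the sum (error checking omitted for brevity).
float runNestedSum(int rows, int cols)
{
    const int n = rows * cols;
    float *hIn = new float[n];
    for (int k = 0; k < n; ++k) hIn[k] = 1.0f;

    float *dIn = 0, *dOut = 0;
    cudaMalloc((void **)&dIn, n * sizeof(float));
    cudaMalloc((void **)&dOut, sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    nestedSum<<<1, 1>>>(dIn, dOut, rows, cols);

    float result = 0.0f;
    cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    delete[] hIn;
    return result;
}

int main()
{
    printf("sum = %f\n", runNestedSum(4, 8));
    return 0;
}
```

Identical timings do not by themselves show that the pragma is being ignored, since unrolling may simply not change the run time of a given kernel. One way to check is to look at the generated code instead, for example by compiling with nvcc -arch=sm_20 -ptx, or by running cuobjdump -sass on the binary, and comparing the loop bodies with and without the pragmas.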
They are not. I have tested the same code on my laptop and it works there. On my desktop it does not. However, my laptop has compute capability 2.1 and CUDA 3.2. I don't know whether that makes any difference.