I have a simple piece of code with two nested loops, and I want to prevent the compiler from unrolling them, so I added #pragma unroll 1. The running time is 8 seconds. Then I unrolled the inner loop
by using #pragma unroll instead, and again I get 8 seconds. If I unroll both loops I get 8 seconds again, and if I remove the pragmas entirely I get the same time.
Can anyone tell me why I can't stop (prevent) the compiler from unrolling? I am using CUDA 4.0 with compute capability 2.0. The card is a GeForce GTX 470.
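
The actual code is not reproduced here, so the following is only a minimal sketch of the pragma placement described above. The kernel name `nestedLoops`, the loop bounds, and the arithmetic in the loop body are all hypothetical placeholders; only the two `#pragma unroll` forms come from the description.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: name, loop bounds, and body are placeholders,
// not the original code from this post.
__global__ void nestedLoops(float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float acc = 0.0f;

    // The pragma applies only to the loop that immediately follows it.
    // A factor of 1 asks the compiler not to unroll this loop.
    #pragma unroll 1
    for (int i = 0; i < 64; ++i) {
        // With no factor, the pragma asks for full unrolling
        // when the trip count is known at compile time.
        #pragma unroll
        for (int j = 0; j < 16; ++j) {
            acc += (float)(i * j) * 0.5f;
        }
    }

    out[tid] = acc;
}

int main()
{
    const int n = 256;
    float h_out[n];
    float *d_out = NULL;

    cudaMalloc((void **)&d_out, n * sizeof(float));

    // Launch enough 128-thread blocks to cover n elements.
    nestedLoops<<<(n + 127) / 128, 128>>>(d_out, n);

    // The copy back to the host also waits for the kernel to finish.
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);

    cudaFree(d_out);
    return 0;
}
```

Swapping the outer pragma between `#pragma unroll 1` and `#pragma unroll`, or deleting both, gives the four variants that were timed. Identical timings alone do not show whether unrolling actually happened; comparing the PTX emitted by `nvcc -ptx` for each variant is one way to check what the compiler did with the loops.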