I wrote some code that takes about 2 ms most of the time, but occasionally it takes around 40 ms. The code looks something like this:
func:
    8 different memsets
    kernel 1
    kernel 2
    for {
        kernel 3
        kernel 4
        kernel 5
        kernel 6
    }
This happens after a few runs of the function. Each kernel or memset call on its own (with everything else commented out), whether inside or outside the for loop, shows this behavior. I'm running this function 5 times each for 7 cycles, and around 200 times.
I ruled out floating-point issues, since even the memsets show this behavior. Any suggestions on what the problem might be?
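In case it helps to reproduce this, here is a minimal host-side sketch of how the individual calls could be timed and the slow runs picked out. It is not my actual code: `time_runs` and `find_spikes` are hypothetical helper names, and the "slower than `factor` times the median" rule is just an assumed way to define a spike. It also assumes that `step` blocks until the work is really finished, so for an asynchronous kernel launch or memset the callable would have to synchronize with the device before returning (for example with `cudaDeviceSynchronize()` if this is CUDA).

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Times `runs` invocations of `step` and returns each duration in milliseconds.
// Assumption: `step` blocks until its work is done (synchronize inside it for GPU work).
template <typename Step>
std::vector<double> time_runs(Step&& step, int runs) {
    std::vector<double> ms;
    ms.reserve(runs > 0 ? static_cast<std::size_t>(runs) : 0);
    for (int i = 0; i < runs; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        step();
        auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}

// Returns the indices of the runs that took more than `factor` times the median.
// The median here is the upper middle element when the count is even.
std::vector<std::size_t> find_spikes(const std::vector<double>& ms, double factor) {
    std::vector<std::size_t> spikes;
    if (ms.empty()) return spikes;
    std::vector<double> sorted(ms);
    std::sort(sorted.begin(), sorted.end());
    const double median = sorted[sorted.size() / 2];
    for (std::size_t i = 0; i < ms.size(); ++i) {
        if (ms[i] > factor * median) spikes.push_back(i);
    }
    return spikes;
}
```

Timing a single memset or kernel this way and printing the indices from `find_spikes(times, 5.0)` should show whether the 40 ms runs come at regular intervals or at random, which might narrow down the cause.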