CUDA erratic behavior: total code time varies

I wrote some code which most of the time takes 2.000000 ms, but occasionally it takes around 40.00000 ms. The code is something like:


8 different memsets

kernel 1

kernel 2

for {

kernel 3

kernel 4

kernel 5

kernel 6


This happens after a few runs of the function. Each kernel or memset call on its own (with everything else commented out), whether inside or outside the for loop, shows this behavior. I'm running this function 5 times in each of 7 cycles, and around 200 times overall.

I ruled out floating-point issues, since even the memsets show this behavior. Any suggestions on what the problem may be?

Are you doing error checking after all of your kernel/memory calls?
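For reference, here is a minimal sketch of what such error checking could look like. It is not the original poster's code: the `CUDA_CHECK` macro, `placeholderKernel`, and `runCheckedExample` are hypothetical names made up for illustration, and the buffer size and launch configuration are arbitrary. It only uses standard CUDA runtime calls (`cudaMemset`, `cudaGetLastError`, `cudaDeviceSynchronize`, `cudaGetErrorString`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error with file/line and abort
// if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Placeholder kernel standing in for "kernel 1" ... "kernel 6".
__global__ void placeholderKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Runs one checked memset + kernel launch and returns the first element.
int runCheckedExample()
{
    const int n = 1 << 20;  // arbitrary size chosen for the sketch
    int *d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(int)));

    // Memory calls return an error code directly, so wrap them.
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(int)));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    placeholderKernel<<<blocks, threads>>>(d_data, n);

    // Kernel launches return nothing, so query the error state explicitly.
    CUDA_CHECK(cudaGetLastError());       // catches launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    int first = -1;
    CUDA_CHECK(cudaMemcpy(&first, d_data, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
    return first;
}

int main()
{
    std::printf("first element: %d\n", runCheckedExample());
    return 0;
}
```

Synchronizing after every launch is only meant for debugging, since it serializes host and device. Once the errors are ruled out, the synchronize calls can be removed from the timed path.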

This thread is a duplicate:

Closing this one and sticking to the other. I just noticed I got an error on creation and assumed that this one wasn't created. Thanks for pointing that out!