I have a question that I am hoping someone might be able to answer:
I am using a for loop to execute a kernel multiple times. For the first 16 iterations, the processing time is consistent with the expected average. After the 16th iteration, however, the processing time jumps significantly, e.g. from 20 microseconds to 5 milliseconds. The same thing happens even when I use multiple kernels: after the 16th kernel call, the processing time skyrockets!
Can someone explain why this is happening? Your time is greatly appreciated!