I’ve been doing some timings of my two kernels, which I launch one after the other. I call cudaThreadSynchronize() once before launching the first kernel and again after the second kernel, and measure the elapsed time between those two calls. I average over 10,000 iterations (after discarding an initial 10,000 warm-up iterations). Here are a few statistics:
0.2% are below 0.81ms
30% are below 0.83ms
92% are below 0.87ms
7.4% are above 0.9ms
6.5% are above 1ms
2% are above 1.5ms
I was wondering if these variations in the timings were expected. Did anyone experience this?
Being above 1.1 × the average roughly 7% of the time seems fair enough, I reckon. But why does the maximum reach more than 3 ms?! My application needs to run at 1,000 Hz, so I need to stay below 1 ms on every iteration. Because of these spikes, my rendering is a bit jerky (the kernels compute the positions of some vertices, which I then render). I don’t think I’m doing anything wrong.
It’s worth noting that I also tried decreasing the size of my model. With a smaller model, I see roughly the same pattern (average 0.32 ms, 6% above 0.4 ms, maximum 1.2 ms).
Can I fix this, or is it normal? And if it’s normal, perhaps someone from NVIDIA can explain where it comes from and whether this irregularity will be fixed in a future release. That would be great.
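For reference, my timing loop looks roughly like this (kernel names, launch configurations, and the vertex buffer are placeholders, not my exact code):

```cuda
#include <cuda_runtime.h>
#include <sys/time.h>

// Placeholder kernels standing in for my two vertex-update passes.
__global__ void kernelA(float* verts, int n) { /* pass 1 */ }
__global__ void kernelB(float* verts, int n) { /* pass 2 */ }

// Host-side wall-clock timer in milliseconds.
static double now_ms() {
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Time one iteration: synchronize, launch both kernels, synchronize again.
double time_once(float* d_verts, int n) {
    cudaThreadSynchronize();               // make sure prior work is done
    double t0 = now_ms();
    int blocks = (n + 255) / 256;
    kernelA<<<blocks, 256>>>(d_verts, n);
    kernelB<<<blocks, 256>>>(d_verts, n);
    cudaThreadSynchronize();               // wait for both kernels to finish
    return now_ms() - t0;
}
```

I record the value returned by time_once() for each iteration and build the statistics from those samples.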