Execution time not stable, and increasing shared memory degraded performance

Hello,

I have observed that my kernel usually reports a different execution time on each run. I am using CUDA events for timing. The variation sometimes reaches 5 to 10%. Any idea why this happens?

Also, when I increased the amount of shared memory by a factor of 5 to 10, performance degraded drastically, up to 300 times slower. I did not get any error saying that there was not enough shared memory available. Why would using more shared memory degrade performance?

Thanks in advance

Is that variation 10% of 0.1ms or 10% of 10 seconds?
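One way to answer that is to time many launches after a warm-up launch and look at the spread, not at a single measurement. Below is a minimal sketch, assuming a hypothetical `scaleKernel` stands in for the real kernel; the problem size and run count are arbitrary choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the kernel being timed.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

struct Stats { float minMs, meanMs, maxMs; };

// Host-side summary of per-run timings in milliseconds.
Stats summarize(const float* ms, int runs) {
    Stats s = {ms[0], 0.0f, ms[0]};
    for (int r = 0; r < runs; ++r) {
        if (ms[r] < s.minMs) s.minMs = ms[r];
        if (ms[r] > s.maxMs) s.maxMs = ms[r];
        s.meanMs += ms[r];
    }
    s.meanMs /= runs;
    return s;
}

int main() {
    const int n = 1 << 24;
    const int runs = 20;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    dim3 block(256), grid((n + 255) / 256);

    // Warm-up launch so one-time setup costs are not counted in the timed runs.
    scaleKernel<<<grid, block>>>(d, n, 1.0f);
    cudaDeviceSynchronize();

    float ms[runs];
    for (int r = 0; r < runs; ++r) {
        cudaEventRecord(start);
        scaleKernel<<<grid, block>>>(d, n, 1.0f);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);              // wait until 'stop' has completed
        cudaEventElapsedTime(&ms[r], start, stop);
    }

    Stats s = summarize(ms, runs);
    printf("min %.3f ms  mean %.3f ms  max %.3f ms  spread %.1f%%\n",
           s.minMs, s.meanMs, s.maxMs,
           100.0f * (s.maxMs - s.minMs) / s.minMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

The absolute time matters because for a very short kernel, a few microseconds of jitter is already a large percentage, while the same jitter on a multi-second kernel is negligible.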

Occupancy. More shared memory per block means fewer blocks can be resident on each multiprocessor (MP) at the same time. The how and why are explained in section 5.1 of the programming guide.
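On current CUDA toolkits, the runtime can report this effect directly through `cudaOccupancyMaxActiveBlocksPerMultiprocessor`. The sketch below is illustrative only: `smemKernel`, the block size of 256, and the 1x/5x/10x scaling factors are hypothetical choices meant to mirror the question. It also shows that a launch requesting more shared memory than the per-block limit does fail, but the error is only visible if you check for it with `cudaGetLastError`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel whose shared memory size is chosen at launch time.
__global__ void smemKernel(float* out) {
    extern __shared__ float buf[];
    buf[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[blockIdx.x * blockDim.x + threadIdx.x] = buf[blockDim.x - 1 - threadIdx.x];
}

// Rough upper bound on resident blocks per SM from shared memory alone.
// Ignores allocation granularity, register use, and thread/block limits.
int blocksLimitedBySmem(size_t smemPerSM, size_t smemPerBlock) {
    return smemPerBlock == 0 ? -1 : (int)(smemPerSM / smemPerBlock);
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("shared memory: %zu B per block, %zu B per SM\n",
           prop.sharedMemPerBlock, prop.sharedMemPerMultiprocessor);

    const int blockSize = 256;
    const size_t baseSmem = blockSize * sizeof(float);   // one float per thread
    const int factors[] = {1, 5, 10};

    for (int f : factors) {
        size_t smem = baseSmem * f;
        int active = 0;
        // Reported residency accounts for all limits, not just shared memory.
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&active, smemKernel,
                                                      blockSize, smem);
        printf("x%-2d: %6zu B/block -> %d resident blocks/SM (smem bound %d)\n",
               f, smem, active,
               blocksLimitedBySmem(prop.sharedMemPerMultiprocessor, smem));
    }

    // Requesting more than the per-block limit fails, but silently unless checked.
    float* out = nullptr;
    cudaMalloc(&out, blockSize * sizeof(float));
    smemKernel<<<1, blockSize, prop.sharedMemPerBlock + 1>>>(out);
    cudaError_t err = cudaGetLastError();
    printf("oversized launch: %s\n", cudaGetErrorString(err));

    cudaFree(out);
    return 0;
}
```

As the shared memory per block grows, the number of resident blocks per SM drops, which leaves the scheduler fewer warps to hide memory latency with.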