Matlab GPU Sporadic Delay

I am seeing sporadic behavior in some of the MATLAB GPU toolkit algorithms. I have an NVIDIA GeForce GTX 860M in my laptop. The problem is this:

The first GPU computation takes about one second longer than subsequent ones. Since only the first run is slower, I suspected that cache preparation might be the culprit. For CPUs, this behavior is well known, but I am not sure why the first run on the GPU is also slower. Will a CUDA implementation also have a delay in the first run? What do we call this problem? Is there any way to formally address it other than running the computation a couple of times?

Thanks in advance!

This is a known issue with MATLAB, not CUDA. I have talked directly to the MathWorks people, and they acknowledged the issue but did not offer a fix.

Are you calling CUDA through a MEX interface or using MATLAB’s built-in GPU functionality?

After that first call, any additional latency from the overhead of a CUDA MEX file becomes very small, so at worst you just have to get past the first ‘initialization’ call.

Thanks for the reply CudaaduC!

If I need to formally mention this aberration in my write-up, what should I say? Can we say it is due to “cache warm-up”?

I am using the built-in GPU functionality for now. However, I intend to write my own CUDA code to avoid such variations. I hope I can avoid these fluctuations if I write my own code. Any heads-up about bad practices to watch out for?

Thanks for the suggestion, I will not consider the first run in my results then.

I do not use MATLAB, but as a guess the “hiccup” on first GPU use is likely some kind of first-use software initialization overhead, rather than anything related to caches in particular.

I will note that CUDA itself, as a stateful software layer, has an initialization overhead caused by context creation. This can be particularly noticeable in systems with large memory or when a significant amount of JIT compilation takes place. Since context initialization is usually lazy, triggered by the first CUDA API call, often a cudaMalloc(), it may be convenient to trigger this event explicitly at an opportune time, by a call to cudaFree(0) for example.
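
For illustration, here is a minimal sketch of that idea using the CUDA runtime API. It assumes a single GPU and a toolkit whose nvcc accepts C++11. The helper names (time_ms, warm_up_context_ms) and the 1 MiB buffer size are made up for this example. It times cudaFree(0) as the first API call, then times a cudaMalloc() that follows it, so the one-time initialization cost can be seen separately from steady-state cost. Actual numbers will vary with the system and driver.

```cuda
// Build (assumed file name): nvcc -o warmup warmup.cu
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of a callable, in milliseconds.
template <typename F>
static double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Hypothetical helper: force lazy context creation with cudaFree(0),
// return how long that first call took, and report its status via *err.
static double warm_up_context_ms(cudaError_t* err) {
    return time_ms([&] { *err = cudaFree(0); });
}

int main() {
    cudaError_t err = cudaSuccess;

    // First runtime API call in the process: pays the initialization cost.
    double init_ms = warm_up_context_ms(&err);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // A later API call, made after the context already exists.
    void* buf = nullptr;
    const std::size_t bytes = 1 << 20;  // 1 MiB, arbitrary size for the demo
    double alloc_ms = time_ms([&] { err = cudaMalloc(&buf, bytes); });
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(buf);

    std::printf("first call, cudaFree(0): %.3f ms\n", init_ms);
    std::printf("cudaMalloc after init:   %.3f ms\n", alloc_ms);
    return 0;
}
```

When benchmarking, doing this warm-up once at program start (or simply discarding the first timed run) keeps the initialization cost out of the reported numbers.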