Matlab GPU Sporadic Delay

Vectoror · January 21, 2016, 4:33am

I am seeing sporadic behavior in some of the MATLAB GPU toolkit algorithms. I have an NVIDIA GeForce GTX 860M in my laptop. The problem is this:

There is a delay of about one second in the first GPU computation compared to subsequent ones. Since the first run is slower, I suspected that cache preparation might be the culprit. For CPUs, this behavior is well known. But I am not sure why the first run on the GPU also gives slower results. Will a CUDA implementation also have a delay in the first run? What do we call this problem? Is there any way to formally address it other than running the problem a couple of times?

Thanks in advance!

CudaaduC · January 21, 2016, 4:53pm

This is a known issue with MATLAB, not CUDA. I have talked directly to the MathWorks people and they acknowledged the issue but did not offer a fix.

Are you calling CUDA through a MEX interface or using MATLAB’s built-in GPU functionality?

After that first call, any additional latency from the overhead of a CUDA MEX file becomes very small, so at worst you just have to get past the first ‘initialization’ call.

Vectoror · January 21, 2016, 11:12pm

Thanks for the reply CudaaduC!

If I need to formally mention this aberration in my literature, what should I say? Can we say it is due to “cache warm-up”?

I am using the built-in GPU functionality for now. However, I intend to write my own CUDA code to avoid such variations. I hope I can avoid these fluctuations if I write my own code. Any heads-up on bad practices to avoid?

Thanks for the suggestion, I will not consider the first run in my results then.

njuffa · January 21, 2016, 11:45pm

I do not use MATLAB, but as a guess the “hiccup” on first GPU use is likely some kind of first-use software initialization overhead, rather than anything related to caches in particular.

I will note that CUDA itself, as a stateful software layer, has an initialization overhead caused by context creation. This can be particularly noticeable in systems with large memory or when a significant amount of JIT compilation takes place. Since context initialization is usually lazy, triggered by the first CUDA API call, often a cudaMalloc(), it may be convenient to trigger this event explicitly at an opportune time, by a call to cudaFree(0) for example.
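
As a rough illustration of that idea, here is a minimal sketch using the CUDA runtime API. It forces context creation with cudaFree(0) and then times two allocations, so the one-time cost is kept out of the later measurements. The helper timedAllocMs and the 1 MiB buffer size are made up for this example, and the numbers it prints will depend on the system, driver and GPU.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: wall-clock time of a single cudaMalloc, in milliseconds.
// Returns a negative value if the allocation fails.
static double timedAllocMs(size_t bytes) {
    void* p = nullptr;
    auto t0 = Clock::now();
    cudaError_t err = cudaMalloc(&p, bytes);
    auto t1 = Clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    cudaFree(p);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // Trigger the lazy context initialization explicitly, at a time of our
    // choosing, instead of letting the first "real" API call pay for it.
    auto t0 = Clock::now();
    cudaError_t err = cudaFree(0);
    auto t1 = Clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA initialization failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("context initialization: %.3f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());

    // With the context already created, these allocations should no longer
    // include the one-time startup overhead (1 MiB is an arbitrary size).
    std::printf("first cudaMalloc:  %.3f ms\n", timedAllocMs(1 << 20));
    std::printf("second cudaMalloc: %.3f ms\n", timedAllocMs(1 << 20));
    return 0;
}
```

Built with something like nvcc warmup.cu -o warmup, this should show most of the startup cost in the first line. For benchmarking, the same pattern applies: do the initialization (or one throwaway run) before starting the timer, and report only the runs after it.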
