I am having trouble after I do some computation on my GPU and then want to process the data in Matlab. This is the setup:
In Matlab I call a mex function, which invokes several kernels and loads the resulting data from the GPU back into Matlab;
I run matrix inversion in Matlab.
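Roughly, the mex part looks like this (a simplified sketch, not my actual code; myKernel and the single-precision input are stand-ins for the real kernels and data):

```cpp
#include "mex.h"
#include <cuda_runtime.h>

__global__ void myKernel(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;            // stand-in for the real computation
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    int n = (int)mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateNumericMatrix(1, n, mxSINGLE_CLASS, mxREAL);
    float *h_in  = (float *)mxGetData(prhs[0]);   // assumes single-precision input
    float *h_out = (float *)mxGetData(plhs[0]);

    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemcpy(d, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    myKernel<<<(n + 255) / 256, 256>>>(d, n);
    // load the resulting data from the GPU back to Matlab
    cudaMemcpy(h_out, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```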
The problem is that the matrix inversion no longer runs on all 4 processors of my QuadCore (which it normally does). In Windows Task Manager I can see that the inversion runs on only one processor while the other three idle. The result is a long computation time.
I am using CUDA 2.1, Matlab 2007a, an Intel Core2 Quad, and a GeForce GTX 280.
Does anyone have any idea what could be causing this?
If there are no calls to the GPU (or prior to any calls to the GPU), matrix inversion in my Matlab utilizes all four processors, so the preferences regarding multithreading seem to be set correctly. I tried to play with them anyway: I made the GPU call (after which the matrix inversion ran on one processor only), and then tried to disable multithreading and re-enable it, or to change the number of threads Matlab should use, but to no avail. Only one processor was utilized. Only after I closed Matlab and opened it again did the matrix inversion go back to running on all four processors.
I have not tried different versions of Matlab, though, as I only have 2007a.
Today I tried with Matlab 2008a. The behavior was the same. I tried to set maxNumCompThreads(4) after the CUDA call, but with no improvement. It seems that calling cudaMalloc() alone is enough to cause this (see the stripped-down mex function below). Is anyone else experiencing this, or is it just me?
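To illustrate, a mex function as trivial as this sketch seems sufficient to trigger the problem:

```cpp
#include "mex.h"
#include <cuda_runtime.h>

// Does no useful work; merely touching the CUDA runtime (the first call
// implicitly creates a context) appears to be enough to change the affinity.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    void *d = 0;
    cudaMalloc(&d, 16);
    cudaFree(d);
}
```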
I just figured out how to solve it. It would seem that CUDA calls change the processor affinity of the process so that it runs on a single processor. Once you change the affinity back to its original state, everything is OK (i.e. my matrix inversion goes back to running on all four processors).
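Concretely, on Windows this amounts to saving the affinity mask before the first CUDA call and restoring it afterwards. A minimal sketch (error checking omitted; the body is a placeholder for the real mex code):

```cpp
#include <windows.h>
#include <cuda_runtime.h>

void runGpuWork(void)
{
    // Save the process affinity mask before touching CUDA
    DWORD_PTR procMask = 0, sysMask = 0;
    GetProcessAffinityMask(GetCurrentProcess(), &procMask, &sysMask);

    void *d = 0;
    cudaMalloc(&d, 16);   // first CUDA call; this is where the mask changes
    // ... kernels, cudaMemcpy, etc. ...
    cudaFree(d);

    // Restore the mask so Matlab's multithreaded code can use all cores again
    SetProcessAffinityMask(GetCurrentProcess(), procMask);
}
```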
Hmm, this might be because certain CUDA calls spin the processor while a kernel is still running, and pinning the process prevents it from constantly migrating between cores.
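If that is the cause, the scheduling policy can be selected explicitly, assuming your CUDA version exposes the flags (the driver API has CU_CTX_SCHED_* flags on cuCtxCreate(); the runtime call sketched below may not exist in 2.1):

```cpp
#include <cuda_runtime.h>

// Sketch, assuming these flags are available in your CUDA version: ask the
// runtime to yield (or block) instead of spinning while waiting for the GPU.
// Must run before the context is created, i.e. before any other CUDA call.
void setYieldScheduling(void)
{
    cudaSetDeviceFlags(cudaDeviceScheduleYield);
    // or cudaDeviceScheduleBlockingSync to sleep on a sync primitive instead
}
```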
Are you using the runtime API? It might be that this does not happen with the driver API. Also, which version of CUDA are you using?
I have a simulation that used all 4 cores (and also used CUDA) but now runs on only 2 cores, and I think nothing changed apart from the CUDA version (I did not recompile the .mexa64 file).
The change of affinity, however, already happens after the cudaMalloc() call. My personal understanding is that the higher processor usage appears only during the memory transfers (cudaMemcpy()), when the processor copies data between one of the page-locked buffers and the host memory where the user has, or wants to get, their data (I am not 100% sure, though).
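If that is right, allocating the host buffer as page-locked memory yourself should avoid the intermediate copy; a sketch of what I mean (an untested assumption on my part):

```cpp
#include <cuda_runtime.h>

// With a page-locked host buffer there is no intermediate staging copy,
// so cudaMemcpy can transfer directly between host and device memory.
void pinnedTransfer(int n)
{
    float *h = 0, *d = 0;
    cudaMallocHost((void **)&h, n * sizeof(float));   // page-locked host memory
    cudaMalloc((void **)&d, n * sizeof(float));

    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // direct transfer
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d);
    cudaFreeHost(h);
}
```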
It may be that for some reason it is convenient to pin the process making CUDA calls to one processor; however, I would expect that after CUDA is done, it would change the affinity mask back to its original state. Perhaps this is how it is done in the Linux driver. nVidia people would be better placed to explain.