Help with CUDA+Matlab acceleration

A 512x512 matrix is probably not big enough to benefit from CUDA once you include the data transfer.

I have bad results too. In my code the initialisation of cublas takes the most time (0.4 sec!).

I don't know (and I really want to know) how to avoid calling cublasInit() on every function call from Matlab, since I make multiple calls.



#include "mex.h"
#include "cublas.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])

{ …





and myfcall.m:

[codebox]while(1); myf(arg); end;[/codebox]

Can anybody help?

Add a static variable to your mex and check if it is the first time you are calling it.

static int initializeCublas=1;
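
A minimal sketch of that pattern, written as plain C so it compiles without Matlab or CUDA installed. `expensive_init()` and `myf()` are hypothetical stand-ins (for `cublasInit()` and the body of `mexFunction`); only the `initializeCublas` flag comes from the suggestion above. The idea relies on static variables keeping their value between calls for as long as the MEX file stays loaded.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how often the expensive setup ran (for illustration only). */
static int init_calls = 0;

/* Hypothetical stand-in for cublasInit(). */
void expensive_init(void)
{
    init_calls++;
    printf("initialising (slow)\n");
}

/* 1 until the first call has done the setup, 0 afterwards. */
static int initializeCublas = 1;

/* Hypothetical stand-in for the body of mexFunction. */
void myf(void)
{
    if (initializeCublas) {
        expensive_init();
        initializeCublas = 0;
        /* In a real MEX file, a cleanup function that calls
           cublasShutdown() could be registered here with mexAtExit(). */
    }

    /* The setup must have run exactly once, no matter how often
       myf() has been called. */
    assert(init_calls == 1);

    /* ... the actual cublas work would go here ... */
}
```

Calling `myf()` repeatedly should run the setup only on the first call. In Matlab the flag is reset whenever the MEX file is unloaded (for example by `clear mex`), so the next call after that pays the setup cost again.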


Thanks, but unfortunately it does not work.

I have a single, simple cublas call:

int m = cublasIsamax(9, A_d, 1);

The function still takes 0.4 sec less if I delete this line.

If you’re operating mostly in Matlab but want to do some specialized code directly in CUDA, you can use a new feature of AccelerEyes’ Jacket as an interface. Jacket then takes care of the memory transfers and initialization costs. Check out the example “Integrate Custom CUDA Functions” at