A 512x512 matrix is probably not big enough to benefit from CUDA once you include the data transfer.
I get bad results too. In my code, initializing CUBLAS takes the most time (0.4 sec!).
I don't know (and I really want to know) how to avoid calling cublasInit() on every function call from MATLAB, since I make multiple calls.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{ cublasInit(); /* runs on every call */ ... return; }
[codebox]while(1); myf(arg); end;[/codebox]
Can anybody help?
Add a static variable to your MEX file and use it to check whether this is the first call, so that cublasInit() runs only once.
static int initializeCublas=1;
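To spell out the idea, here is a minimal sketch of the first-call guard in plain C. The names `expensive_init`, `my_entry_point` and `get_init_calls` are hypothetical stand-ins so the pattern can be compiled without MATLAB or CUDA; in the actual MEX file the guarded call would be cublasInit() and the guard would sit at the top of mexFunction.

```c
#include <stdio.h>

/* Hypothetical stand-in for cublasInit(); it only counts how often it runs. */
static int init_calls = 0;

static void expensive_init(void)
{
    init_calls++;
}

/* Hypothetical stand-in for the body of mexFunction. */
void my_entry_point(void)
{
    /* Static storage: the value survives between calls for as long as
       the compiled file stays loaded. */
    static int initializeCublas = 1;

    if (initializeCublas) {
        expensive_init();       /* in the MEX file: cublasInit(); */
        initializeCublas = 0;   /* skip initialization on later calls */
    }

    /* ... per-call work goes here ... */
}

/* Helper for checking how many times the initialization ran. */
int get_init_calls(void)
{
    return init_calls;
}
```

Calling `my_entry_point()` several times in a row runs `expensive_init()` only once. In a real MEX file the flag is reset when the file is unloaded (for example by `clear mex`), and mexAtExit() can register a cleanup function that calls cublasShutdown().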
Thanks, but unfortunately it does not work.
I have a single, simple CUBLAS call:
int m = cublasIsamax(9, A_d, 1);
The function still takes 0.4 sec less if I delete this line.
If you’re operating mostly in Matlab but want to do some specialized code directly in CUDA, you can use a new feature of AccelerEyes’ Jacket as an interface. Jacket then takes care of the memory transfers and initialization costs. Check out the example “Integrate Custom CUDA Functions” at www.accelereyes.com.