Using CUDA in MATLAB mex-files (timing problem)

Hi,
I use mex-files to replace some MATLAB functions.
I was surprised when I measured the time needed for the calculation.
I use the “tic” and “toc” commands to measure the time in MATLAB (approximately 95 ms).
Then I measured the time in the mex-file with
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord( start, 0 );
cudaEventRecord( stop, 0 );
cudaEventSynchronize( stop );
cudaEventElapsedTime( &elapsedtime, start, stop );
and I measured about 3 ms.
So I measured the time for a (nearly) empty mex-file with “tic” and “toc” in the MATLAB script: 3.5 ms.
Then I put a “cudaMalloc” in the mex-file and measured a time of over 70 ms.

Does anyone have an idea why it takes so much time even though there is no calculation?
Does it take that long to “initialize” the graphics card?

Another problem is that I want to split the mex-file into two parts. One part is called only once and contains the copy of the data from the host to the device; the other mex-file is called in every iteration and contains the calculation and only one small copy of new data.
I read that the data on the graphics card stays valid until I close MATLAB.
But I don't know how to save the addresses of the data on the graphics card to MATLAB and then load them into the next mex-file.
I only have experience saving the value of a variable in an mxArray, but I don't know how to put the address of a variable into an mxArray to transfer it to MATLAB.

I hope someone can give me a tip.

Perhaps this helps:
I use the following systems (with the same observations on both):
Vista 64-bit, Core 2 Duo 2.7 GHz, GeForce 8800 GT, MATLAB 2009b
Vista 32-bit, Core 2 Duo 2.1 GHz, GeForce 8400M GS, MATLAB 2008a

And at the end: Sorry for my bad English.

Great post! These are non-trivial issues that you face, which is exactly why Jacket is available to help (see http://www.accelereyes.com). If you already tried Jacket before, I’d love to hear thoughts on your experience and wonder why you decided not to continue with it.

Cheers!

If you keep everything in 1 mex file and you do something like this:

[codebox]my_mex('setup', data_setup);

while(1)
    data_out = my_mex('step', data_in);
end[/codebox]

You can use persistent variables in your mex file to keep the pointers to the data_setup data.
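
A minimal sketch of that idea, not taken from the linked thread: file-scope static variables in the mex file hold the device pointer between calls, and the first argument selects 'setup' or 'step'. The names parse_command, d_setup, n_setup and cleanup are hypothetical, the data is assumed to be single precision, and the actual kernel launch is left out. The MATLAB part is wrapped in MATLAB_MEX_FILE, which the mex build script defines, so the small helper also compiles on its own (when compiling the source with nvcc directly, the macro would have to be passed by hand).

```cpp
#include <stddef.h>
#include <string.h>

// Commands accepted as the first argument of the (hypothetical) my_mex.
enum Command { CMD_SETUP, CMD_STEP, CMD_UNKNOWN };

// Map the command string passed from MATLAB to an enum value.
Command parse_command(const char* name) {
    if (!name) return CMD_UNKNOWN;
    if (strcmp(name, "setup") == 0) return CMD_SETUP;
    if (strcmp(name, "step") == 0) return CMD_STEP;
    return CMD_UNKNOWN;
}

#ifdef MATLAB_MEX_FILE  // defined by the mex build script
#include "mex.h"
#include <cuda_runtime.h>

// File-scope statics keep their values between calls while the mex file stays loaded.
static float* d_setup = NULL;  // device copy of data_setup (assumed single precision)
static size_t n_setup = 0;     // number of elements in d_setup

// Release the device buffer; registered with mexAtExit below.
static void cleanup(void) {
    if (d_setup) {
        cudaFree(d_setup);
        d_setup = NULL;
        n_setup = 0;
    }
}

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {
    if (nrhs != 2 || !mxIsChar(prhs[0]) || !mxIsSingle(prhs[1]))
        mexErrMsgTxt("usage: my_mex('setup'|'step', single_array)");

    char* name = mxArrayToString(prhs[0]);
    Command cmd = parse_command(name);
    mxFree(name);

    switch (cmd) {
    case CMD_SETUP: {
        cleanup();  // drop any buffer left over from an earlier setup
        n_setup = mxGetNumberOfElements(prhs[1]);
        if (cudaMalloc((void**)&d_setup, n_setup * sizeof(float)) != cudaSuccess) {
            d_setup = NULL;
            n_setup = 0;
            mexErrMsgTxt("cudaMalloc failed");
        }
        cudaMemcpy(d_setup, mxGetData(prhs[1]), n_setup * sizeof(float),
                   cudaMemcpyHostToDevice);
        mexAtExit(cleanup);
        break;
    }
    case CMD_STEP:
        if (!d_setup) mexErrMsgTxt("call my_mex('setup', ...) first");
        // Placeholder: echoes data_in. A real step would copy the small
        // data_in to the device, launch a kernel that reads d_setup, and
        // copy the result back into plhs[0].
        plhs[0] = mxDuplicateArray(prhs[1]);
        break;
    default:
        mexErrMsgTxt("unknown command");
    }
}
#endif
```

mexAtExit registers the cleanup function so the buffer is freed when the mex file is cleared (for example with "clear mex") or when MATLAB exits.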

You can see in this thread (almost the last post) how to do it exactly: http://forums.nvidia.com/index.php?showtopic=70192
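
For the two-mex-file layout asked about at the top of the thread, a possible alternative is to hand the device address to MATLAB as a plain number. The sketch below stores the pointer in a 1x1 uint64 mxArray and reads it back in a later call. The helper names are hypothetical, and it assumes, as the original poster read, that the allocation stays valid between mex calls. MATLAB only sees an integer, so it will not free the GPU memory; some mex call still has to pass the recovered pointer to cudaFree.

```cpp
#include <stdint.h>

// Store a raw (device) pointer in a 64-bit integer, wide enough on both
// the 32-bit and the 64-bit systems mentioned in the thread.
uint64_t pointer_to_handle(const void* p) {
    return (uint64_t)(uintptr_t)p;
}

// Turn the 64-bit integer back into the original pointer.
void* handle_to_pointer(uint64_t h) {
    return (void*)(uintptr_t)h;
}

#ifdef MATLAB_MEX_FILE  // defined by the mex build script
#include "mex.h"

// First mex file: wrap the pointer in a 1x1 uint64 array and return it,
// e.g. plhs[0] = handle_to_mxarray(d_data);
mxArray* handle_to_mxarray(const void* p) {
    mxArray* a = mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL);
    *(uint64_t*)mxGetData(a) = pointer_to_handle(p);
    return a;
}

// Second mex file: recover the pointer from the handle passed back in,
// e.g. float* d_data = (float*)mxarray_to_pointer(prhs[0]);
void* mxarray_to_pointer(const mxArray* a) {
    if (!mxIsUint64(a) || mxGetNumberOfElements(a) != 1)
        mexErrMsgTxt("expected a uint64 scalar handle");
    return handle_to_pointer(*(const uint64_t*)mxGetData(a));
}
#endif
```

Compared with persistent variables inside a single mex file, this leaves the bookkeeping to the MATLAB script, which has to keep the handle variable around and must not modify it.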

Hey piv is that you dude ?

(sorry for the noise, and go Les Verts!)

– pium

We just went live with a new blog post explaining more about why the Jacket SDK trumps writing your own MEX code.

The Jacket SDK Trumps Standalone MEX in MATLAB
http://blog.accelereyes.com/blog/2010/10/29/jacket_sdk_trumps_mex/

I totally understand using free stuff (even if it is a lot of work) rather than buying stuff. But I think this post makes the benefits of buying stuff, which aren't immediately obvious, more evident :)

Enjoy!
