Windows 7 64bit, Matlab2009a/VS2008, CUDA is SLOW, WHY

I am running Windows 7 64-bit, CUDA 2.3, Matlab 2009a and VS2008 Professional on a notebook with 6 GB RAM and an Nvidia Quadro 770m. I downloaded Matlab_CUDA-1.1a.zip and followed the instructions, but speed_fft shows native Matlab running several times faster than the CUDA code. After I compiled Szeta.cu, my timings are also the opposite of the results reported in the zip file: the CUDA build is slower than the native one. Is anything missing in my GPU configuration? Thanks.

The attachment is speed_fft’s output.
My log file is the following:

mex fft2_cuda.c -IC:\CUDA\include -LC:\CUDA\lib64 -lcudart -lcufft
mex fft2_cuda_sp_dp.c -IC:\CUDA\include -LC:\CUDA\lib64 -lcudart -lcufft
mex ifft2_cuda.c -IC:\CUDA\include -LC:\CUDA\lib64 -lcudart -lcufft
dir

. bin nvmex.txt
.. fft2_cuda.c nvmex_helper.m
FS_2Dflow.pdf fft2_cuda.mexw64 nvmexopt_s.bat
FS_2Dturb.m fft2_cuda_sp_dp.c nvmexopts.bat
FS_vortex.m fft2_cuda_sp_dp.mexw64 nvmexopts_n.bat
README.txt gmmcuda nvmexopts_s2.bat
Szeta.cu ifft2_cuda.c perl_wk
Szeta.linkinfo ifft2_cuda.mexw64 speed_fft.m
Szeta.m nvmex.m
bilininterp nvmex.pl
bilininterp.zip nvmex.pl.txt

speed_fft
256

512

768

    1024

    1280

    1536

    1792

    2048

which Szeta
S:\matlab\Matlab_CUDA_1.1\Szeta.m

tic; FS_2Dturb(128,1,1,1); toc;

CFL =

0.1017

Gsqav =

1.1995

Elapsed time is 2.782409 seconds.

tic; FS_vortex; toc;

ans =

512

Elapsed time is 37.390650 seconds.

nvmex -f nvmexopts_n.bat Szeta.cu -IC:\cuda\include -LC:\cuda\lib64 -lcufft -lcudart
matlabroot: C:\Program Files\MATLAB\R2009a,
cmd_name: C:\Program Files\MATLAB\R2009a\bin\nvmex.pl,
matlabroot: C:\PROGRA~1\MATLAB\R2009a,
Szeta.cu
tmpxft_00000ae0_00000000-3_Szeta.cudafe1.gpu
tmpxft_00000ae0_00000000-8_Szeta.cudafe2.gpu
tmpxft_00000ae0_00000000-3_Szeta.cudafe1.cpp

which Szeta
S:\matlab\Matlab_CUDA_1.1\Szeta.mexw64

tic; FS_2Dturb(128,1,1,1); toc;

CFL =

0.1017

Gsqav =

1.1995

Elapsed time is 15.317960 seconds.

tic; FS_vortex; toc;

ans =

512

Elapsed time is 129.676483 seconds.

Attachment: speed_fft_exe.jpg

I got similar results, shown below.

which Szeta
D:_OCT\SV code and data\Matlab_CUDA_1.1\Szeta.m

tic; FS_2Dturb(128,1,1,1); toc;

HI!

Elapsed time is 3.383610 seconds.

tic; FS_vortex; toc;

Elapsed time is 23.380347 seconds.

which Szeta
D:_OCT\SV code and data\Matlab_CUDA_1.1\Szeta.mexw64

tic; FS_2Dturb(128,1,1,1); toc;

Elapsed time is 4.985347 seconds.

tic; FS_vortex; toc;

Elapsed time is 23.729474 seconds.

Not as severe as what Steve got, but the CUDA build is still slower than native MATLAB.
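To put numbers on that comparison, here is a small C sketch that turns the tic/toc timings posted in this thread into slowdown factors (MEX time divided by native time). The struct, the function name and the row labels are made up for illustration; only the timings come from the logs above.

```c
#include <assert.h>
#include <stddef.h>

/* One benchmark run: wall-clock seconds with Szeta.m and with Szeta.mexw64. */
typedef struct {
    const char *label;   /* illustrative label, not from the logs */
    double native_s;     /* "which Szeta" resolved to Szeta.m */
    double mex_s;        /* "which Szeta" resolved to Szeta.mexw64 */
} timing;

/* Slowdown factor: a value above 1 means the CUDA MEX build took longer. */
double slowdown(const timing *t)
{
    assert(t != NULL && t->native_s > 0.0);
    return t->mex_s / t->native_s;
}

/* Elapsed times copied from the two sets of logs in this thread. */
const timing thread_timings[4] = {
    { "first post,  FS_2Dturb(128,1,1,1)",  2.782409,  15.317960 },
    { "first post,  FS_vortex",            37.390650, 129.676483 },
    { "second post, FS_2Dturb(128,1,1,1)",  3.383610,   4.985347 },
    { "second post, FS_vortex",            23.380347,  23.729474 },
};
```

Calling slowdown(&thread_timings[i]) for the four rows gives roughly 5.5, 3.5, 1.47 and 1.01, so the first machine is several times slower with the MEX build while the second is between about even and 1.5 times slower.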

My system build is
Core i5-750
4G 1600MHz RAM
GeForce 8800GT
Tesla C1060

With
Windows 7 Pro 64bit
CUDA 2.3
Matlab 2009a
VS2008 Pro

I was wondering whether having two CUDA-enabled cards could have a negative effect, or whether it could be something else.
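One thing that can matter with two cards: a CUDA program that never calls cudaSetDevice runs on device 0. The logs above do not show which card is device 0 on this machine, or whether the Matlab_CUDA MEX files select a device at all, so the following is only a sketch of the selection logic in plain C. The device_info struct and both functions are hypothetical; in a real MEX file the fields would be filled from cudaGetDeviceProperties and the chosen index passed to cudaSetDevice. The table values are entered by hand for illustration, and their order is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down device record. In real code these fields
   would be copied from the struct filled by cudaGetDeviceProperties(). */
typedef struct {
    const char *name;
    int major;            /* compute capability, major part */
    int minor;            /* compute capability, minor part */
    int multiprocessors;  /* number of multiprocessors on the device */
} device_info;

/* Return the index of the device with the most multiprocessors,
   or -1 if the list is empty. Ties go to the lowest index. */
int pick_device(const device_info *devs, size_t n)
{
    int best = -1;
    size_t i;
    for (i = 0; i < n; ++i) {
        if (best < 0 || devs[i].multiprocessors > devs[best].multiprocessors)
            best = (int)i;
    }
    return best;
}

/* Double precision requires compute capability 1.3 or later. */
int supports_double(const device_info *d)
{
    assert(d != NULL);
    return d->major > 1 || (d->major == 1 && d->minor >= 3);
}

/* Illustrative entries for the two cards listed above, typed in by hand;
   the enumeration order on a real system may differ. */
const device_info example_devices[2] = {
    { "GeForce 8800 GT", 1, 1, 14 },
    { "Tesla C1060",     1, 3, 30 },
};
```

With these entries, pick_device(example_devices, 2) returns 1 (the Tesla C1060), and supports_double is true only for that card. If the MEX code ends up on the other card by default, that alone could change the timings, though it would not explain a slowdown on a single-GPU notebook.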

Does anybody have any ideas?

Thanks,

ps. HAPPY NEW YEAR~

Hi applepi,

Interesting results you’re seeing. We’ve not noticed that different versions of MATLAB have different speeds. You could try out our blas_example.m and fft_example.m demos which invoke CUBLAS and CUFFT to see if you get anything different.

You can also use our SDK for linking your own CUDA code into Jacket’s runtime memory manager and lazy execution engine.

We’ve never noticed a slowdown in any single GPU’s performance in multi-GPU situations either, so I doubt that’s the problem.

Best,

John