Visual Profiler for MATLAB code? Unable to collect results!

Hi everybody,

I hope my post is not too trivial, but I’m quite new to the CUDA community and I cannot solve a very basic issue.
The problem is the following: I want to profile some MATLAB code using the Visual Profiler tool, but I am not able to obtain any result.
Some info:

  • Graphics card: GeForce GTX 460
  • Operating System: Ubuntu 11.04
  • NVIDIA drivers: 295.49
  • NVCC v4.2, V0.2.1221 (same for NVVP)

I installed the SDK and toolkit successfully and ran some tests with the Visual Profiler using Python scripts (PyCUDA). I was able to create a timeline and collect statistics and other information about running times.
I then switched to some MATLAB code, but I cannot figure out how to run it under the Visual Profiler. I searched around a lot, but nobody seems to have the same problem. From what I found, this should be the session configuration:

  • File: /usr/local/matlab2011a/bin/matlab (MATLAB path)
  • Working dir: /home/…/script_path (script path)
  • Arguments: -nojvm -nosplash -r <script_name> (without ‘.m’)
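For reference, the session settings above amount to launching MATLAB with a fixed argument list from the script's directory. Here is a minimal Python sketch of that command line. It is not part of the profiler: `build_matlab_command` is a hypothetical helper and `my_script.m` is a placeholder script name.

```python
import os

def build_matlab_command(matlab_path, script_file):
    """Return the argument list for running a MATLAB script non-interactively."""
    # MATLAB's -r option takes a statement, so the script is named without ".m".
    script_name, ext = os.path.splitext(os.path.basename(script_file))
    if ext not in ("", ".m"):
        raise ValueError("expected a .m script, got: " + script_file)
    return [matlab_path, "-nojvm", "-nosplash", "-r", script_name]

# Placeholder script name; the MATLAB path is the one from the post.
cmd = build_matlab_command("/usr/local/matlab2011a/bin/matlab", "my_script.m")
print(" ".join(cmd))
```

The working directory is not part of the command itself. When launching by hand it would be supplied separately, for example as the `cwd` argument of `subprocess.run`.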

The Profiler starts correctly and runs the script to the end. In the console window I can see the correct output of the program. Everything seems to work fine, but once the runs are finished (NVVP runs the program 24 times to collect statistics) the Profiler window is still empty. I can see the green “checks” next to the entries in the Analysis panel (“Timeline”, “Multiprocessor”, “Kernel Memory”, “Kernel Instruction”). However, the Analysis Results panel reports the following:
“Application timeline is required for the analysis”.

What’s wrong with my configuration? I Googled a bit and followed some advice, like putting an “exit;” or “quit;” command at the end of the script, but it didn’t work.
I’m not using MEX files; my script just uses some gpuArray() operations, with no custom kernels.
I read something about an additional package for using CUDA with MATLAB, but the NVIDIA website redirects me to the wrong pages when I try to download it.

Any help would be much appreciated!


Your configuration worked fine for me when profiling Matlab -> MEX -> CUDA applications…

Basically what you’re doing is the same as suggested here:

I don’t have experience profiling the Parallel Computing Toolbox in MATLAB, though.

Couple of comments:

  1. The NVIDIA Profiler was not designed to profile M-code. There is a built-in profiler in MATLAB already that most people use. The most anyone has integrated the NVIDIA profiler with M-code has been through the loose coupling that is possible when you MEXify CUDA code.

  2. Are you sure you want to use gpuArrays? Have you seen how they compare? In most cases, they are slower than the CPU, which may be why you’re interested in profiling things.

  3. If you were to go with Jacket, you’d get access to a GPU-specific profiler that runs on M-code.

Good luck on this and shoot me an email if I can be useful to you. Cheers!

Having had a quick look here, it does seem like MathWorks manages to get a significant speedup on several important kernels:

Of course that information is biased… Just as the previous link should be ;)


If you look carefully at the PDF, you’ll find that Martin (a MathWorks engineer) doesn’t share any results of accelerating real end-user applications. Rather, the PDF only shows functions that directly call good 3rd party libraries from NVIDIA or open source:

  • Spectrogram benchmarks only call FFT, which is a direct call to CUFFT
  • A\b only calls MAGMA
  • Simple arithmetic that must be called inside a cumbersome arrayfun call

So, yes, MathWorks has been able to absorb other people’s GPU libraries, getting benefit on a few functions. But for any real applications, gpuArray’s are slow, often slower than the CPU.

The benchmarks run on this comparison page were run by bloggers and other scientists, not by AccelerEyes people. You can try it yourself if you’d like. I’m happy to provide a license for that purpose.

Thanks a lot for the replies.

Actually, I found that my application obtained a speed-up of about 40x when using gpuArrays compared to the single-CPU version. This is mainly because I use very simple and fast matrix operations, like sums and multiplications over very large matrices.

I’m still a bit confused, and also disappointed, because I was expecting NVIDIA to allow profiling any MATLAB code using the Visual Profiler tool.

If somebody has an idea of how to do good CUDA profiling of gpuArrays, I’d be very interested in any suggestions or references.



Makes sense. Both of those operations are just done by CUBLAS (i.e. sums/dot products & matrix multiplies are part of CUBLAS). So this thread just turns out to be a hat tip to CUBLAS.

It is quite a bit more complicated to profile GPU-based M-code. Jacket’s GPROFVIEW is the only tool that does this well.

I’ve utilized the method in the above post to successfully profile MATLAB MEX files that I compiled myself, not through the Parallel Toolbox.

If you’re on Windows, skip the -nojvm command line option.

I actually needed that for it to work :/

This is the working procedure on Windows:

You can profile MATLAB MEX files containing CUDA code with the NVIDIA Visual Profiler using the following procedure.

  1. Write your mexfile including CUDA code by the guidelines in
  2. Add cudaDeviceReset() at the end of your mexFunction.
  3. Write your MATLAB .m file and add exit at its end.
  4. Launch the NVIDIA Visual Profiler. File -> New Session.
  5. File: add the full path of the Matlab executable file, for example C:\Program Files\MATLAB\R2012b\bin\win64\MATLAB.exe
  6. Working directory: add the full path of the directory containing the MATLAB .m file.
  7. Arguments: -nojvm -nosplash -r file_name_without_m_extension.
  8. Next -> Finish and that’s it!
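
Step 3 is easy to forget, and without it MATLAB stays open after the script finishes. As a rough illustration, here is a small Python sketch that checks whether a script's text already ends with exit or quit and appends exit; if not. `ends_with_exit` and `ensure_exit` are hypothetical helper names, and the check is deliberately simple (it only looks at the last non-blank, non-comment line).

```python
def ends_with_exit(script_text):
    """Return True if the last non-blank, non-comment line is exit or quit."""
    for line in reversed(script_text.splitlines()):
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and MATLAB comment lines
        return stripped.rstrip(";").strip() in ("exit", "quit")
    return False

def ensure_exit(script_text):
    """Append an exit statement when the script does not already end with one."""
    if ends_with_exit(script_text):
        return script_text
    return script_text.rstrip("\n") + "\nexit;\n"

# Example with a two-line script body (illustrative content only).
body = "A = gpuArray(rand(1000));\nB = A * A;\n"
print(ensure_exit(body))
```

The patched text can then be written back to the .m file before starting the profiler session.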