Profiling CUDA Fortran code

Hi there,

I’m at the stage of my port where I’m trying to optimise my code to get the best possible speed-up. The code is currently running slower on the GPU than it was on a single CPU.

Are there any profiling tools available so I can pinpoint the bottlenecks in my code? I know there is one for CUDA C, but I’m unsure whether PGI provides any. Any insight would be much appreciated.

Cheers,
Crip_crop

Okay, I discovered pgprof and have used it to analyse my code’s performance. When I run pgcollect with the -cudainit option, my code runs a lot faster, even though I haven’t made any optimisations. This leads me to believe the main bottleneck is in fact the initialisation of the CUDA drivers.

Does anyone know if there is a way of initializing the drivers before running the code on the gpu to remove this bottleneck?

Cheers,
Crip_crop

You can use either PGPROF or the CUDA Visual Profiler to profile CUDA Fortran code. This article explains how to use both of them: http://www.pgroup.com/lit/articles/insider/v2n1a2.htm

I’d recommend the CUDA Visual Profiler, since it lets you profile the GPU execution itself. Here’s the guide, which tells you how to install, set up, and run it: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/VisualProfiler/Compute_Visual_Profiler_User_Guide.pdf

Cheers, Tom. I’m trying the CUDA Visual Profiler now, but I’m having a few problems getting it working. I get the following error message when I run the app:

unable to load the ‘cuda’ library. compute visual profiler device features will be disabled

Reading online, it seems it could be a problem with the driver settings, so I’ve contacted my system administrator.

In the meantime, if you have any suggestions or insight into running cudainit in the background before running code on the GPU, it’d be much appreciated.

Cheers,
Crip_crop

Does anyone know if there is a way of initializing the drivers before running the code on the gpu to remove this bottleneck?

You can run the PGI ‘pgcudainit’ utility as a background process to prevent the Linux kernel from powering down the devices.
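As a rough sketch of that approach, a small shell wrapper could start pgcudainit in the background, run the GPU program, and then stop the helper again. This assumes pgcudainit is on your PATH and keeps running until it is killed; the function name run_with_cudainit and the CUDAINIT_WAIT pause are made up for illustration and are not part of the PGI tools.

```shell
#!/bin/sh
# Sketch only: keep the GPU driver initialised while one command runs.
# Assumption: pgcudainit is on PATH and stays alive until it is killed.
# run_with_cudainit and CUDAINIT_WAIT are hypothetical names.
run_with_cudainit() {
    pgcudainit &                          # hold the device open in the background
    init_pid=$!
    sleep "${CUDAINIT_WAIT:-2}"           # assumed pause so initialisation can finish
    "$@"                                  # run the real GPU program
    status=$?
    kill "$init_pid" 2>/dev/null || true  # stop the helper; ignore if already gone
    wait "$init_pid" 2>/dev/null || true  # reap it without failing the wrapper
    return "$status"                      # pass on the program's exit status
}
```

You would then call it as run_with_cudainit ./my_gpu_app, where my_gpu_app is a placeholder for your own executable. If you run many jobs back to back, it is probably simpler to leave pgcudainit running for the whole session rather than starting and stopping it each time.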

  • Mat