Hi there,
I’m at the stage of my port where I’m trying to optimise my code to get the best possible speed-up. The code is currently running slower on the GPU than it was on a single CPU.
Are there any profiling tools available so I can pinpoint the bottlenecks in my code? I know there is one for CUDA C but I’m unsure of any tools available from PGI. Any insight would be much appreciated.
Cheers,
Crip_crop
Okay, I discovered pgprof and have used it to analyse my code’s performance. When I run pgcollect with the -cudainit option, my code runs a lot faster even though I haven’t made any optimisations. This leads me to believe the main bottleneck in my code is in fact the initialization of the CUDA drivers.
Does anyone know if there is a way of initializing the drivers before running the code on the GPU to remove this bottleneck?
Cheers,
Crip_crop
You can either use PGPROF or the CUDA visual profiler to profile CUDA Fortran code. This article explains how to use both of them: Account Login | PGI
I’d recommend using the CUDA visual profiler since you can actually profile the GPU execution itself. Here’s the guide for the CUDA visual profiler, which tells you how to install, set up and run it: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/VisualProfiler/Compute_Visual_Profiler_User_Guide.pdf
Cheers Tom. I’m trying the CUDA visual profiler now, but I’m having a few problems getting it working. I get the following error message when I run the app:
unable to load the ‘cuda’ library. compute visual profiler device features will be disabled
From reading online, it seems this could be a problem with the driver settings, so I’ve contacted my system administrator.
In the meantime, if you have any suggestions or insight into running cudainit in the background before running code on the GPU, it’d be much appreciated.
Cheers,
Crip_crop
Does anyone know if there is a way of initializing the drivers before running the code on the GPU to remove this bottleneck?
You can run the PGI ‘pgcudainit’ utility as a background process to prevent the Linux kernel from powering down the devices.
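As a rough sketch of what that could look like in a job script: start pgcudainit in the background, run the program, then stop the utility again. This assumes pgcudainit is on your PATH. The executable name ./my_app and the two-second pause are placeholders I've made up for illustration, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch only: keep the GPU initialised while the application runs.
# Assumptions: pgcudainit is on PATH; ./my_app is a hypothetical executable.

APP="${APP:-./my_app}"   # hypothetical program name, override with APP=...
INIT_PID=""

if command -v pgcudainit >/dev/null 2>&1; then
    pgcudainit &         # runs in the background and holds the device open
    INIT_PID=$!
    sleep 2              # arbitrary pause so the device is up before the run
else
    echo "pgcudainit not found on PATH; running without it" >&2
fi

if [ -x "$APP" ]; then
    "$APP"
    STATUS=$?
else
    echo "cannot execute $APP" >&2
    STATUS=127
fi

# Stop the background utility once the run has finished.
# Remove this step to keep the devices initialised for later runs.
if [ -n "$INIT_PID" ]; then
    kill "$INIT_PID" 2>/dev/null
fi

echo "application exit status: $STATUS"
```

If you run many short jobs one after another, it probably makes more sense to start pgcudainit once in a separate terminal and leave it running rather than starting and stopping it around every run.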