Can anyone suggest profiling software (Win7)?


I've exceeded my budget on purchasing hardware and am looking for a low-cost profiler to help me parallelise my code.

I'm new to CUDA. I've got a large serial Fortran code which I'm certain can be adjusted (for memory and data precision) and then made significantly more efficient with CUDA.

I'm planning to use PGI Fortran Accelerator initially. I'm running on Win7 64-bit and am relatively comfortable in Visual Studio.

Can anyone suggest a good profiler? I'm looking primarily to find the amount of time spent in each subroutine / loop.
(VTune is more than I can currently afford.)