I have an MPI application which uses the GPUs on individual machines to perform some tasks.
I would like to profile the GPU part using computeprof, but I am unable to do so.
In the session settings panel, I use mpiexec as the launch command and “-configfile myconfig” as the arguments.
The myconfig file contains the configuration to run the application, in this case with just 1 process.
It fails. Any ideas on how to make it work?
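For reference, myconfig is just a standard mpiexec configfile with the launch arguments on a line; roughly something like this (the executable name here is only a placeholder, not my real path):

-n 1 ./my_cuda_app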
Sunil
Under Linux, I profile my MPI/CUDA applications in terminal mode this way:
export COMPUTE_PROFILE=1
export COMPUTE_PROFILE_CSV=1
export COMPUTE_PROFILE_CONFIG=~/script/prof_counter1.sh
export COMPUTE_PROFILE_LOG=computeprof1.csv
The prof_counter1.sh file looks like this (I took the profiler counters available for my compute capability 1.3 GPU from Compute_Profiler.txt in /usr/local/cuda/doc):
timestamp #: Time stamps for kernel launches and memory transfers.
gpustarttimestamp #: Time stamp when kernel starts execution in GPU.
#gpuendtimestamp #: Time stamp when kernel ends execution in GPU.
gridsize #: Number of blocks in a grid along the X and Y dimensions
threadblocksize #: Number of threads in a block along the X, Y and Z dimensions
dynsmemperblock #: Size of dynamically allocated shared memory per block in bytes
stasmemperblock #: Size of statically allocated shared memory per block in bytes
regperthread #: Number of registers used per thread for a kernel launch.
memtransferdir #: Memory transfer direction, a value of 0 is used for host->device and 1 for device->host transfers
memtransfersize #: Memory transfer size in bytes
memtransferhostmemtype #: Host memory type (pageable or page-locked)
streamid #: Stream Id for a kernel launch
local_load #: Number of executed local load instructions per warp in a SM
local_store #: Number of executed local store instructions per warp in a SM
gld_request #: Number of executed global load instructions per warp in a SM
gst_request #: Number of executed global store instructions per warp in a SM
#divergent_branch #: Number of unique branches that diverge
#branch #: Number of unique branch instructions in program
#sm_cta_launched #: Number of thread blocks executed on a SM
#gld_incoherent #: Non-coalesced (incoherent) global memory loads
#gld_coherent #: Coalesced (coherent) global memory loads
#gld_32b #: 32-byte global memory load transactions
#gld_64b #: 64-byte global memory load transactions
#gld_128b #: 128-byte global memory load transactions
#gst_incoherent #: Non-coalesced (incoherent) global memory stores
#gst_coherent #: Coalesced (coherent) global memory stores
#gst_32b #: 32-byte global memory store transactions
#gst_64b #: 64-byte global memory store transactions
#gst_128b #: 128-byte global memory store transactions
#instructions #: Instructions executed
#warp_serialize #: Number of thread warps that serialize on address conflicts
#cta_launched #: Number of thread blocks executed
But you must take care: all your MPI processes will try to write to the same “computeprof1.csv”. One workaround I used was to call the setenv() function in C and set COMPUTE_PROFILE_LOG to a value which included the MPI rank.
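A minimal sketch of that workaround (the log file name is just my choice, and as far as I remember the profiler picks up the variable when the CUDA context is created, so the setenv() call has to happen before the first CUDA call in each process):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    char logname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* one profiler log per MPI rank: computeprof_0.csv, computeprof_1.csv, ... */
    snprintf(logname, sizeof(logname), "computeprof_%d.csv", rank);
    setenv("COMPUTE_PROFILE_LOG", logname, 1);  /* 1 = overwrite an existing value */

    /* ... cudaSetDevice(), kernel launches, etc. go after this point ... */

    MPI_Finalize();
    return 0;
}

Each rank then writes its own CSV file, which you can look at separately.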
Good luck!