CUDA visual profiler using mpi?

Maddy_Scientist · November 4, 2009, 10:52pm

Hi,

I have an MPI code that uses GPUs. I need to profile it and was wondering how to do this with the CUDA Visual Profiler, since it doesn’t seem obvious. The code in question uses only a single GPU, but a separate process runs alongside it at the same time.

Thanks!

fcs · November 9, 2009, 9:20am

I am quite interested in this topic. If you are using only one GPU, it seems to me that it should be quite simple: launching your MPI application through the CUDA Visual Profiler should work, shouldn’t it?

The results you get this way should be consistent, since the profiler will use your device’s hardware counters.

In the past I did this in terminal mode with:

[codebox]export CUDA_PROFILE=1
export CUDA_PROFILE_CSV=1
export CUDA_PROFILE_CONFIG=~/.cudaprof1.config
[/codebox]
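Since the question involves more than one process, one way to extend this environment-variable approach is to give each MPI rank its own log file via CUDA_PROFILE_LOG, the command-line profiler's output-file variable. The wrapper below is a minimal sketch, not something from NVIDIA's documentation: it assumes your launcher exports OMPI_COMM_WORLD_RANK (Open MPI) or PMI_RANK (MPICH-style launchers), and the script name profile_rank.sh and the log naming scheme cuda_profile_rankN.csv are hypothetical.

```shell
#!/bin/sh
# profile_rank.sh (hypothetical name): run one MPI rank under the CUDA
# command-line profiler with its own log file.
# Usage: mpirun -np 2 ./profile_rank.sh ./my_app [args...]

# Print this process's MPI rank. OMPI_COMM_WORLD_RANK is set by Open MPI and
# PMI_RANK by MPICH-style launchers; if neither is set, fall back to the
# shell's PID so that concurrent processes still get distinct names.
rank_id() {
    if [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        echo "$OMPI_COMM_WORLD_RANK"
    elif [ -n "${PMI_RANK:-}" ]; then
        echo "$PMI_RANK"
    else
        echo "pid$$"
    fi
}

# Build a per-rank log file name (the naming scheme is an assumption).
profile_log_name() {
    echo "cuda_profile_rank$1.csv"
}

# Same settings as the exports above, plus a distinct CUDA_PROFILE_LOG per rank.
export CUDA_PROFILE=1
export CUDA_PROFILE_CSV=1
export CUDA_PROFILE_CONFIG="$HOME/.cudaprof1.config"
export CUDA_PROFILE_LOG="$(profile_log_name "$(rank_id)")"

# Replace this shell with the application so it runs in the process the
# MPI launcher started.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Here ./my_app stands for your own executable. The same idea works for a non-MPI helper process: anything started through the wrapper gets its own log, so the two processes do not write to the same file.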

The .config file lists the counters you want to profile. I modified the NVIDIA documentation so that I only have to comment or uncomment what I need in the following file:

[codebox]#The profiler supports the following options:
#Time stamps for kernel launches and memory transfers.
#This can be used for timeline analysis.
timestamp
#Number of blocks in a grid along the X and Y dimensions for a kernel launch
gridsize
#Number of threads in a block along the X, Y and Z dimensions for a kernel launch
threadblocksize
#Size of dynamically allocated shared memory per block in bytes for a kernel launch
dynsmemperblock
#Size of statically allocated shared memory per block in bytes for a kernel launch
stasmemperblock
#Number of registers used per thread for a kernel launch
regperthread
#Memory transfer direction
#A direction value of 0 is used for host->device memory copies and a value of 1 is used for device->host
memtransferdir
#Memory copy size in bytes
memtransfersize
#Stream Id for a kernel launch
streamid
##The profiler supports logging of the following counters during kernel execution
##There is a max of 4 profiler counters
##Non-coalesced (incoherent) global memory loads (always zero on compute capability 1.3)
#gld_incoherent
##Coalesced (coherent) global memory loads
#gld_coherent
##32-byte global memory load transactions
gld_32b
##64-byte global memory load transactions
gld_64b
##128-byte global memory load transactions
gld_128b
##Global memory loads (invalid on compute capability 1.3)
#gld_request
##Non-coalesced (incoherent) global memory stores (always zero on compute capability 1.3)
#gst_incoherent
##Coalesced (coherent) global memory stores
#gst_coherent
##32-byte global memory store transactions
gst_32b
##64-byte global memory store transactions
gst_64b
##128-byte global memory store transactions
gst_128b
##Global memory stores (invalid on compute capability 1.3)
#gst_request
##Local memory loads
local_load
##Local memory stores
local_store
##Branches taken by threads executing a kernel
branch
##Divergent branches taken by threads executing a kernel
divergent_branch
##Instructions executed
instructions
##Number of thread warps that serialize on address conflicts to either shared or constant memory
warp_serialize
##Number of thread blocks executed
cta_launched[/codebox]
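The comments in this config note a maximum of 4 profiler counters per run. The following is a hypothetical helper (check_counters.sh, not an NVIDIA tool), a minimal sketch that lists the uncommented counter lines in a config file and warns when more than four are enabled. The counter list is copied from the file above; trace options such as timestamp or gridsize are ignored because they are not hardware counters.

```shell
#!/bin/sh
# check_counters.sh (hypothetical helper): count the hardware counters that
# are enabled (uncommented) in a CUDA profiler config file and warn when more
# than the 4-counter limit stated in the config comments are active.

MAX_COUNTERS=4

# Counter names copied from the config file above. Trace options such as
# timestamp or gridsize are not hardware counters and are not counted.
COUNTERS="gld_incoherent gld_coherent gld_32b gld_64b gld_128b gld_request"
COUNTERS="$COUNTERS gst_incoherent gst_coherent gst_32b gst_64b gst_128b"
COUNTERS="$COUNTERS gst_request local_load local_store branch"
COUNTERS="$COUNTERS divergent_branch instructions warp_serialize cta_launched"

# Print each enabled counter name from config file $1, one per line.
enabled_counters() {
    awk -v list="$COUNTERS" '
        BEGIN { n = split(list, names, " "); for (i = 1; i <= n; i++) known[names[i]] = 1 }
        /^[ \t]*#/ { next }            # skip comments and commented-out counters
        NF && ($1 in known) { print $1 }
    ' "$1"
}

# Return 1 (with a warning on stderr) if too many counters are enabled.
check_counters() {
    count=$(enabled_counters "$1" | awk 'END { print NR }')
    echo "$count counter(s) enabled in $1"
    if [ "$count" -gt "$MAX_COUNTERS" ]; then
        echo "warning: more than $MAX_COUNTERS counters enabled" >&2
        return 1
    fi
    return 0
}

if [ "$#" -gt 0 ]; then
    check_counters "$1"
fi
```

Run it as `sh check_counters.sh ~/.cudaprof1.config`. Against the file as posted it would report 13 enabled counters, so in practice you would uncomment a few counters at a time and rerun the application once per group.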

Please share your feedback on what you have tried and what worked.

Thanks
