Some Visual Profiler questions

Ionica · September 11, 2010, 6:06pm

Hi!

I've just started programming on a GTX260 and I'm trying to use the Visual Profiler to measure performance.

I've figured out some of the parameters and profiler counters, but I still have trouble understanding some of them.

I've written the following very simple matrix multiplication kernel (which doesn't use shared memory):

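__global__ void mulMatrixKernel(float* g_matrix_A, float* g_matrix_B, float* g_matrix_C, int rows, int cols)
{
  // Global row and column handled by this thread
  // (TILE_DIM is the block edge length, assumed to be defined elsewhere)
  const unsigned int row = blockIdx.y * TILE_DIM + threadIdx.y;
  const unsigned int col = blockIdx.x * TILE_DIM + threadIdx.x;

  // Threads that fall outside the matrix must neither read nor write
  if (row < rows && col < cols)
  {
    float sum = 0.0f;
    // Dot product of row `row` of A with column `col` of B;
    // this indexing assumes square matrices (rows == cols)
    for (int i = 0; i < rows; i++)
      sum += g_matrix_A[row * cols + i] * g_matrix_B[i * cols + col];
    g_matrix_C[row * cols + col] = sum;
  }
}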

With the Visual Profiler, I'm getting the following values:

1. Static shared memory per block: 36. I don't use shared memory, so the only shared memory in use should be for the kernel parameters. But how are the 36 bytes distributed among the five parameters?

2. Registers per thread: 9. I can't see more than three: row, col and sum.
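One possible accounting for the 36 bytes in item 1, written as a small host-side sketch. It rests on two assumptions that are not confirmed anywhere in this thread: that on compute capability 1.x devices such as the GTX260 the kernel arguments are passed through shared memory behind a 16-byte area reserved for built-in variables, and that the code was compiled with 32-bit (4-byte) pointers. The helper name `staticSharedBytes` is made up for illustration.

```cpp
#include <cstddef>

// Assumption: compute capability 1.x reserves the first 16 bytes of each
// block's shared memory for built-in variables (gridDim, blockDim, blockIdx).
constexpr std::size_t kReservedBytes = 16;

// Assumption: an int kernel argument occupies 4 bytes on the device.
constexpr std::size_t kIntBytes = 4;

// Hypothetical helper: estimated static shared memory per block for a kernel
// whose arguments are `numPointers` pointers followed by `numInts` ints.
constexpr std::size_t staticSharedBytes(std::size_t pointerBytes,
                                        std::size_t numPointers,
                                        std::size_t numInts) {
    return kReservedBytes + numPointers * pointerBytes + numInts * kIntBytes;
}
```

For mulMatrixKernel (three pointers, two ints), `staticSharedBytes(4, 3, 2)` gives 16 + 12 + 8 = 36, which matches the profiler value. Under the same assumptions a build with 8-byte pointers would give 48 instead.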

Further on, in the summary table (View->Summary table), there is a column called "instruction throughput", which doesn't have any unit; it's just a number (for me 0.413).

In the help file, I found the following explanation:

“This is the ratio of achieved instruction rate to peak single issue instruction rate. The achieved instruction rate is calculated using the ‘instructions’ profiler counter. The peak instruction rate is calculated based on the GPU clock speed. In the case of instruction dual-issue coming into play, this ratio shoots up to greater than 1.”

Can someone tell me this in other words?

Thanks a lot!
