When I run the following code on a GTX 480 through Visual Profiler, the profiler claims that achieved occupancy is 2.21. However, it is supposed to be less than or equal to 1.0: “achieved occupancy” is defined as the “Ratio of the average active warps per active cycle to the maximum number of warps supported on a multiprocessor”. You can’t get more average active warps than the maximum number of warps, right?
#include <cuda.h>
#include <stdio.h>

// Each thread writes x into all four components of one int4.
__global__ void memset( int4 *p, int x )
{
    p[threadIdx.x + blockDim.x * blockIdx.x] = make_int4( x, x, x, x );
}

int main( int argc, char **argv )
{
    int grid = 10000, blocksize = 768;
    int4 *p;
    cudaMalloc( (void**)&p, sizeof(int4) * blocksize * grid );
    memset<<<grid, blocksize>>>( p, 0 );
    puts( cudaGetLastError() == cudaSuccess ? "success" : "error" );
    cudaFree( p );
    cudaDeviceReset( ); // profiler fails otherwise
}
I use the most recent version of CUDA, downloaded today.
Can you please provide the operating system and driver version?
The kernel should have a theoretical SM occupancy of 100% (48/48 warps). Unless there is a bug, the metric should never exceed 100%. I’ll try to reproduce the issue tomorrow.
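For reference, here is roughly how the 48/48 figure falls out of the launch configuration. This is only a back-of-the-envelope sketch using the compute capability 2.0 per-SM limits (1536 threads, 48 warps) and ignoring register and shared memory constraints:

#include <stdio.h>
int main( void )
{
    // GTX 480 (compute capability 2.0) per-SM limits; register and
    // shared memory constraints are ignored for this sketch.
    const int max_threads_per_sm = 1536;
    const int max_warps_per_sm   = 48;
    const int warp_size          = 32;
    const int blocksize          = 768;
    const int warps_per_block    = blocksize / warp_size;           // 24
    const int blocks_per_sm      = max_threads_per_sm / blocksize;  // 2
    const int resident_warps     = warps_per_block * blocks_per_sm; // 48
    printf( "theoretical occupancy = %d/%d = %.0f%%\n",
            resident_warps, max_warps_per_sm,
            100.0 * resident_warps / max_warps_per_sm );
    return 0;
}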
Achieved occupancy percentage is defined as active_warps / active_cycles / MAX_WARPS_ON_SM * 100.
On the Fermi architecture the PM signal active_cycles does not increment on every cycle in which active_warps > 0. Specifically, active_cycles will not increment while warps are still allocated on the SM but all of their threads have exited and the warps are only waiting for outstanding instructions to complete.
The kernel in the question does very little work per warp: it executes a global store and then exits. The warp is considered active until the global store is accepted by the L1. Because the warps are so short-lived, the SM may not have enough warps with an instruction left to execute to keep active_cycles asserted for the full period, so active_cycles undercounts relative to active_warps.
If the achieved occupancy metric is greater than the theoretical maximum for the launch, then the value should be clamped to the theoretical maximum. A future version of the profiler will clamp the value.
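To make the effect concrete, here is a small sketch of that formula with made-up counter values (the numbers are purely illustrative, not measured on any GPU):

#include <stdio.h>
int main( void )
{
    const double MAX_WARPS_ON_SM = 48.0;
    // Illustrative values only: if active_cycles fails to increment on
    // some cycles where warps are still allocated, the ratio can end up
    // above the theoretical maximum.
    double active_warps  = 2400000.0;  // sum of active warps over the profiled period
    double active_cycles = 45000.0;    // undercounted, per the Fermi behavior above
    double achieved = active_warps / active_cycles / MAX_WARPS_ON_SM * 100.0;
    printf( "raw achieved occupancy     = %.1f%%\n", achieved );  // 111.1%
    // The clamping described above:
    double theoretical = 100.0;
    printf( "clamped achieved occupancy = %.1f%%\n",
            achieved < theoretical ? achieved : theoretical );
    return 0;
}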
Similarly, I get a “Multiprocessor Efficiency” of 108.7% on a GTX Titan under Linux with cuda_5.0.35_linux_64_suse12.1-1.run and the driver from NVIDIA-Linux-x86_64-313.30.run.
According to the tooltip “Multiprocessor Efficiency” is “the ratio of the time at least one warp is active on a multiprocessor to the total time”. This would be a highly interesting metric for optimization. If it gave meaningful results, that is.
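For what it’s worth, here is my reading of that definition as a small sketch, with made-up cycle counts (none of these numbers come from the profiler):

#include <stdio.h>
int main( void )
{
    // Made-up values for illustration only.
    double cycles_with_active_warp = 95000.0;   // cycles with at least one warp active on the SM
    double total_cycles            = 100000.0;  // total cycles in the measured interval
    // By the tooltip's definition this ratio can never exceed 100%,
    // which is why the reported 108.7% looks like the same kind of
    // counter mismatch as with achieved occupancy.
    printf( "multiprocessor efficiency = %.1f%%\n",
            100.0 * cycles_with_active_warp / total_cycles );
    return 0;
}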