Contradictory register count report when calling a non-inlined function

uchytilc · July 14, 2023, 6:36pm

When profiling a kernel in Nsight Compute, the SASS live-register count shows a massive (incorrect?) spike of more than 100 registers at a CALL.ABS.NOINC instruction (as demonstrated in the attached image). The code below reproduces this behavior when profiled in Nsight Compute. Any device function decorated with __noinline__ produces the same phenomenon, and nested __noinline__ calls have a multiplicative effect, with the register count climbing up to the cap of 255. This contradicts what nvcc reports when compiling the code below (nvcc -arch=sm_80 -Xptxas="-v" kernel.cu), which states that the kernel uses 17 registers. Is the massive spike actually occurring (and can it be removed), or is this a bug in Nsight Compute?

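#include <stdint.h>
#include <cuda_runtime.h>

extern "C" {
    __global__ void kernel(float* out) {
        // One thread per output element.
        uint32_t n = threadIdx.x + blockIdx.x*blockDim.x;
        out[n] = atan2f(static_cast<float>(n), 2.0f);
    }
}

int main(int argc, char const* argv[]) {

    // 32 threads each write one float, so the buffer needs 32 * sizeof(float) bytes.
    float* d_ary;
    cudaMalloc(&d_ary, 32 * sizeof(float));

    kernel<<<1,32>>>(d_ary);

    float ary[32];

    // Copy all 32 floats back; this call also waits for the kernel to finish.
    cudaMemcpy(ary, d_ary, 32 * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_ary);

    return 0;
}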
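
For the nested __noinline__ case mentioned above, a minimal sketch might look like the following. The names inner, outer, nested_kernel and allocated_registers are hypothetical and not taken from the original project. The sketch also queries the runtime for the kernel's register count through cudaFuncGetAttributes, which gives a second number to compare against the profiler's live-register column alongside the ptxas -v output. It makes no claim about what values any particular toolkit or Nsight Compute version will show.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers (not from the original post). __noinline__ asks the
// compiler to keep each function as a real call instead of inlining it.
__device__ __noinline__ float inner(float x) {
    return atan2f(x, 2.0f);
}

__device__ __noinline__ float outer(float x) {
    // Nested case: a non-inlined function calling another non-inlined function.
    return inner(x) + inner(x + 1.0f);
}

__global__ void nested_kernel(float* out) {
    uint32_t n = threadIdx.x + blockIdx.x * blockDim.x;
    out[n] = outer(static_cast<float>(n));
}

// Returns the per-thread register count the runtime reports for the kernel,
// or -1 if the query fails (for example, when no device is available).
int allocated_registers() {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, nested_kernel);
    return err == cudaSuccess ? attr.numRegs : -1;
}

int main() {
    constexpr int N = 32;
    float* d_out = nullptr;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }

    nested_kernel<<<1, N>>>(d_out);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("kernel status: %s\n", cudaGetErrorString(err));
    std::printf("registers per thread: %d\n", allocated_registers());

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Compiling this with the same nvcc command as above and then profiling it in Nsight Compute puts the compiler's figure, the runtime's figure, and the per-instruction live-register values side by side.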

jmarusarz · July 20, 2023, 9:27pm

Thanks for submitting this. We are aware of this bug and have it filed in our system. It should be fixed in a future version.

uchytilc · July 20, 2023, 9:31pm

Thanks for the reply, @jmarusarz! I’ll keep an eye out for the fix.

michael.f.barad · November 17, 2023, 11:49pm

Has this been fixed? If so, in which version? Thanks.

veraj · November 20, 2023, 7:52am

Sorry, the issue hasn’t been fixed yet.

veraj · March 20, 2024, 9:52am

Hi @uchytilc and @michael.f.barad,

Sorry for the late update! Our developers have submitted some fixes for this issue.
Could you please install the latest Nsight Compute 2024.1 and confirm whether the issue is gone?

veraj · April 26, 2024, 10:02am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
