Nsight Systems showing wrong behaviour

Hi,
I'm profiling my code using Nsight Systems with CUDA 11.3.
Here is the code I'm using for profiling:

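#include <Python.h>      // Python C API; include before standard headers
#include <nvToolsExt.h>  // NVTX: nvtxMarkA, nvtxRangePushA, nvtxRangePop
#include <csignal>       // signal, SIGINT
#include <string>

// Set_precision, set_communications, signal_callback_handler, the *_FLAG globals,
// precision, and the Hydro/Scalar/MHD templates are defined elsewhere in the project.
int main(int argc, char **argv)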
{
    // Profiling tags
    nvtxMarkA("XX");
    nvtxRangePushA("XX");

    // Initialize the Python interpreter
    Py_Initialize();

    // Import sys and os, then append the current working directory to sys.path
    PyRun_SimpleString("import sys\n"
                       "import os\n"
                       "sys.path.append(os.getcwd())\n");

    // Set the precision and the type of program
    Set_precision();

    // Set up communications
    set_communications();

    // Register the SIGINT (Ctrl+C) handler
    signal(SIGINT, signal_callback_handler);

    if (HYDRO_FLAG)
    {

        if (!precision.compare("single"))
        {
            Hydro<T1_f, T2_f>();
        }

        if (!precision.compare("double"))
        {
            Hydro<T1_d, T2_d>();
        }
    }

    if (SCALAR_FLAG || RBC_FLAG)
    {

        if (!precision.compare("single"))
        {
            Scalar<T1_f, T2_f>();
        }

        if (!precision.compare("double"))
        {
            Scalar<T1_d, T2_d>();
        }
    }

    if (MHD_FLAG)
    {
        if (!precision.compare("single"))
        {
            MHD<T1_f, T2_f>();
        }

        if (!precision.compare("double"))
        {
            MHD<T1_d, T2_d>();
        }
    }

    // Shut down the Python interpreter
    Py_Finalize();
    nvtxRangePop();
    
    return 0;
}
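One detail in the code above: nvtxMarkA("XX") and nvtxRangePushA("XX") both emit an event named XX, so the CPU timeline likely already carries two XX entries (an instantaneous marker and a range) before any GPU-side projection is added. The sketch below is not the poster's code; it gives the marker its own name and wraps push/pop in a hypothetical RAII class, ScopedNvtxRange, so every push is matched by exactly one pop. The function profiledMain is a hypothetical stand-in for the body of main(). If the NVTX header is not found, the sketch falls back to no-op stubs (an assumption made only so it still compiles).

```cpp
#include <cassert>

#if __has_include(<nvtx3/nvToolsExt.h>)
#include <nvtx3/nvToolsExt.h>  // header-only NVTX v3 (check your CUDA install)
#else
// Assumption: NVTX is unavailable here, so provide no-op stubs with the same signatures.
inline int nvtxRangePushA(const char*) { return 0; }
inline int nvtxRangePop() { return 0; }
inline void nvtxMarkA(const char*) {}
#endif

// Hypothetical RAII guard: pushes an NVTX range on construction and pops it on
// destruction, so the range closes even on an early return.
class ScopedNvtxRange {
public:
    explicit ScopedNvtxRange(const char* name) {
        nvtxRangePushA(name);
        ++depth_;
    }
    ~ScopedNvtxRange() {
        assert(depth_ > 0);  // every pop must match an earlier push
        nvtxRangePop();
        --depth_;
    }
    ScopedNvtxRange(const ScopedNvtxRange&) = delete;
    ScopedNvtxRange& operator=(const ScopedNvtxRange&) = delete;

    // Ranges currently open on this thread (NVTX push/pop ranges are per-thread).
    static int depth() { return depth_; }

private:
    static inline thread_local int depth_ = 0;
};

// Hypothetical stand-in for main(): the marker and the range now have distinct
// names, so each appears under its own label in the timeline.
void profiledMain() {
    nvtxMarkA("XX start");        // instantaneous marker
    ScopedNvtxRange range("XX");  // one push/pop range around the whole run
    // Py_Initialize(); ... Hydro / Scalar / MHD ... Py_Finalize();
}                                 // range is popped here
```

With this pattern the explicit nvtxRangePop() at the end of main is no longer needed, and the marker can no longer be confused with the range of the same name.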

However, when I open the profiling report, it shows three "XX" ranges instead of one. Why? Can anyone help?


The screenshot also shows three calls where there should be only one.

Hi manver,
NVTX ranges that wrap CUDA kernel launches are projected from the CPU onto the GPU, creating GPU-side annotations.
That is why the Events view displays multiple ranges with the same name instead of the single range you expected.
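To make the projection idea concrete, here is a toy model in C++. It is an assumption, not Nsight Systems' actual algorithm: every GPU operation launched while the CPU range is open is attributed to that range, and each CUDA stream gets one projected copy of the range spanning that stream's attributed work. The types GpuOp and Range and the function projectRange are hypothetical. Under this model, a single CPU range whose kernels run on two streams already yields three entries named XX.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one GPU operation (kernel or copy).
struct GpuOp {
    int stream;         // CUDA stream the operation ran on
    double launchTime;  // CPU time of the launch API call
    double gpuStart;    // time the operation started on the GPU
    double gpuEnd;      // time the operation finished on the GPU
};

// A named interval; stream == -1 marks the original CPU-side range.
struct Range {
    std::string name;
    int stream;
    double start;
    double end;
};

// Toy projection (assumed model): keep the CPU range, then add one projected
// range per stream covering all GPU work launched while the CPU range was open.
std::vector<Range> projectRange(const Range& cpu, const std::vector<GpuOp>& ops) {
    assert(cpu.start <= cpu.end);
    std::map<int, Range> perStream;  // ordered by stream id
    for (const GpuOp& op : ops) {
        if (op.launchTime < cpu.start || op.launchTime > cpu.end) {
            continue;  // launched outside the CPU range: not attributed to it
        }
        auto it = perStream.find(op.stream);
        if (it == perStream.end()) {
            perStream.emplace(op.stream, Range{cpu.name, op.stream, op.gpuStart, op.gpuEnd});
        } else {
            it->second.start = std::min(it->second.start, op.gpuStart);
            it->second.end = std::max(it->second.end, op.gpuEnd);
        }
    }
    std::vector<Range> result{cpu};
    for (const auto& kv : perStream) {
        result.push_back(kv.second);
    }
    return result;
}
```

In this model the extra entries are views of the same range rather than extra push/pop calls; the CPU-side entry still reflects the push-to-pop duration.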

So what's the solution?