[Error] Access counters and Nsight Systems with GPU performance metrics

Hi.

I’m studying NVIDIA’s Unified Virtual Memory (UVM) and the GPU access counters.

I want to see the DRAM bandwidth or PCIe bandwidth collected by Nsight Systems with the --gpu-metrics-device option.
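For reference, this is roughly how I invoke the profiler (the report name and binary are placeholders):

nsys profile --gpu-metrics-device=0 -o report ./atax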

My environment is as follows.

NVIDIA TITAN RTX
Ubuntu 20.04
Nvidia driver 560.28.03
CUDA 12.6

https://github.com/NVIDIA/open-gpu-kernel-modules

...after building the modules...

insmod kernel-open/nvidia-uvm.ko uvm_perf_access_counter_mimc_migration_enable=1
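To be sure the option took effect, I check the parameter after loading (this assumes the parameter is exported read-only to sysfs, as module parameters usually are):

cat /sys/module/nvidia_uvm/parameters/uvm_perf_access_counter_mimc_migration_enable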

And here is my application code.

    ...some code...

    DATA_TYPE *A_gpu;
    DATA_TYPE *x_gpu;
    DATA_TYPE *y_gpu;
    DATA_TYPE *tmp_gpu;

    // Allocate the ATAX operands in managed (UVM) memory
    cudaMallocManaged(&A_gpu, sizeof(DATA_TYPE) * NX * NY);
    cudaMallocManaged(&x_gpu, sizeof(DATA_TYPE) * NY);
    cudaMallocManaged(&y_gpu, sizeof(DATA_TYPE) * NY);
    cudaMallocManaged(&tmp_gpu, sizeof(DATA_TYPE) * NX);

    // Map each allocation into device 0, so the GPU accesses it remotely
    // instead of taking page faults; access counters then drive migration
    cudaMemAdvise(A_gpu, sizeof(DATA_TYPE) * NX * NY, cudaMemAdviseSetAccessedBy, 0);
    cudaMemAdvise(x_gpu, sizeof(DATA_TYPE) * NY, cudaMemAdviseSetAccessedBy, 0);
    cudaMemAdvise(y_gpu, sizeof(DATA_TYPE) * NY, cudaMemAdviseSetAccessedBy, 0);
    cudaMemAdvise(tmp_gpu, sizeof(DATA_TYPE) * NX, cudaMemAdviseSetAccessedBy, 0);

    init_array(x_gpu, A_gpu);
    GPU_argv_init();
    total_time += ataxGpu(A_gpu, x_gpu, y_gpu, tmp_gpu);

    ...rest of the code...

In the code above, cudaMemAdvise with cudaMemAdviseSetAccessedBy makes the GPU’s access counters activate directly, and the UVM driver handles the interrupts raised by the access counters.
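For completeness, I also trace UVM page-fault activity alongside the GPU metrics to see whether migrations actually happen (these flags are available in recent nsys releases):

nsys profile --gpu-metrics-device=0 --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true ./atax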

The final result is the block below.

 GPU 0: General Metrics for NVIDIA TU10x (any frequency)
WARNING: CPU IP/backtrace sampling not supported, disabling.
Try the 'nsys status --environment' command to learn more.

WARNING: CPU context switch tracing not supported, disabling.
Try the 'nsys status --environment' command to learn more.

Start exp 1
CPU Runtime: 9.093303s
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 0.50 Percent: 0

AVG GPU Runtime: 30.403080s
Generating '/tmp/nsys-report-dcd0.qdstrm'
[1/1] [========================100%] report5.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  Props {
    Items {
      Type: ErrorText
      Value: "GPU Metrics [0]: NVPA_STATUS_ERROR\n- API function: Nvpw.GPU_PeriodicSampler_DecodeCounters_V2(&params)\n- Error code: 1\n- Source function: virtual QuadDDaemon::EventSource::PwMetrics::PeriodicSampler::DecodeResult QuadDDaemon::EventSource::{anonymous}::GpuPeriodicSampler::DecodeCounters(uint8_t*, size_t) const\n- Source location: /dvs/p4/build/sw/devtools/Agora/Rel/CUDA12.6/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetrics.cpp:263"
    }
  }
}


**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
Generated:
    /home/ssbaek/workspace/UVM_benchmark/UVM_benchmarks_oversub/polybench/ATAX/report5.qdstrm
    /home/ssbaek/workspace/UVM_benchmark/UVM_benchmarks_oversub/polybench/ATAX/report5.nsys-rep

If I disable the GPU’s access counters, the error goes away.
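By “disable” I mean reloading the UVM module without the parameter, roughly:

rmmod nvidia-uvm
insmod kernel-open/nvidia-uvm.ko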

I think the access counter option triggers an unexpected error in Nsight Systems.

Can I get some advice? Thanks.

My initial suggestion is to get a newer Nsight Systems from Nsight Systems | NVIDIA Developer.

@pkovalenko this system appears to have a GeForce 20-era Turing; should this work with the gpu-metrics options?

@hwilper

My Nsight Systems is already a newer version.

Come to think of it, it may not be an access counter problem. When the GPU accesses host memory directly, Nsight Systems doesn’t seem to be able to collect metrics.

To check this, just replace cudaMallocManaged with cudaMallocHost and you get the same error.
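For anyone who wants to reproduce this without the benchmark, a minimal sketch along these lines shows the pattern (the kernel and sizes are my own placeholders, not from the benchmark):

    #include <cstdio>
    #include <cstring>
    #include <cuda_runtime.h>

    // Kernel that reads and writes pinned host memory directly over PCIe
    __global__ void touch(float *p, size_t n)
    {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n)
            p[i] += 1.0f;
    }

    int main()
    {
        const size_t n = 1 << 24;
        float *p = nullptr;
        // Pinned host allocation; under UVA the GPU can dereference it directly
        cudaMallocHost(&p, n * sizeof(float));
        memset(p, 0, n * sizeof(float));
        touch<<<(unsigned)((n + 255) / 256), 256>>>(p, n);
        cudaDeviceSynchronize();
        printf("p[0] = %f\n", p[0]);
        cudaFreeHost(p);
        return 0;
    }

Profiling this with nsys profile --gpu-metrics-device=0 should show whether the periodic sampler fails in the same way.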