I’m running Nsight Compute with the following options:
sudo /opt/nvidia/nsight-compute/2022.1.0/target/linux-desktop-glibc_2_11_3-x64/ncu --export nsys_compute.ncu-rep --force-overwrite --target-processes all --replay-mode application --app-replay-match grid --app-replay-buffer file --kernel-name-base function --launch-skip-before-match 0 --section LaunchStats --section Occupancy --section SpeedOfLight --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 --profile-from-start 1 --cache-control all --clock-control base --apply-rules yes --import-source no --check-exit-code yes <executable>
However, it doesn’t profile any kernels. Instead, the application fails at its first cudaMallocHost call with an out-of-memory error:
out of memory
==ERROR== The application returned an error code (6).
==WARNING== No kernels were profiled.
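One way to check whether the failure comes from the profiling setup rather than the application is a small standalone reproducer. The sketch below is not the original program. `try_pinned_alloc` and `increment` are hypothetical names, and the size you pass is a placeholder for whatever the first cudaMallocHost call in the real code requests. The helper allocates pinned host memory, round-trips it through the device with one trivial kernel launch (so ncu has something to profile), and returns the first CUDA error it hits.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel so that ncu has at least one launch to profile.
__global__ void increment(float* data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: allocates `bytes` of pinned host memory, copies it to
// the device, runs one kernel, copies it back, and returns the first error.
cudaError_t try_pinned_alloc(size_t bytes) {
    assert(bytes > 0 && bytes % sizeof(float) == 0);
    const size_t n = bytes / sizeof(float);

    float* host = nullptr;
    float* dev = nullptr;

    // The call that fails under ncu in the original program.
    cudaError_t err = cudaMallocHost(reinterpret_cast<void**>(&host), bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMallocHost(%zu bytes): %s\n",
                     bytes, cudaGetErrorString(err));
        return err;
    }
    for (size_t i = 0; i < n; ++i) host[i] = 0.0f;

    err = cudaMalloc(reinterpret_cast<void**>(&dev), bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const unsigned threads = 256;
        const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
        increment<<<blocks, threads>>>(dev, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // cudaMemcpy on the default stream waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    if (err == cudaSuccess && host[0] != 1.0f)
        err = cudaErrorUnknown;  // kernel result did not come back as expected
    if (err != cudaSuccess)
        std::fprintf(stderr, "device round trip: %s\n", cudaGetErrorString(err));

    cudaFree(dev);  // no-op if dev is still nullptr
    cudaFreeHost(host);
    return err;
}
```

To use it, call `try_pinned_alloc` from a small `main` with the same size the application requests, compile with nvcc, and run that binary under the same ncu command. If the allocation also fails there, the problem more likely lies in the profiler or driver environment than in the application code.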
I’m using CUDA 10.1 on a device with compute capability 7.5. The program runs without errors under both cuda-gdb and cuda-memcheck.
My kernel also uses cooperative groups. Any suggestions on how to fix this?