Nsight Compute returns "out of memory" error on cudaMallocHost

I’m running Nsight Compute with the following options:

sudo /opt/nvidia/nsight-compute/2022.1.0/target/linux-desktop-glibc_2_11_3-x64/ncu \
  --export nsys_compute.ncu-rep --force-overwrite \
  --target-processes all \
  --replay-mode application --app-replay-match grid --app-replay-buffer file \
  --kernel-name-base function --launch-skip-before-match 0 \
  --section LaunchStats --section Occupancy --section SpeedOfLight \
  --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 \
  --profile-from-start 1 --cache-control all --clock-control base \
  --apply-rules yes --import-source no --check-exit-code yes \
  <executable>

However, it fails to profile any kernels; instead, the application exits at the first cudaMallocHost call with an out-of-memory error:

 out of memory
==ERROR== The application returned an error code (6).
==WARNING== No kernels were profiled.
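
For reference, the failing call in my application looks essentially like the following (a trimmed, hypothetical sketch; the buffer name and size are placeholders, and CUDA_CHECK is just a local error-reporting macro, not something from the toolkit):

// Sketch of the failing call site with explicit error checking
// (buffer name and size are placeholders, not the application's values).
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",           \
                         __FILE__, __LINE__, #call,                       \
                         cudaGetErrorString(err_), (int)err_);            \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

int main() {
    float* hostBuf = nullptr;
    const size_t bytes = 128ull << 20;  // placeholder allocation size
    CUDA_CHECK(cudaMallocHost(&hostBuf, bytes));  // fails only under ncu
    cudaFreeHost(hostBuf);
    return 0;
}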

The CUDA version I’m using is 10.1, on a device with compute capability 7.5. I have run my program under cuda-gdb and cuda-memcheck, and it runs without error.

I am also using cooperative groups in my kernel (launched along the lines of the sketch below). Any suggestions on how to fix this?
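
In case it matters for application replay, the kernel is launched cooperatively, roughly along these lines (a minimal sketch; the kernel body, names, and sizes are hypothetical, not my actual code):

// coop.cu -- sketch of a cooperative launch (hypothetical kernel).
// Grid-wide sync requires relocatable device code:
//   nvcc -arch=sm_75 -rdc=true -o coop coop.cu
#include <cstdio>
#include <cooperative_groups.h>
#include <cuda_runtime.h>
namespace cg = cooperative_groups;

__global__ void coopKernel(int* data, int n) {
    cg::grid_group grid = cg::this_grid();
    for (size_t i = grid.thread_rank(); i < (size_t)n; i += grid.size())
        data[i] += 1;
    grid.sync();  // grid-wide barrier; requires a cooperative launch
}

int main() {
    int supported = 0;
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, 0);
    if (!supported) { std::puts("cooperative launch unsupported"); return 1; }

    const int n = 1 << 20;
    int* d = nullptr;
    cudaMalloc(&d, n * sizeof(int));

    // The whole grid must be co-resident on the device, so size it
    // from occupancy rather than from the problem size.
    int blocksPerSm = 0, threads = 256;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, coopKernel,
                                                  threads, 0);
    int smCount = 0;
    cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, 0);
    dim3 gridDim(blocksPerSm * smCount), blockDim(threads);

    void* args[] = { &d, (void*)&n };  // pointers to the kernel arguments
    cudaError_t err = cudaLaunchCooperativeKernel((void*)coopKernel, gridDim,
                                                  blockDim, args, 0, nullptr);
    std::printf("launch: %s\n", cudaGetErrorString(err));
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}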

Is the cudaMallocHost call the first CUDA API call in the application? Is the error occurring before anything is dispatched to the GPU? And to confirm: the application is running out of host memory then, correct, not GPU memory?
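
One way to confirm which side is running out (a minimal sketch; the allocation size is a placeholder for whatever the application actually requests) is to log device memory via cudaMemGetInfo immediately before the pinned allocation. If the device reports plenty of free memory but cudaMallocHost still fails, the shortage is on the host side:

// memcheck.cu -- hypothetical diagnostic, not the original application.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // cudaMemGetInfo reports *device* memory (and initializes the CUDA
    // context if this is the first runtime call in the process).
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    std::printf("device memory: %zu MiB free / %zu MiB total\n",
                freeBytes >> 20, totalBytes >> 20);

    void* p = nullptr;
    const size_t bytes = 1ull << 28;  // placeholder: the app's actual size
    cudaError_t err = cudaMallocHost(&p, bytes);
    std::printf("cudaMallocHost(%zu MiB): %s\n", bytes >> 20,
                cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFreeHost(p);
    return 0;
}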