Problems with maxrregcount and dynamic parallelism

I am trying to estimate the effect of restricting register usage on the achieved occupancy of an application. While running my experiments, I tried to restrict the number of registers for the cdpBezierTessellation application from the NVIDIA samples and got an error.

Flag added to nvcc: -maxrregcount 16

Error: nvlink error : entry function ‘_Z21computeBezierLinesCDPP10BezierLinei’ with max regcount of 16 calls function ‘cudaMalloc’ with regcount of 18

I don’t understand exactly why this is happening. Can anyone help me with this?


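In short: 16 < 18.

With nvcc -maxrregcount 16, the entry function is capped at 16 registers, but the device-side cudaMalloc it calls uses 18. nvlink refuses to link a caller whose register limit is lower than what its callee needs, so the limit has to be at least 18 for this call to link.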
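
For the occupancy experiment itself, it may help to check how many registers the compiler actually assigned after a given -maxrregcount setting. Below is a minimal sketch using the CUDA runtime API. The kernel scaleKernel and the block size of 256 are hypothetical stand-ins rather than anything from the cdpBezierTessellation sample. The value printed is the theoretical occupancy; achieved occupancy still has to be measured with a profiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; substitute the kernel you are studying.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Query the register count the compiler settled on for this kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Assumed launch configuration; use the block size of your real launch.
    const int blockSize = 256;

    // Ask the runtime how many blocks of this kernel fit on one SM.
    int blocksPerSM = 0;
    err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, scaleKernel, blockSize, 0 /* dynamic shared memory */);
    if (err != cudaSuccess) {
        fprintf(stderr, "occupancy query: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Theoretical occupancy: resident threads over the SM's thread capacity.
    float occupancy = (float)(blocksPerSM * blockSize)
                    / (float)prop.maxThreadsPerMultiProcessor;

    printf("registers per thread: %d\n", attr.numRegs);
    printf("active blocks per SM: %d\n", blocksPerSM);
    printf("theoretical occupancy: %.2f\n", occupancy);
    return 0;
}
```

Rebuilding this with different -maxrregcount values and comparing the printed numbers shows whether the cap actually changed the register allocation and the resulting block residency.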

also cross-posted here: