Problems with maxrregcount and dynamic parallelism

Hi,
I am trying to estimate the effect of restricting register usage on the achieved occupancy of an application. While running my experiments, I tried to restrict the number of registers for the cdpBezierTessellation application from the NVIDIA samples and got an error.

Flag added to nvcc: -maxrregcount 16

Error: nvlink error : entry function ‘_Z21computeBezierLinesCDPP10BezierLinei’ with max regcount of 16 calls function ‘cudaMalloc’ with regcount of 18

I don’t understand exactly why this is happening. Can anyone help me with this?

Thanks

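In short: 16 < 18.

The entry function is capped at 16 registers, but it calls the device-runtime function cudaMalloc, which was built to use 18. nvlink refuses to link a call to a function that needs more registers than the caller's limit allows.

Increase the value passed to nvcc to at least 18 to get past this particular error:
nvcc: -maxrregcount 18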
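
On the original goal of estimating how a register cap affects occupancy, here is a rough back-of-the-envelope sketch in Python. It only models the register limit. The hardware figures (65536 registers and 64 resident warps per SM, warp size 32) are assumptions that match some compute capability 3.5 parts and should be checked against your own GPU. The function name is made up for illustration. Shared memory, block size limits, and register allocation granularity are ignored, so treat the result as an upper bound on theoretical occupancy, not as the achieved occupancy a profiler would report.

```python
def register_limited_occupancy(regs_per_thread,
                               regs_per_sm=65536,     # assumed register file size per SM
                               max_warps_per_sm=64,   # assumed resident-warp limit per SM
                               warp_size=32):
    """Upper bound on occupancy when registers are the only limiting resource."""
    # Registers consumed by one full warp at this per-thread register count.
    regs_per_warp = regs_per_thread * warp_size
    # How many warps fit in the register file, ignoring allocation granularity.
    warps_by_regs = regs_per_sm // regs_per_warp
    # The hardware warp limit caps the result.
    resident_warps = min(max_warps_per_sm, warps_by_regs)
    return resident_warps / max_warps_per_sm


if __name__ == "__main__":
    for regs in (18, 32, 64, 128):
        print(regs, register_limited_occupancy(regs))
```

Under these assumptions, the register-limited bound is already 100% at 32 registers per thread, so pushing the cap down to 18 would not raise it further, while it can force spills to local memory.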

also cross-posted here:

http://stackoverflow.com/questions/30663656/problems-with-maxrregcount-and-dynamic-parallelism