When calling cuGraphInstantiateWithFlags, I get a CUDA_ERROR_ALREADY_MAPPED error

I have a kernel that I capture into a graph several times with different parameters. The first three times work fine, but the fourth time cuGraphInstantiateWithFlags fails with CUDA_ERROR_ALREADY_MAPPED. This error does not seem to be documented for this specific call.

What am I missing? Is there a limit on graph instantiation?

For more details: the constructed graph has only 516 nodes, and from what I can tell each graph adds only 8 MB to the overall memory usage on the GPU, with multiple GB still available. However, reducing the number of nodes per graph lets more graphs be instantiated successfully, so there seems to be some limit I don't understand. The GPU in question is an H100 NVL and the CUDA version is 12.6.
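
A minimal reproduction sketch may help narrow this down. It is not the actual code from this post: `dummy_kernel` and `instantiate_until_failure` are hypothetical names, the trivial kernel stands in for the real one, and it assumes stream capture through the runtime API followed by instantiation through the driver API (the runtime and driver graph handles are the same underlying type). It logs the error name and free device memory after each instantiation, so the point of failure can be compared across node counts. It may well not reproduce the error with such a simple kernel.

```cuda
#include <cstdio>
#include <vector>
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel; only the argument changes per node.
__global__ void dummy_kernel(int *out, int value) {
    if (blockIdx.x == 0 && threadIdx.x == 0) *out = value;
}

// Captures `nodes_per_graph` launches into a graph, instantiates it, and repeats
// up to `max_graphs` times, keeping every executable graph alive.
// Returns the number of successful instantiations, or -1 if setup fails.
int instantiate_until_failure(int nodes_per_graph, int max_graphs) {
    cudaStream_t stream;
    int *d_out = nullptr;
    if (cudaStreamCreate(&stream) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) {
        cudaStreamDestroy(stream);
        return -1;
    }

    std::vector<cudaGraph_t> graphs;
    std::vector<CUgraphExec> execs;
    int succeeded = 0;

    for (int g = 0; g < max_graphs; ++g) {
        cudaGraph_t graph = nullptr;
        if (cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal) != cudaSuccess) break;
        for (int n = 0; n < nodes_per_graph; ++n) {
            // Each captured launch becomes one kernel node with its own parameters.
            dummy_kernel<<<1, 32, 0, stream>>>(d_out, g * nodes_per_graph + n);
        }
        if (cudaStreamEndCapture(stream, &graph) != cudaSuccess || graph == nullptr) break;
        graphs.push_back(graph);

        // cudaGraph_t and CUgraph are the same handle type, so the captured
        // graph can be passed straight to the driver API.
        CUgraphExec exec = nullptr;
        CUresult res = cuGraphInstantiateWithFlags(&exec, graph, 0);

        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);
        const char *name = nullptr;
        cuGetErrorName(res, &name);
        std::printf("graph %d: %s, free memory %zu MiB\n",
                    g, name ? name : "unknown error", free_bytes >> 20);

        if (res != CUDA_SUCCESS) break;
        execs.push_back(exec);
        ++succeeded;
    }

    // Release everything that was created, in reverse order of dependency.
    for (CUgraphExec e : execs) cuGraphExecDestroy(e);
    for (cudaGraph_t gr : graphs) cudaGraphDestroy(gr);
    cudaFree(d_out);
    cudaStreamDestroy(stream);
    return succeeded;
}
```

Calling `instantiate_until_failure(516, 16)` from `main` and then again with a smaller node count (building with `nvcc` and linking against `-lcuda`) would show whether the number of successful instantiations scales with nodes per graph, as described above.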