I am studying the CUDA Graph API and am confused by cudaGraphAddKernelNode. The official documentation at https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__GRAPH.html says the following about cudaGraphAddKernelNode:
- Kernel parameters can be specified via kernelParams. If the kernel has N parameters, then kernelParams needs to be an array of N pointers. Each pointer, from kernelParams[0] to kernelParams[N-1], points to the region of memory from which the actual parameter will be copied. The number of kernel parameters and their offsets and sizes do not need to be specified as that information is retrieved directly from the kernel’s image.
Does this mean that when I call cudaGraphAddKernelNode with a cudaKernelNodeParams argument, the CUDA runtime copies all the arguments pointed to by kernelParams[0] through kernelParams[N-1] immediately, during that call? I ask because in my program many other functions set up these pointers, and they point to local variables. If the CUDA runtime instead copies the actual parameters when the graph is executed, those pointers may be dangling by then.
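To make the scenario concrete, here is a minimal sketch of the pattern I am worried about. It assumes CUDA 11.4 or later (for cudaGraphInstantiateWithFlags), and the names store_value, add_node, clobber_stack, and run_experiment are made up for illustration; they are not from my real program. The helper add_node builds kernelParams from its own local variables and returns before the graph is launched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out of the enclosing function with 1 on any CUDA error.
#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

// Hypothetical kernel: writes its by-value argument to device memory.
__global__ void store_value(int *out, int value) { *out = value; }

// Hypothetical helper: the kernelParams entries point at this function's
// own parameters, which live on its stack frame.
static cudaError_t add_node(cudaGraph_t graph, int *d_out, int value) {
    void *args[] = { &d_out, &value };
    cudaKernelNodeParams p = {};
    p.func = (void *)store_value;
    p.gridDim = dim3(1, 1, 1);
    p.blockDim = dim3(1, 1, 1);
    p.sharedMemBytes = 0;
    p.kernelParams = args;
    p.extra = nullptr;
    cudaGraphNode_t node;
    return cudaGraphAddKernelNode(&node, graph, nullptr, 0, &p);
}   // args, d_out and value are gone once this returns

// Overwrite some stack so stale pointers would likely read garbage.
static void clobber_stack() {
    volatile int junk[64];
    for (int i = 0; i < 64; ++i) junk[i] = -1;
}

// Builds the graph, launches it after add_node has returned, and stores
// the value the kernel wrote into *result. Returns 0 on success.
int run_experiment(int *result) {
    int *d_out = nullptr;
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    CHECK(cudaMalloc(&d_out, sizeof(int)));
    CHECK(cudaMemset(d_out, 0, sizeof(int)));
    CHECK(cudaGraphCreate(&graph, 0));
    CHECK(add_node(graph, d_out, 42));
    clobber_stack();
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    CHECK(cudaGraphLaunch(exec, 0));
    CHECK(cudaStreamSynchronize(0));
    CHECK(cudaMemcpy(result, d_out, sizeof(int), cudaMemcpyDeviceToHost));
    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaFree(d_out);
    return 0;
}

int main() {
    int v = 0;
    if (run_experiment(&v) != 0) { printf("CUDA error\n"); return 1; }
    printf("kernel saw value = %d\n", v);
    return 0;
}
```

If the arguments are copied during cudaGraphAddKernelNode, I would expect the kernel to see 42. If they are only read at launch time, both &d_out and &value would be dangling and the behavior would be undefined, which is exactly the case I want to rule out.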