Hi,
I’d like to use CUDA graphs in my ML engine. Some kernel parameters (the input and output pointers) change frequently, so it looks like I would have to create many CUDA graphs. I don’t want to do that, because the number of inputs and outputs is very large.
I read the NVIDIA Technical Blog article "Employing CUDA Graphs in a Dynamic Environment", which describes two ways to use CUDA graphs.
It seems that updating a CUDA graph could work for my situation, but its performance is a problem.
My question is: is there another way to use (or modify) CUDA graphs efficiently? Or do I have to create enough CUDA graphs to cover all the cases?