If we record a graph, we specify the input to the root node.
When we run the graph, it uses that as the input.
But we don’t want to keep processing the same data with the graph! How can we advance the graph to run on the next buffer? You might suppose that we could pass the address of the buffer to the root node by reference and then keep changing that address, but if we’re recording a sequence of cuBLAS or cuFFT calls, they don’t accept a float**.
// Start graph capture
cudaStreamBeginCapture(cudaStreamPerThread, cudaStreamCaptureModeGlobal);
cufftExecR2C(plan_, (cufftReal*) input_data, complex_result);
// Stop graph capture
cudaStreamEndCapture(cudaStreamPerThread, &graph);
When we run the graph, we have made things work by copying the new data into the same input buffer the graph was captured with, but that seems really inefficient and unnecessary.