If we record a graph, we specify the input to the root node.
When we run the graph, it uses that as the input.
But we don’t want to keep processing the same data with the graph! How can we advance the graph to run on the next buffer? You might suppose that we could pass the address of the buffer to the root node by reference and then keep changing that address, but if we’re recording a sequence of cuBLAS or cuFFT calls, they don’t accept a float **.
For example:
// Start graph capture
cudaStreamBeginCapture(cudaStreamPerThread, cudaStreamCaptureModeGlobal);
cufftExecR2C(plan_, (cufftReal*) input_data, complex_result);
// Stop graph capture
cudaStreamEndCapture(cudaStreamPerThread, &graph);
When we run the graph, we have made things work by copying the new data into input_data, but that seems really inefficient and unnecessary.
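For reference, the copy-based workaround described above might look roughly like the following sketch. It is not our actual code: the function name run_copy_then_launch, the FFT length, the number of buffers, the constant test data, and the assumption that new data arrives from host memory are all hypothetical. It also assumes the plan was bound to the capture stream with cufftSetStream and that CUDA 11.4 or later is available for cudaGraphInstantiateWithFlags.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cufft.h>

// Minimal error handling for the sketch: bail out with -1 on any failure.
#define CHECK_CUDA(call)  do { if ((call) != cudaSuccess)   return -1; } while (0)
#define CHECK_CUFFT(call) do { if ((call) != CUFFT_SUCCESS) return -1; } while (0)

// Hypothetical helper: captures one R2C FFT into a graph, then replays it for
// num_buffers inputs by overwriting input_data before each launch.
// Returns how many results had the expected DC bin, or -1 on an API error.
int run_copy_then_launch(int n, int num_buffers) {
    cudaStream_t stream = cudaStreamPerThread;

    // These device addresses are baked into the graph at capture time.
    cufftReal* input_data = nullptr;
    cufftComplex* complex_result = nullptr;
    CHECK_CUDA(cudaMalloc((void**)&input_data, sizeof(cufftReal) * n));
    CHECK_CUDA(cudaMalloc((void**)&complex_result, sizeof(cufftComplex) * (n / 2 + 1)));

    // Create the plan before capture and bind it to the capture stream.
    cufftHandle plan_;
    CHECK_CUFFT(cufftPlan1d(&plan_, n, CUFFT_R2C, 1));
    CHECK_CUFFT(cufftSetStream(plan_, stream));

    // Record the FFT into a graph, as in the snippet above.
    cudaGraph_t graph;
    cudaGraphExec_t graph_exec;
    CHECK_CUDA(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    CHECK_CUFFT(cufftExecR2C(plan_, input_data, complex_result));
    CHECK_CUDA(cudaStreamEndCapture(stream, &graph));
    CHECK_CUDA(cudaGraphInstantiateWithFlags(&graph_exec, graph, 0));

    int matched = 0;
    std::vector<cufftReal> host(n);
    for (int b = 0; b < num_buffers; ++b) {
        // Hypothetical "next buffer": a constant signal with value b + 1.
        const float value = static_cast<float>(b + 1);
        for (int i = 0; i < n; ++i) host[i] = value;

        // The workaround: copy the new data into the recorded address,
        // then replay the unchanged graph on the same stream.
        CHECK_CUDA(cudaMemcpyAsync(input_data, host.data(), sizeof(cufftReal) * n,
                                   cudaMemcpyHostToDevice, stream));
        CHECK_CUDA(cudaGraphLaunch(graph_exec, stream));

        // Read back the DC bin; for a constant input it should equal n * value.
        cufftComplex dc;
        CHECK_CUDA(cudaMemcpyAsync(&dc, complex_result, sizeof(cufftComplex),
                                   cudaMemcpyDeviceToHost, stream));
        CHECK_CUDA(cudaStreamSynchronize(stream));
        const float expected = value * n;
        if (std::fabs(dc.x - expected) <= 1e-3f * expected) ++matched;
    }

    cudaGraphExecDestroy(graph_exec);
    cudaGraphDestroy(graph);
    cufftDestroy(plan_);
    cudaFree(input_data);
    cudaFree(complex_result);
    return matched;
}

int main() {
    const int matched = run_copy_then_launch(1024, 4);
    std::printf("buffers with expected DC bin: %d\n", matched);
    return matched < 0 ? 1 : 0;
}
```

The extra cudaMemcpyAsync before every cudaGraphLaunch is the cost we would like to avoid: the graph itself never changes, only the contents of the one buffer it was recorded against.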