Thank you for your response.
What I actually want is for the graph to use my existing 10 streams, on which I already do job queue management. If the graph manages streams internally, it will be difficult for me to distribute the workload between them, because I split the work using pointer offsets into one large data structure.
I have also observed another issue with graphs. I pass a pointer into my data structure (the int * in the example below) to my kernel and capture the launch between cudaStreamBeginCapture() and cudaStreamEndCapture(). For the remaining LOOP_COUNT iterations, I advance the pointer by an offset and then call cudaGraphLaunch(). However, my kernel always receives the first pointer address, the one that was passed between cudaStreamBeginCapture() and cudaStreamEndCapture(). Is there any way to pass updated parameters to the kernel when calling cudaGraphLaunch()?
Thanks in advance!
long *outputs, *deviceOutput;
cudaMemcpyAsync(deviceInputs, inputs, inputSize * sizeof(int), cudaMemcpyDefault, stream1);
cudaMemcpyAsync(deviceOutput, outputs, inputSize * sizeof(long), cudaMemcpyDefault, stream1);
int *temp = deviceInputs + 0;
addition<<<(inputSize + 255) / 256, 256, 0, stream1>>>(inputSize, temp, deviceOutput);
cudaMemcpyAsync(outputs, deviceOutput, inputSize * sizeof(long), cudaMemcpyDefault, stream1);
cudaGraphInstantiate(&graphExec, graph, NULL, NULL, 0);
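for (int i = 0; i < LOOP_COUNT; i++)
{
    temp = deviceInputs + i;  // only changes the host variable, not the captured argument
    printf("\nPassed pointer = %p, i=%d", (void *)temp, i);
    cudaGraphLaunch(graphExec, stream1);
}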
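
For reference, the behaviour described above is consistent with stream capture recording the kernel argument values at capture time, so reassigning temp afterwards does not change the instantiated graph. Below is a minimal sketch of one possible way to update the argument before each launch, using cudaGraphExecKernelNodeSetParams() from the CUDA runtime API (available from CUDA 10.1, as far as I know). The kernel body, the sizes, the CHECK macro and the run_demo() helper are hypothetical stand-ins rather than the code from this post, only the kernel is captured to keep it short, and the 5-argument cudaGraphInstantiate() call simply mirrors the one used above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical error-check helper: report the failure and bail out of run_demo().
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t e_ = (call);                                          \
        if (e_ != cudaSuccess) {                                          \
            fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
            return -1;                                                    \
        }                                                                 \
    } while (0)

// Stand-in kernel (assumption): out[i] = in[i] + in[i].
__global__ void addition(int n, int *in, long *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = (long)in[i] + in[i];
}

// Returns the number of mismatching results (0 on success, -1 on a CUDA error).
int run_demo()
{
    const int n = 256, loopCount = 4;
    std::vector<int> inputs(n + loopCount);
    for (size_t k = 0; k < inputs.size(); ++k) inputs[k] = (int)k;
    std::vector<long> outputs(n);

    int *deviceInputs = NULL;
    long *deviceOutput = NULL;
    CHECK(cudaMalloc(&deviceInputs, inputs.size() * sizeof(int)));
    CHECK(cudaMalloc(&deviceOutput, n * sizeof(long)));
    CHECK(cudaMemcpy(deviceInputs, inputs.data(), inputs.size() * sizeof(int),
                     cudaMemcpyHostToDevice));

    cudaStream_t stream1;
    CHECK(cudaStreamCreate(&stream1));

    // Capture a single kernel launch; the argument values are recorded here.
    int count = n;
    int *temp = deviceInputs;
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    CHECK(cudaStreamBeginCapture(stream1, cudaStreamCaptureModeGlobal));
    addition<<<(n + 255) / 256, 256, 0, stream1>>>(count, temp, deviceOutput);
    CHECK(cudaStreamEndCapture(stream1, &graph));
    CHECK(cudaGraphInstantiate(&graphExec, graph, NULL, NULL, 0));

    // Locate the kernel node in the captured graph.
    size_t numNodes = 0;
    CHECK(cudaGraphGetNodes(graph, NULL, &numNodes));
    std::vector<cudaGraphNode_t> nodes(numNodes);
    CHECK(cudaGraphGetNodes(graph, nodes.data(), &numNodes));
    cudaGraphNode_t kernelNode = NULL;
    for (size_t k = 0; k < numNodes; ++k) {
        cudaGraphNodeType type;
        CHECK(cudaGraphNodeGetType(nodes[k], &type));
        if (type == cudaGraphNodeTypeKernel) kernelNode = nodes[k];
    }
    if (kernelNode == NULL) return -1;

    // Reuse the captured launch configuration, but point the argument list
    // at our own host variables so that changes to temp are picked up.
    cudaKernelNodeParams params;
    CHECK(cudaGraphKernelNodeGetParams(kernelNode, &params));
    void *args[] = { &count, &temp, &deviceOutput };
    params.kernelParams = args;
    params.extra = NULL;

    int mismatches = 0;
    for (int i = 0; i < loopCount; i++) {
        temp = deviceInputs + i;  // new offset for this iteration
        // Push the current argument values into the instantiated graph.
        CHECK(cudaGraphExecKernelNodeSetParams(graphExec, kernelNode, &params));
        CHECK(cudaGraphLaunch(graphExec, stream1));
        CHECK(cudaStreamSynchronize(stream1));
        CHECK(cudaMemcpy(outputs.data(), deviceOutput, n * sizeof(long),
                         cudaMemcpyDeviceToHost));
        // inputs[k + i] == k + i, so each result should be 2 * (k + i).
        for (int k = 0; k < n; ++k)
            if (outputs[k] != 2L * (k + i)) ++mismatches;
    }

    CHECK(cudaGraphExecDestroy(graphExec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(deviceInputs));
    CHECK(cudaFree(deviceOutput));
    return mismatches;
}

int main()
{
    return run_demo() == 0 ? 0 : 1;
}
```

The sketch updates the executable graph in place instead of re-capturing and re-instantiating on every iteration. Whether that fits a setup with 10 separately managed streams is not something it tries to answer.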