I am creating a CUDA Graph manually to represent a sparse solver application. I need to call the cublasDgemm() function for some of the operations. Is there a way to call cublasDgemm() from a CUDA Graph node?
I would appreciate it if someone could point me to an example.
@Robert_Crovella , tagging you in case you can help me out.
7_CUDALibraries/conjugateGradientCudaGraphs. Demonstrates conjugate gradient solver on GPU using CUBLAS/CUSPARSE library calls captured and called using CUDA Graph APIs.
Thank you so much for your quick reply, @Robert_Crovella .
In the conjugateGradientCudaGraphs sample, the graph is created using the stream capture mechanism.
But I am creating my graph MANUALLY using
I am wondering whether I should use cudaGraphAddHostNode to call the cuBLAS routines.
How about capturing the cuBLAS calls into a graph, then adding it to your graph using
Not sure if things have changed with the latest GPU version, but stream capture seems to have some issues when run on a thread while other threads are also accessing the GPU. If there were a way to serialize and load a graph saved from a stream capture, the cudaGraphAddChildGraphNode function would probably be sufficient.
Seems like this is the only way to add cuBLAS routines into a CUDA Graph.
Hello @vivek.krishnan ,
Can you please describe the issues that you are referring to? It looks like I have some issues (i.e. getting wrong answers) with stream capture when using managed memory in cuBLAS routines.