Hello,
I am creating a CUDA Graph manually to represent a sparse solver application. I need to call the cublasDgemm() function for some of the operations. Is there a way to call cublasDgemm() from a CUDA Graph node?
I would appreciate it if someone could point me to an example.
@Robert_Crovella , tagging you in case you can help me out.
7_CUDALibraries/conjugateGradientCudaGraphs. Demonstrates conjugate gradient solver on GPU using CUBLAS/CUSPARSE library calls captured and called using CUDA Graph APIs.
Thank you so much for your quick reply, @Robert_Crovella .
In the conjugateGradientCudaGraphs sample, the graph is created using the stream capture mechanism.
But I am creating my graph MANUALLY using the cudaGraphAddKernelNode, cudaGraphAddHostNode, cudaGraphAddMemcpyNode, and cudaGraphAddMemsetNode functions.
I am wondering whether I should use cudaGraphAddKernelNode or cudaGraphAddHostNode to call the cuBLAS routines.
How about capturing the cuBLAS call into its own graph with stream capture, then adding that graph to your manually built graph using cudaGraphAddChildGraphNode?
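For what it's worth, here is a minimal sketch of that approach. Neither node type from the question fits directly: cublasDgemm() is a host-side library call that launches its own internal kernels, and host node callbacks are not allowed to make CUDA API calls. The helper name addDgemmChildNode, its parameters, and the square n x n column-major layout are my own assumptions for illustration, not something from the sample. The user-owned workspace is there because, as I understand the cuBLAS documentation, a captured cuBLAS call may otherwise add memory allocation nodes, which child graphs do not support. The thread-local capture mode is also just a choice I made for the sketch.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA or cuBLAS API): captures C = A * B for
// n x n column-major double matrices into a temporary graph, then adds that
// graph to `parent` as a child graph node depending on `deps`.
// `workspace` is user-owned device memory handed to cuBLAS so the captured
// call does not need stream-ordered allocations inside the child graph.
cudaError_t addDgemmChildNode(cudaGraph_t parent, cudaGraphNode_t *node,
                              const cudaGraphNode_t *deps, size_t numDeps,
                              cublasHandle_t handle, cudaStream_t stream,
                              void *workspace, size_t workspaceBytes, int n,
                              const double *dA, const double *dB, double *dC)
{
    // Host pointer mode (the cuBLAS default) is assumed for the scalars.
    const double alpha = 1.0, beta = 0.0;

    // `stream` must be a non-default stream. Set the workspace after the
    // stream, because cublasSetStream resets it to the default pool.
    if (cublasSetStream(handle, stream) != CUBLAS_STATUS_SUCCESS)
        return cudaErrorUnknown;
    if (cublasSetWorkspace(handle, workspace, workspaceBytes) != CUBLAS_STATUS_SUCCESS)
        return cudaErrorUnknown;

    cudaError_t err = cudaStreamBeginCapture(stream, cudaStreamCaptureModeThreadLocal);
    if (err != cudaSuccess) return err;

    // Recorded into the capture, not executed here.
    cublasStatus_t st = cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                                    &alpha, dA, n, dB, n, &beta, dC, n);

    // Always end the capture, even if the cuBLAS call reported an error.
    cudaGraph_t gemmGraph = nullptr;
    err = cudaStreamEndCapture(stream, &gemmGraph);
    if (err != cudaSuccess) return err;
    if (st != CUBLAS_STATUS_SUCCESS) {
        if (gemmGraph) cudaGraphDestroy(gemmGraph);
        return cudaErrorUnknown;
    }

    // The child node holds a clone of gemmGraph, so the temporary can go.
    err = cudaGraphAddChildGraphNode(node, parent, deps, numDeps, gemmGraph);
    cudaGraphDestroy(gemmGraph);
    return err;
}
```

The returned node can then be used as a dependency for later cudaGraphAddKernelNode or cudaGraphAddMemcpyNode calls, like any other node in the manually built graph. The device pointers and scalar values are fixed at capture time, so the buffers and the workspace have to stay allocated for as long as the graph is launched.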
Not sure if things have changed with the latest GPU version, but stream capture seems to have some issues when run on a thread while other threads are also accessing the GPU. If there were a way to serialize and load a graph saved from a stream capture, the cudaGraphAddChildGraphNode function would probably be sufficient.
Seems like this is the only way to add cuBLAS routines into a CUDA Graph.
Hello @vivek.krishnan ,
Can you please mention the issues that you are referring to? It looks like I have some issues (i.e. getting wrong answers) with stream capture when using managed memory in cublas routines.