Calling a cuBLAS routine from a CUDA Graph node


I am creating a CUDA Graph manually to represent a sparse solver application. I need to call the cublasDgemm() function for some of the operations. Is there a way to call cublasDgemm() from a CUDA Graph node?

I would appreciate it if someone could point me to an example.

@Robert_Crovella, tagging you in case you can help me out.

See the CUDA sample 7_CUDALibraries/conjugateGradientCudaGraphs. It demonstrates a conjugate gradient solver on the GPU using cuBLAS/cuSPARSE library calls captured and called using the CUDA Graph APIs.

Thank you so much for your quick reply, @Robert_Crovella.

In the conjugateGradientCudaGraphs sample, the graph is created using the stream capture mechanism.

But I am creating my graph MANUALLY using the cudaGraphAddKernelNode, cudaGraphAddHostNode, cudaGraphAddMemcpyNode and cudaGraphAddMemsetNode functions.

I am wondering whether I should use cudaGraphAddKernelNode or cudaGraphAddHostNode to call the cuBLAS routines.

How about capturing the cuBLAS call into its own graph with stream capture, then adding that graph to your manually built graph using cudaGraphAddChildGraphNode?

Not sure if things have changed with the latest GPU version, but stream capture seems to have some issues when run on a thread while other threads are also accessing the GPU. If there were a way to serialize a graph created by stream capture and load it back, the cudaGraphAddChildGraphNode function would probably be sufficient.

It seems like this is the only way to add cuBLAS routines to a CUDA Graph.

Hello @vivek.krishnan,
Can you please describe the issues that you are referring to? It looks like I have some issues (i.e. getting wrong answers) with stream capture when using managed memory in cuBLAS routines.