cuSPARSE + CUDA graph

Hi,

I am trying to use CUDA graphs to capture my baseline implementation, which calls cuSPARSE as shown below:


CHECK_ERROR(cudaStreamCreate(&stream));
CHECK_ERROR(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
for (int i = 0; i < 10000; i++) {
       CHECK_CUSPARSE( cusparseCreate(&handle) );

       CHECK_CUSPARSE( cusparseCreateCsr(&matA, A_num_rows, A_num_cols, A_nnz,
                                      A_rowoff, A_colidx, A_val,
                                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F) );

       .... 


       CHECK_CUSPARSE( cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                                 &alpha, matA, vecX, &beta, vecY, CUDA_R_32F,
                                 CUSPARSE_SPMV_ALG_DEFAULT, dBuffer) );


}
CHECK_ERROR( cudaStreamEndCapture(stream, &graph) );

I found that if I call cusparseCreate() inside the loop, graph capture fails with the following error:

Capturing CUDA kernel...
 ** On entry to cusparseCreate(): CUDA context cannot be initialized

Is it not possible to call cusparseCreate() inside the loop if I want to use CUDA graphs?

If it is possible, how should I modify the code?

You could try to create the handle outside of the loop.

Note, however, that graph capture generally cannot capture host-side work, for example CPU code executed by cuSPARSE.
I also believe that GPU work on the default stream cannot be captured (you are not using the captured stream inside the loop).

  1. Can I create the handle inside the loop while a CUDA graph capture, started outside the loop, records the cusparseSpMV calls?
  2. How do I pass the stream to the cusparseSpMV API?
    CUDALibrarySamples/cuSPARSE/graph_capture/graph_capture_example.c at master · NVIDIA/CUDALibrarySamples · GitHub
    I am following the code in this example, and it also does not seem to pass the created stream to cusparseSpMV inside the loop.

I do not know. I have not used cuSPARSE before.

In general, when encountering problems with cuSPARSE, please set the environment variable CUSPARSE_LOG_LEVEL=5 to log more information.

How do I pass the stream to the cusparseSpMV API?

This is done with cusparseSetStream(handle, stream). The stream is set on the handle and then used for all cuSPARSE operations after that, via the handle. This is on line 149 in the code you linked to.

Can I create the handle inside the loop while a CUDA graph capture, started outside the loop, records the cusparseSpMV calls?

The design intention is that cusparseCreate() is called once (or once per thread) at application startup and used for the duration of the application. Calling it repeatedly, in a loop, is discouraged. The supported workflow is shown in the example code: create the handle, set the stream, prep all of your data/buffers, then start graph capture.

Could you share a bit more about why you want to call cusparseCreate() repeatedly, while graph capture is active?