Recreating cuDSS matrix causes access violation reading location error

I have a situation where I need to recreate a sparse matrix in cuDSS, which may not have the same number of non-zeros as the previous one. I have done this by:

  1. Call cudssMatrixDestroy() on the matrix
  2. Call cudaFree() on the values, row pointers and column indices
  3. Call cudaMalloc() on the values, row pointers and column indices
  4. Call cudaMemcpy() on the values, row pointers and column indices
  5. Call cudssMatrixCreateCsr() on the matrix
  6. Call cudssExecute()
In cuDSS version 0.1, this approach worked fine. However, since upgrading to version 0.2, I now get an access violation reading location error on step 6 (on the call with CUDSS_PHASE_ANALYSIS). Initially I thought this was due to recreating the matrix incorrectly, but I’ve done some testing and found that any time cudssExecute() is called twice on the same matrices with the same phase, I get an access violation reading location error (I’ve tested this, and it does not occur on version 0.1 of cuDSS). Any idea why this is happening, and are there any workarounds? I’ve attached two simple examples: the first shows the error with recreating the matrix, and the second shows the error without recreating the matrix. A rough sketch of the recreation flow is included after the attachments.
Example1.txt (4.0 KB)
Example2.txt (2.5 KB)
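
For reference, here is a minimal sketch of steps 1–6. Names such as A, x, b, d_values, d_rowPtrs, d_colIdx, n and nnz_new are placeholders, a double-precision general matrix with 32-bit 0-based indices is assumed, and error checking is omitted; the actual code is in the attached examples.

// 1. Destroy the old cuDSS matrix wrapper
cudssMatrixDestroy(A);

// 2. Free the old device arrays
cudaFree(d_values);
cudaFree(d_rowPtrs);
cudaFree(d_colIdx);

// 3. Allocate device arrays for the new pattern with nnz_new non-zeros
cudaMalloc((void**)&d_rowPtrs, (n + 1) * sizeof(int));
cudaMalloc((void**)&d_colIdx,  nnz_new * sizeof(int));
cudaMalloc((void**)&d_values,  nnz_new * sizeof(double));

// 4. Copy the new CSR data from the host
cudaMemcpy(d_rowPtrs, h_rowPtrs, (n + 1) * sizeof(int),    cudaMemcpyHostToDevice);
cudaMemcpy(d_colIdx,  h_colIdx,  nnz_new * sizeof(int),    cudaMemcpyHostToDevice);
cudaMemcpy(d_values,  h_values,  nnz_new * sizeof(double), cudaMemcpyHostToDevice);

// 5. Recreate the cuDSS matrix object on top of the new arrays
cudssMatrixCreateCsr(&A, n, n, nnz_new, d_rowPtrs, NULL, d_colIdx, d_values,
                     CUDA_R_32I, CUDA_R_64F, CUDSS_MTYPE_GENERAL,
                     CUDSS_MVIEW_FULL, CUDSS_BASE_ZERO);

// 6. Run the solver phases again; on 0.2 the access violation appears here,
//    starting with the CUDSS_PHASE_ANALYSIS call
cudssExecute(cudssHandle, CUDSS_PHASE_ANALYSIS,      solverConfig, solverData, A, x, b);
cudssExecute(cudssHandle, CUDSS_PHASE_FACTORIZATION, solverConfig, solverData, A, x, b);
cudssExecute(cudssHandle, CUDSS_PHASE_SOLVE,         solverConfig, solverData, A, x, b);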

Hello there!

First, thanks for reporting the unexpected behavior!
There are multiple things at play here.

  1. As we tried to convey in the documentation, the recommendation is to treat the cudssData_t object as a “per problem” object: if you change the matrix structure, you should call cudssDataDestroy() on the previously created one and call cudssDataCreate() for the new system.
    The reason is that cudssData_t contains linear-system-specific buffers which may no longer be valid when the matrix structure changes.
    The only exception to this recommendation at the moment is the case when only the values change (later there might be more sophisticated features allowing changes in a sub-pattern or sub-matrix).
    So, per this comment, if you add
    cudssDataDestroy(cudssHandle, solverData);
    cudssDataCreate(cudssHandle, &solverData);
    between the cudssExecute() calls for different systems, it should work (see the sketch after this list).
  2. However, my local experiments showed that the problem can occur even if the matrix remains logically the same.
  3. As it turned out, cuDSS 0.2.0 has a couple of bugs in the implementation (related to the changes which enabled support for user-defined device memory handlers) which prevent the modified code from working. Unfortunately, the only workaround is to create a separate cudssHandle_t for each problem to be solved (unless only the values change and the REFACTORIZATION phase is used instead of a full new analysis-factorization-solve sequence); see the sketch after this list.
  4. (Just a useful tip) Some extra information about the failures can be obtained by using the logging feature of cuDSS. It is as simple as setting the environment variable CUDSS_LOG_LEVEL=5 before running the application which uses cuDSS.
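
To make points 1 and 3 concrete, here is a minimal sketch of the update paths. The names A, A_new, x, b, d_newValues and solverConfig are placeholders (alongside the cudssHandle and solverData used above), and error checking is omitted; treat this as a sketch rather than a drop-in implementation.

// Case A: the matrix structure (sparsity pattern) changes.
// Recreate the cudssData_t object and redo the full phase sequence (point 1).
cudssDataDestroy(cudssHandle, solverData);
cudssDataCreate(cudssHandle, &solverData);
cudssExecute(cudssHandle, CUDSS_PHASE_ANALYSIS,      solverConfig, solverData, A_new, x, b);
cudssExecute(cudssHandle, CUDSS_PHASE_FACTORIZATION, solverConfig, solverData, A_new, x, b);
cudssExecute(cudssHandle, CUDSS_PHASE_SOLVE,         solverConfig, solverData, A_new, x, b);

// Case B: only the numerical values change, the pattern stays the same.
// Update the values in place and reuse the existing analysis (point 3).
cudssMatrixSetValues(A, d_newValues);
cudssExecute(cudssHandle, CUDSS_PHASE_REFACTORIZATION, solverConfig, solverData, A, x, b);
cudssExecute(cudssHandle, CUDSS_PHASE_SOLVE,           solverConfig, solverData, A, x, b);

// Case C: with cuDSS 0.2.0 specifically, if the structure changes and a full new
// analysis is needed, the workaround from point 3 is a fresh handle
// (plus config/data created on it) per problem.
cudssHandle_t handle2;  cudssConfig_t config2;  cudssData_t data2;
cudssCreate(&handle2);
cudssConfigCreate(&config2);
cudssDataCreate(handle2, &data2);
cudssExecute(handle2, CUDSS_PHASE_ANALYSIS, config2, data2, A_new, x, b);
// ... then factorization and solve as in Case A, followed by cudssDataDestroy(),
// cudssConfigDestroy() and cudssDestroy() on the per-problem objects.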

The bugs I mentioned will be fixed in the next version of cuDSS (there will be an option to get an engineering build to try before the next release, if you are interested).

Let me know if this is helpful enough.

Thanks,
Kirill

Hi Kirill, thanks for the comprehensive response. As you noticed in my other question, I was having issues with running this code with multiple streams and host threads. Fortunately, after adding cudssDataDestroy() and cudssDataCreate() as you suggested, I’m no longer getting memory errors when running with multiple host threads, so that must have been the cause of the error.

Thanks, Ben.