How to solve a tridiagonal matrix using the cusparse<t>gtsv2_nopivot() functions in the cusparse library

I haven’t studied your code carefully, but it seems fairly evident that you are unaware of the distinction between host and device memory when using a GPU. The cusparse function calls generally require their data to already reside in device memory, as the documentation states. Furthermore, when starting with host data, it’s not sufficient simply to create device allocations. In CUDA, and in typical CUDA library usage including cusparse, you usually also have to copy the input data from host to device and copy the results from device back to host.

In Fortran/cudafor, a device allocation is evidenced by the device attribute on the variable being allocated. That attribute doesn’t appear anywhere in your code, so all your data is in host memory. That won’t work, and is not how cusparse is intended to be used. You can find other examples of using CUDA library routines (in Fortran) here on these forums that include proper use of the device attribute for the necessary allocations; here is one example. (Yes, I am aware that that is not an example of cusparse gtsv2.) I don’t have an example of the use of gtsv2 in CUDA Fortran to point you to.
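To make the host/device pattern concrete, here is a minimal CUDA Fortran sketch of my own, not taken from your code. It shows only the device attribute and the copies in each direction. The array names, the size n and the matrix values are all hypothetical, and I have deliberately left the gtsv2 call itself as a comment, because you should check the exact Fortran interface against the documentation rather than trust my memory of it.

```fortran
program device_copy_sketch
  use cudafor
  implicit none

  integer, parameter :: n = 8          ! hypothetical system size
  ! Host arrays: lower, main and upper diagonal, plus the right-hand side.
  real(4), allocatable :: dl(:), d(:), du(:), b(:)
  ! Device counterparts. The device attribute is what places them in GPU memory.
  real(4), device, allocatable :: dl_d(:), d_d(:), du_d(:), b_d(:)

  allocate(dl(n), d(n), du(n), b(n))
  allocate(dl_d(n), d_d(n), du_d(n), b_d(n))

  ! Fill the host data (illustrative values only).
  dl = -1.0
  d  =  2.0
  du = -1.0
  b  =  1.0
  dl(1) = 0.0   ! cusparse gtsv2 expects the first element of dl to be zero
  du(n) = 0.0   ! ... and the last element of du to be zero

  ! Host -> device: in CUDA Fortran a plain assignment performs the copy.
  dl_d = dl
  d_d  = d
  du_d = du
  b_d  = b

  ! The cusparse gtsv2_nopivot call would go here, and it would be passed
  ! the device arrays (dl_d, d_d, du_d, b_d), never the host arrays.
  ! It is omitted in this sketch; see the cusparse Fortran interface docs.

  ! Device -> host: copy the result back. With the library call omitted,
  ! b simply comes back unchanged.
  b = b_d

  print *, b

  deallocate(dl_d, d_d, du_d, b_d)
  deallocate(dl, d, du, b)
end program device_copy_sketch
```

Something like `nvfortran -cuda sketch.cuf` should build this as written; once you add the library call you would also link cusparse, for example with `-cudalib=cusparse`.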