Calling cuSOLVER from a kernel

Good afternoon everyone. I am having trouble solving multiple linear systems on the GPU (directly on the device) because cuSOLVER is not callable from device code.

To help you understand, let me explain my problem.

I have about 1000 points. For each point I need to find its neighbors, and after that I must solve several linear systems. Since cuSOLVER is not callable from device code, can you suggest other methods?

If I move the computation to the host, I lose the performance gain.

Thanks to all.

Do whatever preparatory work you need on the GPU to set up your data, in a CUDA kernel. In that kernel, organize the data you want cuSOLVER to process in GPU memory.

Then end that kernel.

Then call cuSOLVER from host code, on data that is already resident in GPU memory. If you are doing multiple operations (e.g. solving one system per thread), there may be batched routines that can help.

After the cuSOLVER calls from host code, launch another CUDA kernel to do whatever work you need to do on the cuSOLVER results.
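The three stages above can be sketched roughly as follows. This is only an illustration under assumptions that are not from the question: every system is taken to be small, dense, symmetric positive definite and of the same size n, so that cuSOLVER's batched Cholesky routines (cusolverDnSpotrfBatched and cusolverDnSpotrsBatched, which accept a single right-hand side per system) apply. For general matrices, cuBLAS's cublas<t>getrfBatched / cublas<t>getrsBatched might be a better fit. The names buildSystems, useSolutions and runPipeline, and the placeholder matrix values, are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Stage 1 (hypothetical): thread i writes its own n x n system into slot i.
// Placeholder values: diagonal n + (i % 7), off-diagonal 1, which is SPD.
__global__ void buildSystems(float *A, float *b, int n, int batch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= batch) return;
    float *Ai = A + (size_t)i * n * n;   // column-major, leading dimension n
    float *bi = b + (size_t)i * n;
    for (int c = 0; c < n; ++c) {
        for (int r = 0; r < n; ++r)
            Ai[r + c * n] = (r == c) ? (float)(n + i % 7) : 1.0f;
        bi[c] = 1.0f;
    }
}

// Stage 3 (hypothetical): thread i reads back its own solution, which the
// solver wrote over the right-hand side in slot i.
__global__ void useSolutions(const float *x, float *out, int n, int batch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < batch) out[i] = x[(size_t)i * n];   // first unknown of system i
}

// Returns 0 on success. d_out must point to `batch` floats in device memory.
int runPipeline(int n, int batch, float *d_out)
{
    float *d_A = nullptr, *d_b = nullptr;
    float **d_Aptr = nullptr, **d_bptr = nullptr;
    int *d_info = nullptr;
    cudaMalloc(&d_A, sizeof(float) * n * n * batch);
    cudaMalloc(&d_b, sizeof(float) * n * batch);
    cudaMalloc(&d_Aptr, sizeof(float *) * batch);
    cudaMalloc(&d_bptr, sizeof(float *) * batch);
    cudaMalloc(&d_info, sizeof(int) * batch);

    // Tell the library where each system lives: one device address per system.
    std::vector<float *> hA(batch), hB(batch);
    for (int i = 0; i < batch; ++i) {
        hA[i] = d_A + (size_t)i * n * n;
        hB[i] = d_b + (size_t)i * n;
    }
    cudaMemcpy(d_Aptr, hA.data(), sizeof(float *) * batch, cudaMemcpyHostToDevice);
    cudaMemcpy(d_bptr, hB.data(), sizeof(float *) * batch, cudaMemcpyHostToDevice);

    int threads = 128, blocks = (batch + threads - 1) / threads;
    buildSystems<<<blocks, threads>>>(d_A, d_b, n, batch);          // stage 1

    // Stage 2: host-side batched calls; the matrices never leave the GPU.
    cusolverDnHandle_t handle = nullptr;
    bool ok = cusolverDnCreate(&handle) == CUSOLVER_STATUS_SUCCESS;
    ok = ok && cusolverDnSpotrfBatched(handle, CUBLAS_FILL_MODE_LOWER, n,
                                       d_Aptr, n, d_info, batch)
                   == CUSOLVER_STATUS_SUCCESS;
    // Check the per-system factorization status (this copy also synchronizes).
    std::vector<int> info(batch, 0);
    cudaMemcpy(info.data(), d_info, sizeof(int) * batch, cudaMemcpyDeviceToHost);
    for (int v : info) ok = ok && (v == 0);
    ok = ok && cusolverDnSpotrsBatched(handle, CUBLAS_FILL_MODE_LOWER, n, 1,
                                       d_Aptr, n, d_bptr, n, d_info, batch)
                   == CUSOLVER_STATUS_SUCCESS;

    if (ok) useSolutions<<<blocks, threads>>>(d_b, d_out, n, batch); // stage 3
    ok = ok && cudaDeviceSynchronize() == cudaSuccess;

    if (handle) cusolverDnDestroy(handle);
    cudaFree(d_A); cudaFree(d_b); cudaFree(d_Aptr); cudaFree(d_bptr); cudaFree(d_info);
    return ok ? 0 : 1;
}
```

The only data crossing the bus here are the two small pointer tables and the status array; the matrices and right-hand sides are written, solved and consumed in device memory. Build with something like nvcc file.cu -lcusolver.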

Thank you for the answer.

I have another question. Regarding what you said about data already resident in GPU memory: how does each thread know which data belongs to it? If I call a cuSOLVER routine from the host, does CUDA automatically recognize the data related to each thread?

No, cuSOLVER doesn't automatically recognize the data corresponding to each thread. You as the programmer need to tell cuSOLVER where the data is, just as if you were using cuSOLVER normally, because you are using cuSOLVER normally.
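One way to make that concrete, as a sketch only: pick a layout convention yourself and use the same index arithmetic everywhere. Below, system i is assumed to occupy a fixed-size slot starting at base + i * slotSize. The helper slotOf and the kernels fillPointerTable and sumOwnSolution are hypothetical names, not part of cuSOLVER. The pointer table is the kind of array-of-pointers argument that batched routines take.

```cuda
#include <cuda_runtime.h>

// Assumed convention (chosen by the programmer, not by the library):
// item i lives in a fixed-size slot starting at base + i * slotSize.
template <typename T>
__host__ __device__ inline T *slotOf(T *base, int i, int slotSize)
{
    return base + (size_t)i * slotSize;
}

// Build the table a batched routine would be given: table[i] -> slot i.
__global__ void fillPointerTable(float *base, float **table, int slotSize, int batch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < batch) table[i] = slotOf(base, i, slotSize);
}

// In a later kernel, thread i finds its own n-element solution with the
// same arithmetic; here it just sums the entries as a stand-in for real work.
__global__ void sumOwnSolution(const float *x, float *out, int n, int batch)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= batch) return;
    const float *xi = slotOf(x, i, n);
    float s = 0.0f;
    for (int k = 0; k < n; ++k) s += xi[k];
    out[i] = s;
}
```

Nothing ties a thread to a slot except this convention: the kernel that writes system i, the pointer table handed to the library, and the kernel that reads solution i all have to agree on it.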

Thanks.