Hi guys. I have an application that needs to solve a lot of linear systems, so naturally I wrote a for loop that calls the cusolverDnSgetrf function many times. The problem is that, at a random iteration, CUDA just hangs, the screen goes black, and all subsequent calls to cuSolver are ignored. I've made the following minimal example to reproduce the problem:

```
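#include <iostream>
#include <cuda_runtime.h>
#include <cusolverDn.h>
#include <Eigen/Dense>

int main() {
    // read_binary is a helper that is not part of stock Eigen; it loads a square matrix from disk.
    Eigen::MatrixXf A;
    Eigen::read_binary("mymatrix.bin", A);
    const int m = static_cast<int>(A.rows());

    // One handle and one stream, created once and reused for every factorization.
    cusolverDnHandle_t cusolverhandle;
    cusolverDnCreate(&cusolverhandle);
    cudaStream_t stream1 = 0;
    cudaStreamCreateWithFlags(&stream1, cudaStreamNonBlocking);
    cusolverDnSetStream(cusolverhandle, stream1);

    // Device buffers: matrix (overwritten by its LU factors), workspace, pivots, info.
    float *devPtrgcmgtcd = NULL, *d_work = NULL;
    int *devIpiv = NULL, *devInfo = NULL;
    cudaMalloc((void**)&devPtrgcmgtcd, sizeof(float) * A.size());
    int lwork = 0;
    cusolverDnSgetrf_bufferSize(cusolverhandle, m, m, devPtrgcmgtcd, m, &lwork);
    cudaMalloc((void**)&d_work, sizeof(float) * lwork);
    cudaMalloc((void**)&devIpiv, m * sizeof(int));
    cudaMalloc((void**)&devInfo, sizeof(int));

    int hostinfo = 0;
    for (int i = 0; i < 1000000; i++) {
        // getrf factorizes in place, so re-upload A before every call.
        cudaMemcpy(devPtrgcmgtcd, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
        cusolverStatus_t stat = cusolverDnSgetrf(cusolverhandle, m, m, devPtrgcmgtcd, m, d_work, devIpiv, devInfo);
        cudaError_t error = cudaDeviceSynchronize();
        cudaMemcpy(&hostinfo, devInfo, sizeof(int), cudaMemcpyDeviceToHost);
        // Print the statuses so the first failing iteration is visible.
        std::cout << i << " stat=" << stat << " error=" << cudaGetErrorString(error)
                  << " info=" << hostinfo << std::endl;
        if (stat != CUSOLVER_STATUS_SUCCESS || error != cudaSuccess) break;
    }

    // Release everything.
    cudaFree(devIpiv); cudaFree(devInfo); cudaFree(d_work); cudaFree(devPtrgcmgtcd);
    cudaStreamDestroy(stream1);
    cusolverDnDestroy(cusolverhandle);
    return 0;
}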
```

Do I need to create the cuSolver handle inside the for loop?

Please tell me what I am doing wrong.

The results for the first few iterations are correct! In the real application, the matrix A changes on every iteration of the loop.
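One way to check a factorization like this against the CPU is a small reference LU with partial pivoting. The following is only a minimal sketch under some assumptions: the matrix is square and stored column-major (as Eigen does by default), pivots are reported 1-based in the LAPACK style, and the function name `lu_factor_colmajor` is hypothetical and not part of cuSolver or Eigen. Rounding and pivot tie-breaking on the GPU may differ slightly, so any comparison should use a tolerance.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical CPU reference, not part of cuSolver or Eigen.
// Factorizes the column-major m x m matrix `a` in place as P*A = L*U with
// partial pivoting. L (unit diagonal, not stored) ends up below the diagonal,
// U on and above it. `ipiv` receives 1-based row swaps, LAPACK style.
// Returns 0 on success, or k (1-based) for the first exactly-zero pivot.
int lu_factor_colmajor(std::vector<float>& a, int m, std::vector<int>& ipiv) {
    ipiv.assign(m, 0);
    int info = 0;
    for (int k = 0; k < m; ++k) {
        // Find the row with the largest magnitude in column k, at or below the diagonal.
        int p = k;
        for (int r = k + 1; r < m; ++r)
            if (std::fabs(a[r + k * m]) > std::fabs(a[p + k * m])) p = r;
        ipiv[k] = p + 1;
        if (a[p + k * m] == 0.0f) {
            if (info == 0) info = k + 1;  // singular pivot: remember the first one, skip the column
            continue;
        }
        // Swap rows k and p across all columns.
        if (p != k)
            for (int c = 0; c < m; ++c) std::swap(a[k + c * m], a[p + c * m]);
        // Store the multipliers and update the trailing submatrix.
        for (int r = k + 1; r < m; ++r) {
            a[r + k * m] /= a[k + k * m];
            for (int c = k + 1; c < m; ++c)
                a[r + c * m] -= a[r + k * m] * a[k + c * m];
        }
    }
    return info;
}
```

To compare, the device matrix and devIpiv would be copied back to the host after a call and checked element by element against this output, within a small tolerance.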

Thanks, any help is appreciated.