cudaMemcpy and segmentation fault

Hi everybody,

I implemented a very simple eigensolver in CUDA C using the cuSOLVER library, and although it mostly works like a charm, it has an issue I have not been able to solve. After computing the eigenpairs via cusolverDnDsyevd with jobz = CUSOLVER_EIG_MODE_VECTOR, I copy the eigenvalue vector and the eigenvector matrix from device to host with

cudaStat1 = cudaMemcpy(h_eigvals, d_eigvals, sizeof(double) * nRows, cudaMemcpyDeviceToHost);
cudaStat2 = cudaMemcpy(h_eigvecs, d_matrix, sizeof(double) * nRows * nCols, cudaMemcpyDeviceToHost);
cudaStat3 = cudaMemcpy(&info_gpu, devInfo, sizeof(int), cudaMemcpyDeviceToHost);
assert(cudaSuccess == cudaStat1);
assert(cudaSuccess == cudaStat2);
assert(cudaSuccess == cudaStat3);

where h_eigvals is the host eigenvalue vector, d_eigvals is the device eigenvalue vector, h_eigvecs is the host eigenvector matrix, and d_matrix is the device input matrix, i.e., the matrix whose eigenpairs we are computing. As far as I know, cusolverDnDsyevd overwrites the original matrix with the eigenvectors. The three asserts confirm that the memory copies succeeded. Nevertheless, when I compile and run the code, I get a Segmentation fault (core dumped) when trying to print the eigenvector matrix. I have not been able to debug this error.

The previously mentioned arrays are initialized by

double **h_matrix;
h_matrix = (double **)malloc(nRows * sizeof(double *));
for (int i = 0; i < nRows; i++) {
        h_matrix[i] = (double *)malloc(nCols * sizeof(double));
}
initialData(h_matrix, nRows, nCols);

double *h_eigvals = (double *)malloc(nRows * sizeof(double));
for (int i = 0; i < nRows; i++) {
        h_eigvals[i] = 0;
}

double **h_eigvecs;
h_eigvecs = (double **)malloc(nRows * sizeof(double *));
for (int i = 0; i < nRows; i++) {
        h_eigvecs[i] = (double *)malloc(nCols * sizeof(double));
}
for (int i = 0; i < nRows; i++) {
        for (int j = 0; j < nCols; j++) {
                h_eigvecs[i][j] = 0;
        }
}

Thank you very much in advance.

This is generally a bad idea with CUDA:

double **h_eigvecs;
h_eigvecs = (double **)malloc(nRows * sizeof(double *));
for (int i = 0; i < nRows; i++) {
        h_eigvecs[i] = (double *)malloc(nCols * sizeof(double));
}

There is no guarantee that the individual rows allocated this way are contiguous in memory. Without a per-row loop, there is no way to copy data to/from such a structure.

After that, when you then do this:

cudaStat2 = cudaMemcpy(h_eigvecs, d_matrix, sizeof(double) * nRows * nCols, cudaMemcpyDeviceToHost);

all manner of trouble may ensue. Since cudaMemcpy expects a single pointer (*) as its first argument, and you are passing a double pointer (**), it should be evident from first principles that this is probably not correct.

If you want to learn about various methods for “2D array” handling in CUDA, there are many resources and answered questions on the topic. Here is one summary on SO that describes various methods:

https://stackoverflow.com/questions/45643682/cuda-using-2d-and-3d-arrays/45644824#45644824

It is probably worth noting that the issue at hand is not specific to CUDA, but rather is a generic question of how to copy data structures in C/C++. The operative phrase you may want to look up is "deep copy".

If I can add a few more examples of the array flattening technique:

https://stackoverflow.com/questions/7322810/3d-array-representation-cuda
https://stackoverflow.com/questions/15799086/cuda-how-to-copy-a-3d-array-from-host-to-device
https://stackoverflow.com/questions/5631115/2d-array-on-cuda