Hello,
I am running into a problem where the COO-to-CSR sparse matrix conversion fails with the error ‘kernel launch suspended’. I don’t see how this can fail, since I believe I have no out-of-bounds accesses. Can someone help me understand this? Thank you!
Here is my code:
#define NNZ 6609553
#define IVEC_SIZE 16384
#define DATA_SIZE 43200
// load system indices
int *sys_rows, *sys_cols;
int* sys_rows_h = new int[NNZ];
int* sys_cols_h = new int[NNZ];
FILE* fp = fopen(SYSMATX_ROWSCOLS,"rb"); // load indices from a file.
fread(sys_rows_h,sizeof(int),NNZ,fp);
fread(sys_cols_h,sizeof(int),NNZ,fp);
fclose(fp);
cudaMalloc<int> (&sys_cols, NNZ);
cudaMemcpy(sys_cols,sys_cols_h,sizeof(int)*NNZ,cudaMemcpyHostToDevice);
cudaMalloc<int> (&sys_rows, NNZ);
cudaMemcpy(sys_rows,sys_rows_h,sizeof(int)*NNZ,cudaMemcpyHostToDevice);
/* initialize cusparse library */
cusparseStatus_t status;
cusparseHandle_t handle=0;
cusparseMatDescr_t descr=0;
status= cusparseCreate(&handle);
if (status != CUSPARSE_STATUS_SUCCESS) { return 1;}
// convert to Compressed Sparse Row (CSR) format
int* sys_row_ptr;
cudaMalloc<int> (&sys_row_ptr, DATA_SIZE + 1);
status = cusparseXcoo2csr(handle, sys_rows, NNZ, DATA_SIZE, sys_row_ptr, CUSPARSE_INDEX_BASE_ONE);
UPDATE:
I fixed the problem. It seems the templated cudaMalloc calls were not working properly. When I changed all the cudaMalloc calls from the templated C++ form to the C form with an explicit byte count (for instance, cudaMalloc<int> (&sys_rows, NNZ) to cudaMalloc (&sys_rows, sizeof(int)*NNZ) ), the program works.
Although I fixed the problem, I’d like to use the templated functions in the future. How can I do that?
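A note on the templated form: as far as I can tell from cuda_runtime.h, the template parameter of cudaMalloc<T> only fixes the pointer type, and the second argument is still a size in bytes, not an element count. So cudaMalloc<int> (&sys_rows, NNZ) would reserve 6609553 bytes, while the memcpy that follows writes sizeof(int)*NNZ = 26438212 bytes (with 4-byte ints), which would explain the failure. The templated call should work if the size is written as sizeof(int)*NNZ. Below is a minimal sketch of a wrapper that takes an element count instead. The names bytes_for and device_alloc are hypothetical helpers, not part of the CUDA runtime, and the CUDA part is guarded so that the size arithmetic can also be compiled with a plain C++ compiler.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not a CUDA API): converts an element count into a
// byte count and throws if the multiplication would overflow size_t.
template <typename T>
std::size_t bytes_for(std::size_t count) {
    if (count > std::numeric_limits<std::size_t>::max() / sizeof(T)) {
        throw std::overflow_error("element count too large for size_t");
    }
    return count * sizeof(T);
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Hypothetical wrapper (not a CUDA API): allocates room for `count`
// elements of type T on the device. cudaMalloc itself expects bytes,
// so the conversion happens here, in one place.
template <typename T>
cudaError_t device_alloc(T** ptr, std::size_t count) {
    return cudaMalloc(reinterpret_cast<void**>(ptr), bytes_for<T>(count));
}
#endif
```

With such a wrapper the allocations above would read device_alloc(&sys_rows, NNZ) and device_alloc(&sys_row_ptr, DATA_SIZE + 1). Checking the returned cudaError_t after each call should also make a too-small allocation easier to track down than a failure reported later by the cuSPARSE call.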