Segfault when using csrcolor in cuSPARSE

GPU: Tesla K20c
Linux Kernel: Linux tianying 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
NVCC: V7.5.17
GCC: 4.9.2

nvcc 7.5 installation log:
csrcolor[17811]: segfault at 2303ee0200 ip 0000000000402fcd sp

Testing:
m = 9, nnz = 20
rowPtr ={0,2,5,6,8,12,14,15,18,20};
colInd = {1,3,0,2,4,1,0,4,1,3,5,7,4,8,7,4,6,8,5,7};

output:
Segmentation fault (core dumped)

code:

#include <stdio.h>
#include <stdlib.h>
#include "cusparse.h"
#include "cuda.h"

int main(int argc, char *argv[]) {

	int m = 9, nnz = 20;
	int rowPtr[] ={0,2,5,6,8,12,14,15,18,20};
	int colInd[] = {1,3,0,2,4,1,0,4,1,3,5,7,4,8,7,4,6,8,5,7};
	float val[20] = {1.0}; 

	
	int *d_m, *d_nnz, *d_rowPtr, *d_colInd;
	float *d_val;

	cudaMalloc((void **)&d_m, sizeof(int));
	cudaMalloc((void **)&d_nnz, sizeof(int));
	cudaMalloc((void **)&d_rowPtr, (m + 1) * sizeof(int));
	cudaMalloc((void **)&d_colInd, nnz * sizeof(int));
	cudaMalloc((void **)&d_val, nnz * sizeof(float));

	cudaMemcpy(d_m, &m, sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(d_nnz, &nnz, sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(d_rowPtr, rowPtr, (m + 1) * sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(d_colInd, colInd, nnz * sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(d_val, val, nnz * sizeof(float), cudaMemcpyHostToDevice);

	int ncolors = 0, coloring[9] = {0}, reordering[9] = {0};
	float fraction = 1.0;
	int *d_ncolors, *d_coloring, *d_reordering;
	float *d_fraction;

	cudaMalloc((void **)&d_ncolors, sizeof(int));
	cudaMalloc((void **)&d_coloring, m * sizeof(int)); 
	cudaMalloc((void **)&d_reordering, m * sizeof(int)); 
	cudaMalloc((void **)&d_fraction, sizeof(float));

	cudaMemcpy(d_fraction, &fraction, sizeof(float), cudaMemcpyHostToDevice);	

	cusparseStatus_t status;

	cusparseHandle_t handle;
	status = cusparseCreate(&handle);
	if (status != CUSPARSE_STATUS_SUCCESS) {
		printf("error!");
		exit(1);
	}

	cusparseMatDescr_t descr;
	status = cusparseCreateMatDescr(&descr);
	if (status != CUSPARSE_STATUS_SUCCESS) {
		printf("error!");
		exit(1);
	}

	cusparseColorInfo_t info;
	status = cusparseCreateColorInfo(&info);
	if (status != CUSPARSE_STATUS_SUCCESS) {
		printf("error!");
		exit(1);
	}	

	status = cusparseScsrcolor(handle, *d_m, *d_nnz, descr, d_val, d_rowPtr, d_colInd, d_fraction, d_ncolors, d_coloring, d_reordering, info);
	switch (status) {
		case CUSPARSE_STATUS_SUCCESS:
			printf("success\n");
			break;
		case CUSPARSE_STATUS_NOT_INITIALIZED:
			printf("not initialized\n");
			break;
		case CUSPARSE_STATUS_ALLOC_FAILED:
			printf("alloc failed\n");
			break;
		case CUSPARSE_STATUS_INVALID_VALUE:
			printf("invalid value\n");
			break;
		case CUSPARSE_STATUS_ARCH_MISMATCH:
			printf("mismatch\n");
			break;
		case CUSPARSE_STATUS_INTERNAL_ERROR:
			printf("internal error\n");
			break;
		case CUSPARSE_STATUS_MATRIX_TYPE_NOT_SUPPORTED:
			printf("not supported\n");
			break;
		default:
			printf("unknown error\n");
			break;
	};
	
	
	cudaMemcpy(&ncolors, d_ncolors, sizeof(int), cudaMemcpyDeviceToHost);
	printf("ncolors=%p, &ncolors=%p\n", d_ncolors, &d_ncolors);
	cudaMemcpy(coloring, d_coloring, m * sizeof(int), cudaMemcpyDeviceToHost);
	cudaMemcpy(&reordering, d_reordering, m * sizeof(int), cudaMemcpyDeviceToHost);

	printf("%d colors!\n", ncolors);
	for (int i = 0; i < m; i++)
		printf("%d:%d", i, coloring[i]);
	return 0;
}

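The segfault occurs because you are passing the wrong kind of pointers (device vs. host) to the csrcolor call. m and nnz are passed by value, so *d_m and *d_nnz dereference device pointers in host code, which is invalid.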

If you change your call like this:

status = cusparseScsrcolor(handle, m, nnz, descr, d_val, d_rowPtr, d_colInd, &fraction, &ncolors, d_coloring, d_reordering, info);

your segfault will go away.

Here is a fully worked code sample that uses csrcolor:

http://stackoverflow.com/questions/18027278/improving-the-solution-of-sparse-linear-systems

(in the answer by JackOLantern)

Note that in your code you may also want to comment out these two lines:

cudaMemcpy(&ncolors, d_ncolors, sizeof(int), cudaMemcpyDeviceToHost);
printf("ncolors=%p, &ncolors=%p\n", d_ncolors, &d_ncolors);

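to get sane results. With the corrected call, ncolors is written directly on the host, so this cudaMemcpy would overwrite it with the never-initialized contents of d_ncolors.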

Thank you so much. I will try it.

According to https://docs.nvidia.com/cuda/cusparse/index.html, the number of colors should be less than the size of the matrix. I assume this means less than or equal to the number of nodes, because a higher value doesn’t make sense. In the given example (after fixing the bug), the result is 15 colors, but it shouldn’t be higher than 9. The coloring also doesn’t start at 0: for a different example I get the coloring 3, 8, 4, 9, 9, which uses 4 distinct colors, yet ncolors is 10.