Hi,
I am trying to use the cuSPARSE csrmv routine with CUSPARSE_OPERATION_TRANSPOSE. The section of the code that produces the error is as follows:
status = cusparseDcsrmv(this%h_P,CUSPARSE_OPERATION_TRANSPOSE,ntb,K_dim,this%nnz,&
1._wp,this%descr_P,this%P_Val_d,this%P_RowPtr_d,this%P_ColInd_d,this%Fx_d,0._wp,&
this%F_meshPx_di)
if (status /= CUSPARSE_STATUS_SUCCESS) print'(" csrmv x-component Error: ",i0)',status
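For reference, here is a minimal host-side sketch of the operation this call is expected to compute, y = alpha * A^T * x + beta * y, for an m-by-n matrix stored in CSR. It assumes 1-based indices (the index base set in the descriptor is not shown above, so this is an assumption), and the module and routine names are hypothetical, not part of BDpack or cuSPARSE. It is only meant for comparing against the device result on a small case.

```fortran
module csr_ref
  implicit none
  integer, parameter :: wp = kind(1.0d0)
contains
  ! Host reference: y = alpha * A^T * x + beta * y
  ! A is m-by-n in CSR with 1-based row_ptr and col_ind (assumed).
  ! Note that x has m entries and y has n entries in the transposed case.
  subroutine csrmv_transpose_ref(m, n, alpha, val, row_ptr, col_ind, x, beta, y)
    integer,  intent(in)    :: m, n
    real(wp), intent(in)    :: alpha, beta
    real(wp), intent(in)    :: val(:)
    integer,  intent(in)    :: row_ptr(m+1), col_ind(:)
    real(wp), intent(in)    :: x(m)
    real(wp), intent(inout) :: y(n)
    integer :: i, k

    ! Scale the output first; with beta = 0 the old content of y is ignored.
    if (beta == 0.0_wp) then
      y = 0.0_wp
    else
      y = beta * y
    end if

    ! Row i of A scatters val(k) * x(i) into y at the column index of entry k.
    do i = 1, m
      do k = row_ptr(i), row_ptr(i+1) - 1
        y(col_ind(k)) = y(col_ind(k)) + alpha * val(k) * x(i)
      end do
    end do
  end subroutine csrmv_transpose_ref
end module csr_ref
```

To use it, the device arrays would first have to be copied back to the host; the result can then be compared element by element with the contents of this%F_meshPx_di.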
When running under cuda-memcheck (mpirun cuda-memcheck --language fortran ./BDpack), I get many errors of the same type:
========= Invalid __shared__ read of size 8
========= at 0x000008f0 in void csrMvT_hyb_kernel<double, double, double, int=7, int=2, int=8, int=5, int=0>(cusparseCsrMvParams<double, double, double>, int*)
========= by thread (8,1,1) in block (23,1,1)
========= Address 0xfffffe30 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib64/libcuda.so.1 (cuLaunchKernel + 0x2cd) [0x24d9dd]
========= Host Frame:/opt/packages/cuda/9.0/lib64/libcusparse.so.9.0 [0x35cadb]
========= Host Frame:/opt/packages/cuda/9.0/lib64/libcusparse.so.9.0 [0x37a23e]
========= Host Frame:/opt/packages/cuda/9.0/lib64/libcusparse.so.9.0 [0x18d243]
========= Host Frame:/opt/packages/cuda/9.0/lib64/libcusparse.so.9.0 [0x18ed51]
========= Host Frame:/opt/packages/pgi/linux86-64/18.1/lib/libcudafor.so (cusparsedcsrmv_sethpm_ + 0xba) [0x1f881a]
========= Host Frame:./BDpack [0x103f11]
========= Host Frame:./BDpack [0x102ea2]
========= Host Frame:./BDpack [0x107349]
========= Host Frame:./BDpack [0x10a1f9]
========= Host Frame:./BDpack [0xfecbc]
========= Host Frame:./BDpack [0xfcac8]
========= Host Frame:./BDpack [0x8e738]
========= Host Frame:./BDpack [0xa94c]
========= Host Frame:./BDpack [0xa7f4]
========= Host Frame:/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21c05]
========= Host Frame:./BDpack [0x80c9]
In my tests, the matrix dimensions are {ntb, K_dim} = {2000, 125} and the number of non-zero elements is 128000. I made sure that there is sufficient memory before calling csrmv. When using status = cudaMemGetInfo( free, total ), I get:
free 16483483648 total 17071734784
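Since free memory is not the issue, one thing that can be checked on the host is whether the CSR arrays are consistent with the m, n and nnz passed to the call. Below is a minimal sketch of such a check, assuming 1-based indexing and host copies of the row pointer and column index arrays; the module and function names are hypothetical. It cannot confirm or rule out a problem inside the library, only inconsistent input.

```fortran
module csr_check
  implicit none
contains
  ! Returns .true. if row_ptr/col_ind describe a valid m-by-n CSR pattern
  ! with nnz entries, assuming 1-based indices.
  logical function csr_is_consistent(m, n, nnz, row_ptr, col_ind) result(ok)
    integer, intent(in) :: m, n, nnz
    integer, intent(in) :: row_ptr(:), col_ind(:)
    integer :: i

    ok = .false.
    ! Array lengths must match the dimensions passed to csrmv.
    if (size(row_ptr) /= m + 1 .or. size(col_ind) /= nnz) return
    ! First and last row pointers must bracket exactly nnz entries.
    if (row_ptr(1) /= 1 .or. row_ptr(m+1) /= nnz + 1) return
    ! Row pointers must be non-decreasing.
    do i = 1, m
      if (row_ptr(i+1) < row_ptr(i)) return
    end do
    ! Every column index must lie in 1..n.
    if (nnz > 0) then
      if (minval(col_ind) < 1 .or. maxval(col_ind) > n) return
    end if
    ok = .true.
  end function csr_is_consistent
end module csr_check
```

For the case above this would be called with m = 2000, n = 125 and nnz = 128000. For the transposed operation, the input vector must additionally hold m entries and the output vector n entries.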
A few side notes:
1- The program produces the correct results and no errors if I build the matrix in CSC layout and then use CUSPARSE_OPERATION_NON_TRANSPOSE. Unfortunately, the conversion is very slow for the typically large values of K_dim, so I would prefer to keep the CSR layout and use TRANSPOSE.
2- The program works fine for small values of K_dim. In some cases it runs but gives wrong results.
3- I am using pgi-18.1 and cuda 9.0
4- During compilation, I get many informational messages about possible copy-in and copy-out:
Possible copy in and copy out of p_colptr_d in call to cusparsedcsrmv_sethpm
Possible copy in and copy out of f_meshpx_di in call to cusparsedcsrmv_sethpm
Possible copy in and copy out of fx_d in call to cusparsedcsrmv_sethpm
Possible copy in and copy out of p_rowind_d in call to cusparsedcsrmv_s
I tried to make an MWE, but I couldn’t reproduce the error when calling csrmv in a simple program on a matrix with similar dimensions. My code is on GitHub, however, and I will try to provide any other information as needed.
I would greatly appreciate your help and advice.
Best,
Amir