This is my code to extract the diagonal from a COO format sparse matrix: __global__ void diagonal_inverse_coo_kernel(int* in_row, int* in_col, double* in_value, double* out_value, int nonzeros, int* out_row, int* out_col) { int idx = blockDim.x * blockIdx.x + threadIdx.x; if (idx < nonzeros) { // …

If compute-sanitizer doesn’t report any issues, then you can disregard it as a debug suggestion. The thing to look for is error reports. If you are convinced that there are no threads after the thread numbered with an idx of 640, then you could put a printf in your kernel before the if statement,…

CUDA Kernel doesn't execute all threads, stops after the 640th thread


Robert_Crovella October 29, 2024, 7:44pm 4

Maybe nonzeros is 640.
Maybe your input sparse matrix doesn’t have any nonzero elements on the diagonal after row/col 640.
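Both possibilities can be checked on the host before looking at the kernel at all. The following is only a sketch: `DiagonalSummary` and `summarize_diagonal` are hypothetical names, and it assumes the COO row and column index arrays are still available in host memory.

```cuda
#include <cstdio>

// Hypothetical host-side summary of the diagonal entries in a COO index list.
struct DiagonalSummary {
    int diagonal_count;      // number of entries with row == col
    int max_diagonal_index;  // largest such index, or -1 if there are none
};

// Walk the COO index arrays once and record how many entries lie on the
// diagonal and how far along the diagonal they reach.
DiagonalSummary summarize_diagonal(const int* row, const int* col, int nonzeros)
{
    DiagonalSummary s = {0, -1};
    for (int i = 0; i < nonzeros; ++i) {
        if (row[i] == col[i]) {
            ++s.diagonal_count;
            if (row[i] > s.max_diagonal_index) {
                s.max_diagonal_index = row[i];
            }
        }
    }
    return s;
}
```

Printing `nonzeros` together with the two fields of the returned struct shows whether the input itself stops at 640. If `max_diagonal_index` is below 640, the kernel has no diagonal entry to report beyond that point.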

The code is certainly creating threads after 640, if it is creating any threads at all.

General debug suggestions might be useful. Put a printf statement in your kernel that prints out any time a value of 640 or greater is indicated for diag_idx. Use proper CUDA error checking. Run your code with compute-sanitizer.
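A minimal sketch of the first two suggestions might look like the code below. It is not the kernel from the question: `report_diagonal_kernel`, `count_diagonal_at_or_above`, and the `CUDA_CHECK` macro are hypothetical names, and it assumes that diag_idx is the row index of an entry whose row and column match.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error and exit if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Illustrative stand-in for the kernel in the question. It prints and counts
// every diagonal entry whose index is at or above `threshold`.
// Assumption: diag_idx is the row index of an entry with row == col.
__global__ void report_diagonal_kernel(const int* in_row, const int* in_col,
                                       int nonzeros, int threshold, int* hits)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < nonzeros) {
        int diag_idx = in_row[idx];
        if (in_row[idx] == in_col[idx] && diag_idx >= threshold) {
            printf("thread %d: diag_idx = %d\n", idx, diag_idx);
            atomicAdd(hits, 1);
        }
    }
}

// Host driver: copy the COO indices to the device, launch the kernel with
// error checking, and return how many diagonal entries were reported.
int count_diagonal_at_or_above(const int* h_row, const int* h_col,
                               int nonzeros, int threshold)
{
    if (nonzeros <= 0) {
        return 0;  // nothing to launch; a grid of zero blocks is invalid
    }
    int *d_row = nullptr, *d_col = nullptr, *d_hits = nullptr;
    int h_hits = 0;
    size_t bytes = static_cast<size_t>(nonzeros) * sizeof(int);

    CUDA_CHECK(cudaMalloc(&d_row, bytes));
    CUDA_CHECK(cudaMalloc(&d_col, bytes));
    CUDA_CHECK(cudaMalloc(&d_hits, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_row, h_row, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_col, h_col, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(d_hits, 0, sizeof(int)));

    int threads = 256;
    int blocks = (nonzeros + threads - 1) / threads;
    report_diagonal_kernel<<<blocks, threads>>>(d_row, d_col, nonzeros,
                                                threshold, d_hits);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports execution errors, flushes printf

    CUDA_CHECK(cudaMemcpy(&h_hits, d_hits, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_row));
    CUDA_CHECK(cudaFree(d_col));
    CUDA_CHECK(cudaFree(d_hits));
    return h_hits;
}
```

Calling `count_diagonal_at_or_above(row, col, nonzeros, 640)` returns zero when no diagonal entry at or beyond 640 reaches the kernel. For the third suggestion, the built program can be run under the tool as `compute-sanitizer ./app`, where `./app` is a placeholder for the executable name.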

cusparseSpSM_solve returning INF value for matrices of 641x641 or larger
