I’m new to CUDA programming and I’m trying to store each column of a matrix (10000 by 5) in shared memory. I wrote this:
__shared__ int *col;
col[threadIdx.x] = mat_dev[threadIdx.x][blockIdx.x];
Then I realized that the number of rows is bigger than the number of threads per block. So how can I store each column slice by slice?
What you declared is a pointer that lives in shared memory. The pointer itself could point to anything: global memory, shared memory, or local memory.
In your code snippet you never initialize the pointer, so you end up writing to an arbitrary memory location, which typically shows up as an unspecified launch failure (the GPU equivalent of a segfault).
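To make the difference concrete, here is a minimal sketch of declaring an actual array in shared memory instead of a pointer. It assumes a block size of 256 threads and that the matrix is stored as a flat row-major array, so element (row, c) sits at mat_dev[row * nCols + c]. The names THREADS_PER_BLOCK, loadFirstSlice, nRows, nCols and out are made up for illustration and are not from your code.

```cuda
#define THREADS_PER_BLOCK 256  // assumed block size (hypothetical, not from the post)

// One block per column; each thread loads one element of that column.
// This only covers the first blockDim.x rows of the column.
__global__ void loadFirstSlice(const int *mat_dev, int nRows, int nCols, int *out)
{
    // An array in shared memory: storage for THREADS_PER_BLOCK ints is
    // reserved per block, unlike "__shared__ int *col;" which reserves
    // only the pointer itself.
    __shared__ int col[THREADS_PER_BLOCK];

    int row = threadIdx.x;
    if (row < nRows && row < THREADS_PER_BLOCK) {
        col[row] = mat_dev[row * nCols + blockIdx.x];
    }
    __syncthreads();  // make the loaded slice visible to every thread in the block

    // Placeholder use of the shared data: copy it back out (hypothetical buffer).
    if (row < nRows && row < THREADS_PER_BLOCK) {
        out[row * nCols + blockIdx.x] = col[row];
    }
}
```

Launched as loadFirstSlice<<<nCols, THREADS_PER_BLOCK>>>(...), this stays inside valid memory, but it only handles as many rows as there are threads in the block.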
Now I’m trying to use a dynamic array, but I still can’t solve the problem of the number of rows being larger than the number of threads per block.
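One common way around the rows-versus-threads mismatch is a block-stride loop: each thread loads rows threadIdx.x, threadIdx.x + blockDim.x, and so on until the column is covered. The sketch below is one possible approach, not the only one. It assumes a flat row-major int matrix and one block per column, and it relies on the whole column (10000 ints, about 40 KB) fitting in the default 48 KB of shared memory per block. The kernel name columnSum and the column-sum check are invented purely to make the example verifiable.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One block per column. Threads walk down the column in steps of blockDim.x,
// so the block size does not need to match the number of rows.
__global__ void columnSum(const int *mat, int nRows, int nCols, long long *sums)
{
    extern __shared__ int col[];  // size given at launch: nRows * sizeof(int)
    int c = blockIdx.x;

    for (int row = threadIdx.x; row < nRows; row += blockDim.x) {
        col[row] = mat[row * nCols + c];
    }
    __syncthreads();  // the whole column is now in shared memory

    // Illustrative use only: thread 0 sums the column serially.
    if (threadIdx.x == 0) {
        long long s = 0;
        for (int r = 0; r < nRows; ++r) s += col[r];
        sums[c] = s;
    }
}

int main()
{
    const int nRows = 10000, nCols = 5, threads = 256;  // 256 is an assumed block size
    const size_t matBytes = (size_t)nRows * nCols * sizeof(int);

    // Fill column c with the value c + 1 so each sum should be nRows * (c + 1).
    int *h_mat = (int *)malloc(matBytes);
    for (int r = 0; r < nRows; ++r)
        for (int c = 0; c < nCols; ++c)
            h_mat[r * nCols + c] = c + 1;

    int *d_mat = NULL;
    long long *d_sums = NULL;
    cudaMalloc(&d_mat, matBytes);
    cudaMalloc(&d_sums, nCols * sizeof(long long));
    cudaMemcpy(d_mat, h_mat, matBytes, cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    columnSum<<<nCols, threads, nRows * sizeof(int)>>>(d_mat, nRows, nCols, d_sums);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    long long h_sums[nCols];
    cudaMemcpy(h_sums, d_sums, sizeof(h_sums), cudaMemcpyDeviceToHost);
    for (int c = 0; c < nCols; ++c)
        printf("column %d sum = %lld\n", c, h_sums[c]);

    cudaFree(d_mat);
    cudaFree(d_sums);
    free(h_mat);
    return 0;
}
```

If a column were too large for shared memory, the same loop could be applied to one fixed-size slice at a time, with a __syncthreads() before and after each slice is used.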