Hello, I'm trying to learn CUDA and am working through tutorials and guides. The NVIDIA Programming Guide gives this example:
__global__ void matAdd(float A[N][N], float B[N][N], float C[N][N])
{
    // One thread per matrix element, indexed by its position in the block
    int i = threadIdx.x;
    int j = threadIdx.y;
    C[i][j] = A[i][j] + B[i][j];
}

// Kernel invocation
dim3 dimBlock(N, N);
matAdd<<<1, dimBlock>>>(A, B, C);
My question is: how should I allocate memory for A, B, and C? Should I use cudaMallocPitch, or maybe cudaMallocArray?
Can somebody help me, please?
The general advice here is to use 1D arrays and do your own indexing, as long as you don't use textures.
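As a rough sketch of what the 1D approach could look like: each matrix is one flat cudaMalloc allocation, and the kernel computes the element offset itself. The names matAddFlat and runMatAddFlat and the size N = 16 are my own assumptions, not from the programming guide. I also use threadIdx.x as the column index (the question's kernel uses it as the row) so that neighbouring threads touch neighbouring addresses.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define N 16  // assumed matrix size; N * N threads must fit in a single block

// Same addition as in the question, but on flat row-major arrays.
__global__ void matAddFlat(const float *A, const float *B, float *C)
{
    int row = threadIdx.y;
    int col = threadIdx.x;     // adjacent threads read adjacent addresses
    int idx = row * N + col;   // manual 2D -> 1D indexing
    C[idx] = A[idx] + B[idx];
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int runMatAddFlat()
{
    assert(N * N <= 1024);     // per-block thread limit on current GPUs
    const size_t bytes = N * N * sizeof(float);

    // Host data stored as flat row-major arrays.
    std::vector<float> hA(N * N), hB(N * N), hC(N * N, 0.0f);
    for (int k = 0; k < N * N; ++k) { hA[k] = (float)k; hB[k] = 2.0f * k; }

    // One plain linear allocation per matrix, then copy the inputs over.
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc((void **)&dA, bytes) == cudaSuccess
           && cudaMalloc((void **)&dB, bytes) == cudaSuccess
           && cudaMalloc((void **)&dC, bytes) == cudaSuccess
           && cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 dimBlock(N, N);
        matAddFlat<<<1, dimBlock>>>(dA, dB, dC);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe after a failure.
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (!ok) return -1;

    int mismatches = 0;
    for (int k = 0; k < N * N; ++k)
        if (hC[k] != hA[k] + hB[k]) ++mismatches;
    return mismatches;
}
```

Calling runMatAddFlat() from main and compiling with nvcc should return 0 on a working GPU. For matrices larger than one block you would need a grid of blocks and a bounds check, which this sketch leaves out.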
cudaMallocArray is for use with textures.
cudaMallocPitch can be useful to ensure coalescing: it pads each row so that every row starts at a suitably aligned address.
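If you want to try the pitched variant, something along these lines should work. Again this is only a sketch: matAddPitched, runMatAddPitched and N = 16 are names and values I made up for illustration. The important detail is that the pitch returned by cudaMallocPitch is in bytes, so the row pointer has to be computed through a char pointer before indexing the column.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define N 16  // assumed matrix size; N * N threads must fit in a single block

// Each matrix carries its own pitch (row stride in bytes).
__global__ void matAddPitched(const float *A, size_t pitchA,
                              const float *B, size_t pitchB,
                              float *C, size_t pitchC)
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    // Step to the start of the row in bytes, then index the column as float.
    const float *rowA = (const float *)((const char *)A + row * pitchA);
    const float *rowB = (const float *)((const char *)B + row * pitchB);
    float *rowC = (float *)((char *)C + row * pitchC);
    rowC[col] = rowA[col] + rowB[col];
}

// Hypothetical helper: returns the number of wrong elements, or -1 on a CUDA error.
int runMatAddPitched()
{
    assert(N * N <= 1024);     // per-block thread limit on current GPUs
    const size_t rowBytes = N * sizeof(float);   // unpadded row width in bytes

    std::vector<float> hA(N * N), hB(N * N), hC(N * N, 0.0f);
    for (int k = 0; k < N * N; ++k) { hA[k] = (float)k; hB[k] = 2.0f * k; }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t pA = 0, pB = 0, pC = 0;
    // cudaMallocPitch takes width in bytes and height in rows.
    bool ok = cudaMallocPitch((void **)&dA, &pA, rowBytes, N) == cudaSuccess
           && cudaMallocPitch((void **)&dB, &pB, rowBytes, N) == cudaSuccess
           && cudaMallocPitch((void **)&dC, &pC, rowBytes, N) == cudaSuccess
           // Host rows are tightly packed, so the source pitch is rowBytes.
           && cudaMemcpy2D(dA, pA, hA.data(), rowBytes, rowBytes, N,
                           cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy2D(dB, pB, hB.data(), rowBytes, rowBytes, N,
                           cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 dimBlock(N, N);
        matAddPitched<<<1, dimBlock>>>(dA, pA, dB, pB, dC, pC);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy2D(hC.data(), rowBytes, dC, pC, rowBytes, N,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe after a failure.
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    if (!ok) return -1;

    int mismatches = 0;
    for (int k = 0; k < N * N; ++k)
        if (hC[k] != hA[k] + hB[k]) ++mismatches;
    return mismatches;
}
```

I pass a separate pitch for each matrix rather than assuming the three allocations come back with the same value. For a matrix this small the padding makes little difference; it matters more when the row width is not a multiple of the alignment the hardware wants.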
I would advise you to read the programming guide; there is a lot of information in there, especially regarding coalescing.