Ailos
January 31, 2009, 4:17pm
1
Hello, I'm trying to learn CUDA and am working through tutorials and guides. The NVIDIA Programming Guide gives this example:
[b]
__global__ void matAdd(float A[N][N], float B[N][N],
float C[N][N])
{
int i = threadIdx.x;
int j = threadIdx.y;
C[i][j] = A[i][j] + B[i][j];
}
int main()
{
// Kernel invocation
dim3 dimBlock(N, N);
matAdd<<<1, dimBlock>>>(A, B, C);
}[/b]
My question is: how should I allocate memory for A, B, and C? Should I use cudaMallocPitch, or maybe cudaMallocArray?
Can somebody help me, please?
Thanks
The general advice here is to use 1D arrays and do your own indexing, as long as you don't use textures.
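As a rough sketch of what that could look like for the matAdd example: each matrix is allocated as one flat block with cudaMalloc and indexed as row * N + col. The value N = 16, the helper name flatIndex, and the host/device variable names are assumptions made up for this illustration; they are not from the programming guide. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 16  // assumed size; N*N threads must fit in a single block

// Hypothetical helper: row-major index into a flat width*width array.
__host__ __device__ inline int flatIndex(int row, int col, int width)
{
    return row * width + col;
}

__global__ void matAdd(const float* A, const float* B, float* C)
{
    // threadIdx.x is used as the column so that neighbouring threads
    // touch neighbouring addresses (this choice is an assumption of the sketch).
    int col = threadIdx.x;
    int row = threadIdx.y;
    int k = flatIndex(row, col, N);
    C[k] = A[k] + B[k];
}

int main()
{
    const size_t bytes = N * N * sizeof(float);

    // Host matrices stored as flat arrays.
    float* hA = (float*)malloc(bytes);
    float* hB = (float*)malloc(bytes);
    float* hC = (float*)malloc(bytes);
    for (int k = 0; k < N * N; ++k) {
        hA[k] = (float)k;
        hB[k] = (float)(2 * k);
    }

    // Device matrices: plain linear memory from cudaMalloc.
    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, bytes);
    cudaMalloc((void**)&dB, bytes);
    cudaMalloc((void**)&dC, bytes);

    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    // Kernel invocation: one block of N x N threads, as in the guide.
    dim3 dimBlock(N, N);
    matAdd<<<1, dimBlock>>>(dA, dB, dC);

    // This copy also waits for the kernel to finish.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Compare against a host-side sum.
    int mismatches = 0;
    for (int k = 0; k < N * N; ++k)
        if (hC[k] != hA[k] + hB[k]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return mismatches == 0 ? 0 : 1;
}
```

The kernel takes plain float pointers instead of float[N][N] parameters, which is what makes a single cudaMalloc per matrix sufficient.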
cudaMallocArray is for use with textures.
cudaMallocPitch can be useful to ensure coalescing, because it pads each row so that every row starts at a suitably aligned address.
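If you want to try the pitched route, a minimal sketch might look like the one below. The pitch returned by cudaMallocPitch is in bytes, so the row start has to be computed on a char pointer before casting back to float. The helper rowPtr, the kernel name matAddPitched, and N = 16 are assumptions for illustration only; error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 16  // assumed size; N*N threads must fit in a single block

// Hypothetical helper: pointer to the start of a row in a pitched allocation.
// pitchBytes is the row stride in bytes, as returned by cudaMallocPitch.
template <typename T>
__host__ __device__ inline T* rowPtr(T* base, size_t pitchBytes, int row)
{
    return (T*)((char*)base + row * pitchBytes);
}

__global__ void matAddPitched(const float* A, size_t pitchA,
                              const float* B, size_t pitchB,
                              float* C, size_t pitchC)
{
    int col = threadIdx.x;
    int row = threadIdx.y;
    rowPtr(C, pitchC, row)[col] =
        rowPtr(A, pitchA, row)[col] + rowPtr(B, pitchB, row)[col];
}

int main()
{
    static float hA[N][N], hB[N][N], hC[N][N];
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c) {
            hA[r][c] = (float)(r * N + c);
            hB[r][c] = 1.0f;
        }

    const size_t rowBytes = N * sizeof(float);  // unpadded host row width

    // Each allocation reports its own pitch (padded row width in bytes).
    float *dA, *dB, *dC;
    size_t pitchA, pitchB, pitchC;
    cudaMallocPitch((void**)&dA, &pitchA, rowBytes, N);
    cudaMallocPitch((void**)&dB, &pitchB, rowBytes, N);
    cudaMallocPitch((void**)&dC, &pitchC, rowBytes, N);

    // cudaMemcpy2D copies row by row and handles the differing row strides.
    cudaMemcpy2D(dA, pitchA, hA, rowBytes, rowBytes, N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(dB, pitchB, hB, rowBytes, rowBytes, N, cudaMemcpyHostToDevice);

    dim3 dimBlock(N, N);
    matAddPitched<<<1, dimBlock>>>(dA, pitchA, dB, pitchB, dC, pitchC);

    cudaMemcpy2D(hC, rowBytes, dC, pitchC, rowBytes, N, cudaMemcpyDeviceToHost);

    // Compare against a host-side sum.
    int mismatches = 0;
    for (int r = 0; r < N; ++r)
        for (int c = 0; c < N; ++c)
            if (hC[r][c] != hA[r][c] + hB[r][c]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return mismatches == 0 ? 0 : 1;
}
```

For a matrix this small the padding is unlikely to matter; it is more relevant when the row width is not already a multiple of the alignment the hardware prefers.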
I would advise you to read the programming guide; there is a lot of information in there, especially with regard to coalescing.