Hi,
I wrote a simple program that adds two arrays element by element and stores the result in a third array. I am trying to use shared memory to make the calculation faster, but I am getting this error:
sample.kernal.cuh: In function ‘void _Z6AddGPUPiS_S_i(int*, int*, int*, int)’:
sample.kernal.cuh:21: error: ‘__vla_alloc’ was not declared in this scope
sample.kernal.cuh:43: error: ‘__eh_curr_region’ was not declared in this scope
sample.kernal.cuh:43: error: ‘__vla_dealloc’ was not declared in this scope
My kernel function looks like this:
[codebox]
__global__ void AddGPU(
    int *d_ainp,
    int *d_binp,
    int *d_Cadd,
    const int ARY_N
)
{
    // Thread index
    const int tid = blockDim.x * blockIdx.x + threadIdx.x;
    // Total number of threads in execution grid
    const int THREAD_N = blockDim.x * gridDim.x;
    // Shared memory for array A
    __shared__ int As[ARY_N];
    // Shared memory for array B
    __shared__ int Bs[ARY_N];
    // Shared memory for array C
    __shared__ int Cs[ARY_N];
    // Load the arrays from global memory to shared memory
    for (int i = tid; i < ARY_N; i += THREAD_N)
    {
        As[i] = d_ainp[i];
        Bs[i] = d_binp[i];
    }
    for (int ar = tid; ar < ARY_N; ar += THREAD_N)
    {
        Cs[ar] = As[ar] + Bs[ar];
    }
    for (int k = tid; k < ARY_N; k += THREAD_N)
    {
        d_Cadd[k] = Cs[k];
    }
}
[/codebox]
Here ARY_N is a constant with the value 100000.
Can anyone help me with how to use shared memory? Specifically, how can I copy elements from global memory to shared memory, do the calculation in shared memory, and copy the result back into global memory?
Thank you in advance.
Kirti
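
A note on the pattern asked about above, offered as a minimal sketch rather than a definitive fix. The `__vla_alloc` errors most likely come from sizing the `__shared__` arrays with the kernel parameter `ARY_N`, which makes them variable-length arrays; a statically declared shared array needs a compile-time size. Also, three arrays of 100000 ints need about 1.2 MB, far more than the shared memory available to one block, and shared memory is private to each block, so indexing it with a grid-wide index cannot work. The sketch below assumes a hypothetical tile size `TILE` equal to the block size, a renamed kernel `AddGPUTiled`, and a hypothetical host helper `addOnGpu`; none of these names come from the original post. Each block stages one tile at a time: global to shared, add, then shared to global.

```cuda
#include <cuda_runtime.h>

// Hypothetical tile size (an assumption): threads per block, and also the
// number of elements each block stages in shared memory per iteration.
#define TILE 256

__global__ void AddGPUTiled(const int *d_ainp, const int *d_binp,
                            int *d_Cadd, int ARY_N)
{
    // Compile-time sizes, so no variable-length array is generated.
    __shared__ int As[TILE];
    __shared__ int Bs[TILE];

    // Loop over tiles. `base` depends only on blockIdx, so every thread in a
    // block runs the same number of iterations and reaches __syncthreads().
    for (int base = blockIdx.x * TILE; base < ARY_N; base += gridDim.x * TILE) {
        const int i = base + threadIdx.x;
        if (i < ARY_N) {                    // global -> shared
            As[threadIdx.x] = d_ainp[i];
            Bs[threadIdx.x] = d_binp[i];
        }
        __syncthreads();
        if (i < ARY_N)                      // compute from shared, store to global
            d_Cadd[i] = As[threadIdx.x] + Bs[threadIdx.x];
        __syncthreads();                    // tile is reused in the next iteration
    }
}

// Hypothetical host helper: copies inputs to the device, launches the kernel
// with TILE threads per block, and copies the result back.
cudaError_t addOnGpu(const int *h_a, const int *h_b, int *h_c, int n)
{
    const size_t bytes = (size_t)n * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int blocks = (n + TILE - 1) / TILE;   // one tile per block
        AddGPUTiled<<<blocks, TILE>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);   // cudaFree accepts a null pointer
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}
```

In this sketch each thread only reads back the shared slot it wrote itself, so the barriers are not strictly required; they are kept to show where synchronization belongs once threads start reading each other's elements. For a plain element-wise add, every input is read exactly once, so staging through shared memory is unlikely to be faster than adding directly from global memory. Shared memory tends to pay off when the same data is reused by several threads in a block.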