Hi,
I have three shared memory arrays in my kernel function, declared as follows.
extern __shared__ int As[]; // Shared memory for array A
extern __shared__ int Bs[]; // Shared memory for array B
extern __shared__ int Cs[]; // Shared memory for array C
I have three global memory arrays, each of size 100000. I need to copy
all the elements of A to As and of B to Bs, and I want to do the calculation in shared memory to reduce the access time. So I wrote the following code.
[codebox]
__global__ void AddGPU(
    int *d_ainp,
    int *d_binp,
    int *d_Cadd,
    int ARY_N
)
{
    const int tid = blockDim.x * blockIdx.x + threadIdx.x; // Global thread index
    const int THREAD_N = blockDim.x * gridDim.x;           // Total number of threads in the execution grid
    extern __shared__ int As[]; // Shared memory for array A
    extern __shared__ int Bs[]; // Shared memory for array B
    extern __shared__ int Cs[]; // Shared memory for array C

    // Copy the array elements from global memory to shared memory.
    for (int i = tid; i < ARY_N; i += THREAD_N)
    {
        As[i] = d_ainp[i];
        Bs[i] = d_binp[i];
    }

    // Add the arrays in shared memory and put the result into shared memory.
    for (int ar = tid; ar < ARY_N; ar += THREAD_N)
    {
        Cs[ar] = As[ar] + Bs[ar];
    }

    // Copy the result back to global memory.
    for (int k = tid; k < ARY_N; k += THREAD_N)
    {
        d_Cadd[k] = Cs[k];
    }
}
[/codebox]
I am calling the kernel function as follows.
AddGPU<<<Dg, Db, sharedMemSize>>>(
d_ainp,
d_binp,
d_Cadd,
ARY_N
);
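For reference, the third launch parameter is a byte count of dynamic shared memory given to each block, not to the whole grid. Below is a minimal sketch of how sharedMemSize could be computed for several block sizes, assuming each block only needs one tile of blockDim.x ints per array. The helper name sharedBytesPerBlock and the list of block sizes are made up for illustration. Note that three full arrays of 100000 ints would need about 1.2 MB, which is far more than the per-block limit on typical devices.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): bytes of dynamic shared
// memory one block needs when it holds a tile of `threadsPerBlock` ints
// for each of `numArrays` arrays.
__host__ __device__ inline size_t sharedBytesPerBlock(int threadsPerBlock, int numArrays)
{
    return static_cast<size_t>(numArrays) * threadsPerBlock * sizeof(int);
}

int main()
{
    // Query the per-block shared memory limit of device 0.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("No CUDA device available\n");
        return 1;
    }

    // Assumed block sizes to try; adjust to the configurations being measured.
    const int blockSizes[] = {64, 128, 256, 512};
    const int numSizes = sizeof(blockSizes) / sizeof(blockSizes[0]);

    for (int s = 0; s < numSizes; ++s) {
        const size_t bytes = sharedBytesPerBlock(blockSizes[s], 3);
        printf("%d threads/block: %zu bytes requested, limit %zu bytes, %s\n",
               blockSizes[s], bytes, prop.sharedMemPerBlock,
               bytes <= prop.sharedMemPerBlock ? "fits" : "too large");
    }
    return 0;
}
```

With this sizing, sharedMemSize depends only on Db (the block size), so changing the number of blocks Dg does not change it.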
I want to take timing readings with different numbers of blocks and different numbers of threads per block.
I am confused about how to determine the size of shared memory, and how much memory is allocated to each of the 3 arrays. Can anyone help me correct my code?
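One possible way to lay this out is sketched below, under a few assumptions. All `extern __shared__` arrays declared in a kernel start at the same address, so the sketch declares a single array and splits it into three tiles by pointer offset, and it indexes the tiles with threadIdx.x rather than the global index, since shared memory is private to each block. The names AddGPUTiled, smem, and runAddGPUTiled are hypothetical, and the test data is made up. For a plain element-wise add each value is read only once, so this layout may not actually run faster than reading global memory directly.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void AddGPUTiled(const int *d_ainp, const int *d_binp, int *d_Cadd, int ARY_N)
{
    // Single dynamic allocation of 3 * blockDim.x ints, split into three tiles.
    extern __shared__ int smem[];
    int *As = smem;              // first  blockDim.x ints
    int *Bs = As + blockDim.x;   // second blockDim.x ints
    int *Cs = Bs + blockDim.x;   // third  blockDim.x ints

    const int THREAD_N = blockDim.x * gridDim.x; // total threads in the grid
    const int lid = threadIdx.x;                 // index inside this block's tile

    // Grid-stride loop: each thread stages one element per pass.
    // Every thread touches only its own slot, so no __syncthreads() is needed.
    for (int i = blockDim.x * blockIdx.x + lid; i < ARY_N; i += THREAD_N) {
        As[lid] = d_ainp[i];
        Bs[lid] = d_binp[i];
        Cs[lid] = As[lid] + Bs[lid];
        d_Cadd[i] = Cs[lid];
    }
}

// Hypothetical driver: returns the number of wrong results, or -1 on a CUDA error.
int runAddGPUTiled(int ARY_N, int Dg, int Db)
{
    const size_t bytes = static_cast<size_t>(ARY_N) * sizeof(int);
    std::vector<int> a(ARY_N), b(ARY_N), c(ARY_N, 0);
    for (int i = 0; i < ARY_N; ++i) { a[i] = i; b[i] = 2 * i; }

    int *d_ainp = nullptr, *d_binp = nullptr, *d_Cadd = nullptr;
    if (cudaMalloc(&d_ainp, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_binp, bytes) != cudaSuccess) { cudaFree(d_ainp); return -1; }
    if (cudaMalloc(&d_Cadd, bytes) != cudaSuccess) { cudaFree(d_ainp); cudaFree(d_binp); return -1; }
    cudaMemcpy(d_ainp, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_binp, b.data(), bytes, cudaMemcpyHostToDevice);

    // Bytes per block: three tiles of Db ints each.
    const size_t sharedMemSize = 3 * static_cast<size_t>(Db) * sizeof(int);
    AddGPUTiled<<<Dg, Db, sharedMemSize>>>(d_ainp, d_binp, d_Cadd, ARY_N);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(c.data(), d_Cadd, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_ainp);
    cudaFree(d_binp);
    cudaFree(d_Cadd);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < ARY_N; ++i) {
        if (c[i] != a[i] + b[i]) ++mismatches;
    }
    return mismatches;
}

int main()
{
    const int mismatches = runAddGPUTiled(100000, 32, 256);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

In this layout each of the three arrays gets Db * sizeof(int) bytes per block, and the Dg and Db arguments of runAddGPUTiled can be varied to take the different readings.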