I have read the shared memory examples and want to apply them to my project. Could you tell me whether my approach is correct? Below is a portion of my project, which I will test both with and without shared memory. This is the global memory version. All of the kernel inputs are used in other parts of the program.
extern "C"
__global__ void ComputationdClustersOnGPUShuffle(int numTokenSrc,int numWordSrc,int srcLength, char *src,int *srctokensSFIndices,int *srctokensLength,int *srcIndices, int *srcStartIndices,int totalLengthDistinct, char *patternRemoved,int numTokenPattern,int numWordPattern,int patternLength,char *pattern,int *patterntokensSFIndices,int *patterntokensLength,int *patternIndices,int *patternStartIndices,int *dX,int *ResultFinal,float *TokensFinal,float *WordsTokensFinal1,float *WordsTokensFinal2,float *WordsFinal1,float *WordsFinal2,float *WordsFinal)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int min_val = 0, var1 = 0, var2 = 0;
    int Avar, Bvar, Cvar, Dvar;
    float maxleven1 = 0.0f, resultName = 0.0f, sumleven = 0.0f, sumfinal = 0.0f;
    //int diff;
    //sumfinal = 0.0f,resultName = 0.0f,maxleven1 = 0.0f
    if (ix < totalLengthDistinct)
    {
        for (int i = 0; i < srcLength; i++) {
            if (src[i] == ',')
                dX[ix * srcLength + i] = 0;
            else
            {
                if (src[i] == patternRemoved[ix])
                    dX[ix * srcLength + i] = srcIndices[i];
                else if (src[i] != patternRemoved[ix])
                    dX[ix * srcLength + i] = dX[ix * srcLength + i-1];
            }
        }
    }
    __syncthreads();
}
This is the shared memory version:
extern "C"
__global__ void ComputationdClustersOnGPUShuffle(int numTokenSrc,int numWordSrc,int srcLength, char *src,int *srctokensSFIndices,int *srctokensLength,int *srcIndices, int *srcStartIndices,int totalLengthDistinct, char *patternRemoved,int numTokenPattern,int numWordPattern,int patternLength,char *pattern,int *patterntokensSFIndices,int *patterntokensLength,int *patternIndices,int *patternStartIndices,int *dX,int *ResultFinal,float *TokensFinal,float *WordsTokensFinal1,float *WordsTokensFinal2,float *WordsFinal1,float *WordsFinal2,float *WordsFinal)
{
    const int shared_string_len = 256; // The same as block size
    __shared__ char patternRemoved_shared[shared_string_len];
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    patternRemoved_shared[ix] = patternRemoved[ix];
    int min_val = 0, var1 = 0, var2 = 0;
    int Avar, Bvar, Cvar, Dvar;
    float maxleven1 = 0.0f, resultName = 0.0f, sumleven = 0.0f, sumfinal = 0.0f;
    //int diff;
    //sumfinal = 0.0f,resultName = 0.0f,maxleven1 = 0.0f
    if (ix < totalLengthDistinct)
    {
        for (int i = 0; i < srcLength; i++) {
            if (src[i] == ',')
                dX[ix * srcLength + i] = 0;
            else
            {
                if (src[i] == patternRemoved_shared[ix])
                    dX[ix * srcLength + i] = srcIndices[i];
                else if (src[i] != patternRemoved_shared[ix])
                    dX[ix * srcLength + i] = dX[ix * srcLength + i-1];
            }
        }
    }
    __syncthreads();
}
If my code is correct, this means that throughout the program only the inputs indexed by ix would be replaced by a shared memory variable. A shared memory variable is created for every block in the grid. Will shared memory take less time than global memory?
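Since a shared memory array exists once per block, here is a minimal sketch of how the per-block copy could be indexed. It is not a definitive version of the kernel. The names fillDXShared, runFillDX and BLOCK_SIZE are hypothetical, the argument list is cut down to what the loop uses, and writing 0 when i is 0 (where there is no previous cell to copy) is an assumption. The array holds only BLOCK_SIZE elements, so the sketch indexes it with threadIdx.x rather than the global ix, and it calls __syncthreads() after the load and before any thread reads the array.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <string>
#include <vector>

constexpr int BLOCK_SIZE = 256;  // assumed block size; must match the launch below

// Hypothetical reduced kernel: only the arguments the loop actually uses.
__global__ void fillDXShared(int srcLength, const char *src, const int *srcIndices,
                             int totalLengthDistinct, const char *patternRemoved, int *dX)
{
    // One copy of this array exists per block, with BLOCK_SIZE elements.
    __shared__ char patternRemovedShared[BLOCK_SIZE];

    int ix = blockIdx.x * blockDim.x + threadIdx.x;

    // Index the shared array with threadIdx.x (0..BLOCK_SIZE-1), not the global ix.
    if (ix < totalLengthDistinct)
        patternRemovedShared[threadIdx.x] = patternRemoved[ix];
    __syncthreads();  // outside the branch so every thread in the block reaches it

    if (ix < totalLengthDistinct) {
        char c = patternRemovedShared[threadIdx.x];
        for (int i = 0; i < srcLength; i++) {
            int idx = ix * srcLength + i;
            if (src[i] == ',')
                dX[idx] = 0;
            else if (src[i] == c)
                dX[idx] = srcIndices[i];
            else
                dX[idx] = (i > 0) ? dX[idx - 1] : 0;  // assumption: 0 when no previous cell
        }
    }
}

// Hypothetical host helper: copies inputs to the device, launches, returns dX.
std::vector<int> runFillDX(const std::string &src, const std::vector<int> &srcIndices,
                           const std::string &patternRemoved)
{
    int srcLength = static_cast<int>(src.size());
    int total = static_cast<int>(patternRemoved.size());
    std::vector<int> hostDX(static_cast<size_t>(srcLength) * total, 0);
    if (hostDX.empty()) return hostDX;

    char *dSrc = nullptr, *dPattern = nullptr;
    int *dIdx = nullptr, *dDX = nullptr;
    cudaMalloc(&dSrc, srcLength);
    cudaMalloc(&dPattern, total);
    cudaMalloc(&dIdx, srcLength * sizeof(int));
    cudaMalloc(&dDX, hostDX.size() * sizeof(int));
    cudaMemcpy(dSrc, src.data(), srcLength, cudaMemcpyHostToDevice);
    cudaMemcpy(dPattern, patternRemoved.data(), total, cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, srcIndices.data(), srcLength * sizeof(int), cudaMemcpyHostToDevice);

    int blocks = (total + BLOCK_SIZE - 1) / BLOCK_SIZE;
    fillDXShared<<<blocks, BLOCK_SIZE>>>(srcLength, dSrc, dIdx, total, dPattern, dDX);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(hostDX.data(), dDX, hostDX.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dSrc); cudaFree(dPattern); cudaFree(dIdx); cudaFree(dDX);
    return hostDX;
}
```

In this sketch each thread reads only its own element of patternRemoved, so a plain local variable would probably serve just as well. Shared memory tends to pay off when several threads of a block reuse the same data, which in this loop looks more like src and srcIndices. Timing both variants is the only reliable way to tell.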