How to use shared memory

I have read the shared memory examples and want to apply shared memory in my project. Please tell me whether my approach is correct. Below is a portion of my project, which I will test both with and without shared memory. This is the global memory version. All of the kernel's inputs are used in other parts of the program.

extern "C"
__global__ void ComputationdClustersOnGPUShuffle(int numTokenSrc,int numWordSrc,int srcLength, char *src,int *srctokensSFIndices,int *srctokensLength,int *srcIndices, int *srcStartIndices,int totalLengthDistinct, char *patternRemoved,int numTokenPattern,int numWordPattern,int patternLength,char *pattern,int *patterntokensSFIndices,int *patterntokensLength,int *patternIndices,int *patternStartIndices,int *dX,int *ResultFinal,float *TokensFinal,float *WordsTokensFinal1,float *WordsTokensFinal2,float *WordsFinal1,float *WordsFinal2,float *WordsFinal)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int min_val = 0, var1 = 0, var2 = 0;
    int Avar, Bvar, Cvar, Dvar;
    float maxleven1 = 0.0f, resultName = 0.0f, sumleven = 0.0f, sumfinal = 0.0f;
    //int diff;
    //sumfinal = 0.0f,resultName = 0.0f,maxleven1 = 0.0f
    if (ix < totalLengthDistinct)
    {
        for (int i = 0; i < srcLength; i++) {
            if (src[i] == ',')
                dX[ix * srcLength + i] = 0;
            else
            {
                if (src[i] == patternRemoved[ix])
                    dX[ix * srcLength + i] = srcIndices[i];
                else if (src[i] != patternRemoved[ix])
                    dX[ix * srcLength + i] = dX[ix * srcLength + i - 1];
            }
        }
    }
    __syncthreads();
}

Shared memory part:


extern "C"
__global__ void ComputationdClustersOnGPUShuffle(int numTokenSrc,int numWordSrc,int srcLength, char *src,int *srctokensSFIndices,int *srctokensLength,int *srcIndices, int *srcStartIndices,int totalLengthDistinct, char *patternRemoved,int numTokenPattern,int numWordPattern,int patternLength,char *pattern,int *patterntokensSFIndices,int *patterntokensLength,int *patternIndices,int *patternStartIndices,int *dX,int *ResultFinal,float *TokensFinal,float *WordsTokensFinal1,float *WordsTokensFinal2,float *WordsFinal1,float *WordsFinal2,float *WordsFinal)
{
    int shared_string_len = 256;      // The same as block size
    __shared__ char patternRemoved _shared [ shared_string_len ] ;

    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    patternRemoved _shared[ix]  = patternRemoved[ix]
    int min_val = 0, var1 = 0, var2 = 0;
    int Avar, Bvar, Cvar, Dvar;
    float maxleven1 = 0.0f, resultName = 0.0f, sumleven = 0.0f, sumfinal = 0.0f;
    //int diff;
    //sumfinal = 0.0f,resultName = 0.0f,maxleven1 = 0.0f
    if (ix < totalLengthDistinct)
    {
        for (int i = 0; i < srcLength; i++) {
            if (src[i] == ',')
                dX[ix * srcLength + i] = 0;
            else
            {
                if (src[i] == patternRemoved _shared [ix])
                    dX[ix * srcLength + i] = srcIndices[i];
                else if (src[i] != patternRemoved _shared [ix])
                    dX[ix * srcLength + i] = dX[ix * srcLength + i - 1];
            }
        }
    }

    __syncthreads();
}

If my code is correct, it means that throughout the program only the inputs that are indexed by ix will be replaced by a shared memory variable. A shared memory variable is created for every block in the grid. Will shared memory take less time than global memory?

when posting code here, please format it correctly. one possible method:

  • edit your post
  • select the code
  • press the </> button at the top of the edit window
  • save the changes

sorry Robert, I have edited the code.

why not measure it?
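One common way to measure a kernel is with CUDA events. The following is only a minimal sketch: fillKernel is a hypothetical stand-in for whichever kernel variant is being timed, and timeKernelMs is a helper name made up for this example, not something from the project above.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; replace it with the kernel variant under test.
__global__ void fillKernel(int *out, int n)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    if (ix < n)
        out[ix] = ix;
}

// Times one launch of fillKernel with CUDA events.
// Returns the elapsed time in milliseconds, or -1.0f if a CUDA call failed.
float timeKernelMs(int n)
{
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess)
        return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so that one-time startup cost is not part of the measurement.
    fillKernel<<<blocks, threads>>>(dOut, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    fillKernel<<<blocks, threads>>>(dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the kernel and the stop event have completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    bool ok = (cudaGetLastError() == cudaSuccess);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dOut);
    return ok ? ms : -1.0f;
}
```

Calling timeKernelMs(1 << 20) from main once per kernel variant and comparing the two numbers (ideally averaged over several runs) gives a direct answer for a particular GPU and input size.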

(FWIW, the code you have posted doesn't look like it will compile; for example, the variable name patternRemoved _shared contains a space.)

If we fix your code by removing the spaces, then I would say it is unlikely that shared memory will provide much improvement. Each thread only accesses "its own" value in shared memory. That access does appear to be repeated, so that is where any benefit may come in. But the compiler might very well optimize that access into a register anyway.
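To make the register point concrete, here is a rough sketch of the same loop with the per-thread character read once into a local variable, with no shared memory at all. The names fillDXRegister, myChar, prev and runRegisterDemo are invented for this example, the parameter list is cut down to what the loop uses, and the handling of i == 0 (treating the "previous" value as 0) is an assumption, since the posted code reads dX[... + i-1] there.

```cuda
#include <cuda_runtime.h>

// Sketch: the dX-filling loop with the thread's own character held in a local variable.
__global__ void fillDXRegister(int srcLength, const char *src, const int *srcIndices,
                               int totalLengthDistinct, const char *patternRemoved, int *dX)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    if (ix >= totalLengthDistinct)
        return;

    const char myChar = patternRemoved[ix];  // read from global memory once per thread
    int prev = 0;  // ASSUMPTION: value used where the original reads dX[... + i-1] at i == 0

    for (int i = 0; i < srcLength; i++) {
        int value;
        if (src[i] == ',')
            value = 0;
        else if (src[i] == myChar)
            value = srcIndices[i];
        else
            value = prev;        // carry the previous entry forward
        dX[ix * srcLength + i] = value;
        prev = value;
    }
}

// Hypothetical host-side check on a tiny input; returns true if the output matches.
bool runRegisterDemo()
{
    const int srcLength = 5, n = 2;
    const char hSrc[] = "ab,ba";
    const int hIdx[srcLength] = {1, 2, 3, 4, 5};
    const char hPat[] = "ab";
    const int expected[n * srcLength] = {1, 1, 0, 0, 5,   // row for 'a'
                                         0, 2, 0, 4, 4};  // row for 'b'
    int hDX[n * srcLength] = {0};

    char *dSrc = nullptr, *dPat = nullptr;
    int *dIdx = nullptr, *dDX = nullptr;
    cudaMalloc(&dSrc, srcLength);
    cudaMalloc(&dPat, n);
    cudaMalloc(&dIdx, sizeof(hIdx));
    cudaMalloc(&dDX, sizeof(hDX));
    cudaMemcpy(dSrc, hSrc, srcLength, cudaMemcpyHostToDevice);
    cudaMemcpy(dPat, hPat, n, cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, hIdx, sizeof(hIdx), cudaMemcpyHostToDevice);

    fillDXRegister<<<1, 256>>>(srcLength, dSrc, dIdx, n, dPat, dDX);
    // cudaMemcpy waits for the kernel and reports any error it raised.
    cudaError_t err = cudaMemcpy(hDX, dDX, sizeof(hDX), cudaMemcpyDeviceToHost);

    cudaFree(dSrc); cudaFree(dPat); cudaFree(dIdx); cudaFree(dDX);

    bool ok = (err == cudaSuccess);
    for (int k = 0; ok && k < n * srcLength; k++)
        ok = (hDX[k] == expected[k]);
    return ok;
}
```

Whether this is any faster than the shared memory version is something only a measurement can settle; the sketch just shows that a value used by a single thread does not need shared memory to avoid repeated global reads.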

The second version is the same as the first one, but uses shared memory.

I need to define an array of characters in shared memory. Each thread stores one character into this array:

__shared__ char patternRemoved _shared [ shared_string_len ] ;

Inside the kernel I wrote:

if (src[i] == patternRemoved _shared [ix])
      dX[ix * srcLength + i] = srcIndices[i];
 else if (src[i] != patternRemoved _shared [ix])
      dX[ix * srcLength + i] = dX[ix * srcLength +  i-1];

This is the first time I have used shared memory. Is my definition of the character array in shared memory correct?

No, it is not correct. You have a typo. I already mentioned it. A compiler can tell you that. You don't need a forum posting to tell you that.

Please, Robert, how do I define an array of characters in shared memory? Sorry, I did not intend to make mistakes.

You have a space here:

__shared__ char patternRemoved _shared [ shared_string_len ] ;

That is not correct.
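For comparison, a declaration along the following lines should compile. This is a sketch under stated assumptions, not the poster's final code: the names SHARED_STRING_LEN, patternRemovedShared, fillDXShared and runSharedDemo are invented here, the block size is assumed to be at most 256, and the i == 0 case is assumed to carry a previous value of 0. Besides the space in the name, the sketch also uses a compile-time constant for the array size, ends the assignment with a semicolon, and indexes the shared array with threadIdx.x, because each block has its own array of only 256 elements.

```cuda
#include <cuda_runtime.h>

// Size of a statically declared shared array must be a compile-time constant.
// ASSUMPTION: kernels are launched with blockDim.x <= SHARED_STRING_LEN.
#define SHARED_STRING_LEN 256

__global__ void fillDXShared(int srcLength, const char *src, const int *srcIndices,
                             int totalLengthDistinct, const char *patternRemoved, int *dX)
{
    __shared__ char patternRemovedShared[SHARED_STRING_LEN];  // one array per block

    int ix = blockIdx.x * blockDim.x + threadIdx.x;

    // Shared memory is indexed by the thread's position inside its block, not by ix.
    if (ix < totalLengthDistinct)
        patternRemovedShared[threadIdx.x] = patternRemoved[ix];
    __syncthreads();  // placed outside the branch so every thread in the block reaches it

    if (ix >= totalLengthDistinct)
        return;

    int prev = 0;  // ASSUMPTION: value used where the original reads dX[... + i-1] at i == 0
    for (int i = 0; i < srcLength; i++) {
        int value;
        if (src[i] == ',')
            value = 0;
        else if (src[i] == patternRemovedShared[threadIdx.x])
            value = srcIndices[i];
        else
            value = prev;
        dX[ix * srcLength + i] = value;
        prev = value;
    }
}

// Hypothetical host-side check on a tiny input; returns true if the output matches.
bool runSharedDemo()
{
    const int srcLength = 5, n = 2;
    const char hSrc[] = "ab,ba";
    const int hIdx[srcLength] = {1, 2, 3, 4, 5};
    const char hPat[] = "ab";
    const int expected[n * srcLength] = {1, 1, 0, 0, 5,   // row for 'a'
                                         0, 2, 0, 4, 4};  // row for 'b'
    int hDX[n * srcLength] = {0};

    char *dSrc = nullptr, *dPat = nullptr;
    int *dIdx = nullptr, *dDX = nullptr;
    cudaMalloc(&dSrc, srcLength);
    cudaMalloc(&dPat, n);
    cudaMalloc(&dIdx, sizeof(hIdx));
    cudaMalloc(&dDX, sizeof(hDX));
    cudaMemcpy(dSrc, hSrc, srcLength, cudaMemcpyHostToDevice);
    cudaMemcpy(dPat, hPat, n, cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, hIdx, sizeof(hIdx), cudaMemcpyHostToDevice);

    fillDXShared<<<1, SHARED_STRING_LEN>>>(srcLength, dSrc, dIdx, n, dPat, dDX);
    // cudaMemcpy waits for the kernel and reports any error it raised.
    cudaError_t err = cudaMemcpy(hDX, dDX, sizeof(hDX), cudaMemcpyDeviceToHost);

    cudaFree(dSrc); cudaFree(dPat); cudaFree(dIdx); cudaFree(dDX);

    bool ok = (err == cudaSuccess);
    for (int k = 0; ok && k < n * srcLength; k++)
        ok = (hDX[k] == expected[k]);
    return ok;
}
```

Since each thread here reads back only the element it wrote itself, the __syncthreads() call is not strictly required for correctness; it is kept to show where the barrier would belong if threads read each other's elements.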

By the way, I don’t consider this to be a valuable use of my time. Please use the compiler and other tools to sort out these things. Speaking for myself, I’m likely to ignore such questions in the future.