Basic performance question from a non-CS noob

I have a doubt that has always been at the back of my mind…

This may seem like a really noobish question to some of you :"> but I am not a CS guy, so I don’t understand how this all actually works at its core, and I want to learn…

Suppose we have situation A:

  __shared__ double a[128];   // one element per thread in the block

  // b is some global double array

  // read from b into a
  a[tid] = b[indx];           // assume the access is coalesced

  // compute (no sync between threads required)
  a[tid] = a[tid] * 2.0;

  // write back out to b
  b[indx] = a[tid];

and situation B:

  // b is some global double array

  // assume the access to device memory is coalesced
  b[indx] = b[indx] * 2.0;

Say we launch 32768 threads with 128 threads per block…

Would situation A be faster than B? If so, why? In both cases we have one read from and one write to global memory, and in situation A we also have the small overhead of copying from b to a and back…

Does doing 1 flop require multiple reads from memory? (Intuitively it should be just one read.)

Or is it because multiplying a global variable by some constant/variable is slower than multiplying a shared variable?

I am not sure what the exact answer to the above question is…



IMHO both situations are comparable in terms of speed. Note that even in situation B, the value gets loaded from device (global) memory into an on-chip register. It is then multiplied by 2 in that register, and afterwards the register's value is stored back to global memory.
So the only difference between situations A and B is that A additionally routes the value through shared memory, while B keeps it in a register. Because accessing shared memory is about as fast as accessing a register (assuming no bank conflicts), the performance will be approximately the same, with a small overhead for situation A.
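
To check this on your own card, here is a minimal sketch of both situations as complete kernels with a rough timing harness. The kernel names, the helper `run_variant`, the test data, and the use of CUDA events for timing are my own assumptions, not something from the question; the sizes (32768 threads, 128 per block) are the ones you gave. Timings for a kernel this small are noisy, so treat the numbers as a rough indication only.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreadsPerBlock = 128;   // from the question
constexpr int kNumElements     = 32768; // one element per thread, from the question

// Situation A: stage the element in shared memory, scale it, write it back.
__global__ void scale_via_shared(double *b, int n)
{
    __shared__ double a[kThreadsPerBlock];
    int tid  = threadIdx.x;
    int indx = blockIdx.x * blockDim.x + tid;
    if (indx < n) {
        a[tid]  = b[indx];        // global -> (register) -> shared
        a[tid]  = a[tid] * 2.0;   // no sync needed: each thread touches only its own slot
        b[indx] = a[tid];         // shared -> (register) -> global
    }
}

// Situation B: scale the element directly; the compiler keeps it in a register.
__global__ void scale_direct(double *b, int n)
{
    int indx = blockIdx.x * blockDim.x + threadIdx.x;
    if (indx < n) b[indx] = b[indx] * 2.0;
}

// Hypothetical helper: runs one variant, stores the kernel time in *ms,
// and returns true only if every element was doubled.
bool run_variant(bool use_shared, float *ms)
{
    std::vector<double> host(kNumElements);
    for (int i = 0; i < kNumElements; ++i) host[i] = 0.5 * i;

    double *dev = nullptr;
    size_t bytes = kNumElements * sizeof(double);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int blocks = (kNumElements + kThreadsPerBlock - 1) / kThreadsPerBlock;
    cudaEventRecord(start);
    if (use_shared) scale_via_shared<<<blocks, kThreadsPerBlock>>>(dev, kNumElements);
    else            scale_direct<<<blocks, kThreadsPerBlock>>>(dev, kNumElements);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    bool ok = (cudaGetLastError() == cudaSuccess);
    cudaEventElapsedTime(ms, start, stop);

    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);

    // 0.5 * i doubled is exactly i, so an exact comparison is safe here.
    for (int i = 0; i < kNumElements; ++i)
        if (host[i] != 1.0 * i) ok = false;
    return ok;
}

int main()
{
    float ms_a = 0.0f, ms_b = 0.0f;
    // The first launch of each kernel serves as a warm-up; only the second is reported.
    bool ok = run_variant(true, &ms_a) && run_variant(false, &ms_b);
    ok = ok && run_variant(true, &ms_a) && run_variant(false, &ms_b);
    assert(ok);
    std::printf("A (shared): %.4f ms, B (direct): %.4f ms\n", ms_a, ms_b);
    return ok ? 0 : 1;
}
```

Compile with something like `nvcc -O2 scale.cu -o scale` (the file name is arbitrary). For a fairer comparison you would launch each kernel many times in a loop and average the times.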


Didn’t know this!!

Thanks for the input :)


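Shared memory should be used like a cache for staging data.

And a cache thrives on “locality of reference”, i.e. the same data being accessed again and again within a small piece of code. In the example from the question, each element is touched only once, so staging it in shared memory buys nothing.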
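
As an illustration of the kind of reuse meant here, below is a sketch of a 3-point moving average, where every input element is needed by up to three neighbouring threads. The kernel name `average3`, the tile layout, the zero padding at the array ends, and the host helper `check_average3` are assumptions made up for this example. Each block loads its elements from global memory once into a shared tile (plus one halo element on each side), and all three reads per output then come from shared memory. On newer GPUs the hardware caches may already capture much of this reuse, so the actual gain should be measured.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlock = 128;  // threads per block; the kernel assumes blockDim.x == kBlock

// out[i] = (in[i-1] + in[i] + in[i+1]) / 3, with out-of-range neighbours treated as 0.
__global__ void average3(const double *in, double *out, int n)
{
    __shared__ double tile[kBlock + 2];      // block's elements plus one halo cell per side
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;               // shift by one to leave room for the left halo

    // Every thread stages its own element.
    tile[lid] = (gid < n) ? in[gid] : 0.0;
    // The first and last thread of the block also fetch the halo cells.
    if (threadIdx.x == 0)
        tile[0] = (gid >= 1 && gid - 1 < n) ? in[gid - 1] : 0.0;
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0;

    // Here a sync IS required: threads read cells written by their neighbours.
    __syncthreads();

    if (gid < n)
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0;
}

// Hypothetical helper: runs the kernel on n elements and compares with a CPU reference.
bool check_average3(int n)
{
    std::vector<double> in(n), out(n, -1.0);
    for (int i = 0; i < n; ++i) in[i] = 3.0 * i;   // integer-valued, so sums are exact

    double *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(double);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);

    int blocks = (n + kBlock - 1) / kBlock;
    average3<<<blocks, kBlock>>>(d_in, d_out, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    for (int i = 0; i < n; ++i) {
        double left  = (i > 0)     ? in[i - 1] : 0.0;
        double right = (i + 1 < n) ? in[i + 1] : 0.0;
        if (out[i] != (left + in[i] + right) / 3.0) ok = false;
    }
    return ok;
}

int main()
{
    bool ok = check_average3(32768) && check_average3(1000);  // 1000 exercises a partial block
    assert(ok);
    std::printf("average3 %s\n", ok ? "matches the CPU reference" : "MISMATCH");
    return ok ? 0 : 1;
}
```

Compared with the doubling kernels in the question, the difference is that data placed in shared memory is read more than once and by more than one thread, which is what makes the staging step pay for itself.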