Efficient way of doing?

Karguvel_RajanRamachandra · July 14, 2010, 7:13am

Hi Experts,

       I have three arrays in my card's global memory. For example:
       
          A[] = {2,5,3,5....}
       
          B[] = {1,2,5,4....}

          C[] = {0,0,0,0....}

      In my kernel I have to add the values of A and B and assign the result to C.

      int p = blockIdx.x * BLOCK_SIZE + threadIdx.x;
      int q = threadIdx.x;
  Way 1:
         
           C[p] = A[p] + B[p];

   or

   Way 2:

            __shared__ int Asub[BLOCK_SIZE];     
            __shared__ int Bsub[BLOCK_SIZE];

            Asub[q] = A[p];
            Bsub[q] = B[p];
           
            __syncthreads();

            C[p] = Asub[q] + Bsub[q];

Which is faster, Way 1 or Way 2?

I want to know which is the more time-consuming task: accessing global memory directly from the kernel, or copying from global memory to shared memory first?

avidday · July 14, 2010, 7:22am

There won’t be any benefit in using shared memory in this case.
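
To make the comparison concrete, here is a minimal sketch that runs both ways from the question and times them with CUDA events. The kernel names addDirect and addStaged, the array size n, the bounds check and all of the host code are assumptions added for illustration; they are not from the original post. Way 2 issues the same global loads as Way 1 and then adds a shared memory store, a barrier and a shared memory load, so it has nothing to gain here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// Way 1: each thread reads A[p] and B[p] once, straight from global memory.
__global__ void addDirect(const int *A, const int *B, int *C, int n)
{
    int p = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    if (p < n)
        C[p] = A[p] + B[p];
}

// Way 2: the same global reads, plus a round trip through shared memory.
__global__ void addStaged(const int *A, const int *B, int *C, int n)
{
    __shared__ int Asub[BLOCK_SIZE];
    __shared__ int Bsub[BLOCK_SIZE];
    int p = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    int q = threadIdx.x;
    if (p < n) {
        Asub[q] = A[p];
        Bsub[q] = B[p];
    }
    __syncthreads();            // reached by every thread in the block
    if (p < n)
        C[p] = Asub[q] + Bsub[q];
}

int main()
{
    const int n = 1 << 20;      // hypothetical problem size
    const size_t bytes = n * sizeof(int);
    int *hA = (int *)malloc(bytes);
    int *hB = (int *)malloc(bytes);
    int *hC = (int *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hA[i] = i; hB[i] = 2 * i; }

    int *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float msDirect = 0.0f, msStaged = 0.0f;

    // Warm-up launch so one-time startup cost is not attributed to Way 1.
    addDirect<<<blocks, BLOCK_SIZE>>>(dA, dB, dC, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    addDirect<<<blocks, BLOCK_SIZE>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msDirect, start, stop);

    cudaEventRecord(start);
    addStaged<<<blocks, BLOCK_SIZE>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&msStaged, start, stop);

    // Check the result of the last launch (Way 2) against the host values.
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (hC[i] != 3 * i) ++errors;

    printf("Way 1: %.3f ms, Way 2: %.3f ms, errors: %d\n",
           msDirect, msStaged, errors);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return errors != 0;
}
```

A single timed launch is noisy, so averaging over many launches gives a fairer picture. The kernels assume they are launched with exactly BLOCK_SIZE threads per block, as in the question.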

Karguvel_RajanRamachandra · July 14, 2010, 7:24am

OK. Thanks, Mr. avidday.

Then when is shared memory useful?

Is copying values to shared memory more time consuming?

avidday · July 14, 2010, 7:36am

Shared memory is useful when threads in a block need to read the same value from global memory more than once, or when the read pattern of the threads in a warp/half-warp breaks the coalescing rules for efficient reads from global memory. The latter is only really important on compute capability 1.0/1.1 devices. On newer hardware the coalescing rules are greatly relaxed, and Fermi has a useful L1/L2 global memory cache which helps even more.

No, copying to shared memory is not more time consuming. But it isn't any faster if each block just uses the values in shared memory once.
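
As an illustration of the first case (the same value read more than once by threads in a block), here is a minimal sketch of a 3-point sum, out[i] = in[i-1] + in[i] + in[i+1], where each input element is needed by three threads. The kernel name stencilShared, the RADIUS constant, the zero padding at the array ends and the host code are all assumptions made up for this example. Each block loads its tile (plus one halo element on each side) from global memory once, and the repeated reads then hit shared memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256
#define RADIUS 1

// Each input element is used by 2*RADIUS+1 threads, so staging it in
// shared memory replaces repeated global reads with shared memory reads.
__global__ void stencilShared(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK_SIZE + 2 * RADIUS];
    int p = blockIdx.x * BLOCK_SIZE + threadIdx.x;   // global index
    int q = threadIdx.x + RADIUS;                    // index inside the tile

    // Every thread loads its own element; out-of-range elements count as 0.
    tile[q] = (p < n) ? in[p] : 0;

    // The first RADIUS threads also load the left and right halo elements.
    if (threadIdx.x < RADIUS) {
        int left  = p - RADIUS;
        int right = p + BLOCK_SIZE;
        tile[q - RADIUS]     = (left >= 0) ? in[left]  : 0;
        tile[q + BLOCK_SIZE] = (right < n) ? in[right] : 0;
    }
    __syncthreads();   // the whole tile must be loaded before anyone reads it

    if (p < n) {
        int sum = 0;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[q + k];
        out[p] = sum;
    }
}

int main()
{
    const int n = 1000;   // hypothetical size, deliberately not a multiple of BLOCK_SIZE
    const size_t bytes = n * sizeof(int);
    int *hIn  = (int *)malloc(bytes);
    int *hOut = (int *)malloc(bytes);
    for (int i = 0; i < n; ++i) hIn[i] = i;

    int *dIn, *dOut;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    const int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    stencilShared<<<blocks, BLOCK_SIZE>>>(dIn, dOut, n);
    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);

    // Compare against a host reference with the same zero padding.
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        int expected = hIn[i];
        if (i > 0)     expected += hIn[i - 1];
        if (i < n - 1) expected += hIn[i + 1];
        if (hOut[i] != expected) ++errors;
    }
    printf("errors: %d\n", errors);

    cudaFree(dIn); cudaFree(dOut);
    free(hIn); free(hOut);
    return errors != 0;
}
```

With a radius of 1 the reuse is small, and on hardware with a global memory cache the gain over plain global reads may be modest; the benefit generally grows as each loaded value is reused more times.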

Karguvel_RajanRamachandra · July 14, 2010, 7:42am

OK. Thank you.
