How do multiple cores access shared memory at the same time?

As far as I know, the shared memory has 8 banks.

If two stream processors access the same bank, memory access seems to be slow.

If two stream processors access different banks, memory access seems to be fast.

My question is: what communication architecture connects the processors and the shared memory? For example, is it a ring or a bus?

Thanks for any responses.

I don't know.

But I can give you some publicly available information that might help you guess. (I would presume NVIDIA will never publish the actual internal details.)


The following information could be wrong or even stupid. Reader discretion advised.

  1. Shared memory (SM) is divided into banks.

  2. Each CPU (stream processor) can access every bank (this looks like a mesh in my mind).

  3. When all CPUs access the same data (hence the same bank), the data is broadcast to all CPUs. (Does "broadcast" ring a bell for you?)

  4. When multiple CPUs access different data in the same bank, they are served one after another.

    (i) So, there must be a queue and a controller. Now re-read point 2: the CPUs are not connected directly to the memories but rather to the controller guarding access to each bank.

    (ii) This could also mean that an SM bank is not actually multi-ported: only one data item can be read from it at any given time.

    (iii) So, there is no point in having a separate communication channel between the SM bank and each CPU. This means that all CPUs could share one common bus to each bank. This would in turn facilitate BROADCASTING.

    (iv) However, if the SM bank were multi-ported, more data would be available at the same time, and that would make sense only if the SM bank had a private communication channel with each CPU to allow parallel communication. This would require the SM bank to be 8-way multi-ported, which to my limited knowledge sounds crazy.

    So, I would rule out a private communication channel with each CPU and assume that the CPUs are connected via a shared bus, one per bank. Thus each CPU participates in one shared bus per SM bank.
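
To make points 3 and 4 concrete, here is a toy model in Python of the behaviour guessed at above. It is only a sketch of my guess, not how the hardware actually works. The names `bank_of` and `serialized_passes` are made up for illustration, the 8 banks come from the question (NVIDIA's programming guide quotes 16 or 32 banks depending on the compute capability), and the mapping of consecutive 4-byte words to consecutive banks is an assumption. Each bank is treated as single-ported, and identical words are served once and shared (the broadcast).

```python
from collections import defaultdict

NUM_BANKS = 8    # assumption: the bank count quoted in the question
WORD_BYTES = 4   # assumption: consecutive 4-byte words map to consecutive banks


def bank_of(byte_address, num_banks=NUM_BANKS, word_bytes=WORD_BYTES):
    """Bank holding the word at byte_address under the assumed interleaving."""
    return (byte_address // word_bytes) % num_banks


def serialized_passes(byte_addresses, num_banks=NUM_BANKS, word_bytes=WORD_BYTES):
    """Passes needed to serve one request per processor in this toy model."""
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // word_bytes
        # A set keeps each distinct word once, so identical requests are merged.
        words_per_bank[word % num_banks].add(word)
    # A single-ported bank delivers one distinct word per pass, so the busiest
    # bank decides how long the whole request takes.
    return max((len(words) for words in words_per_bank.values()), default=0)


print(serialized_passes([4 * i for i in range(8)]))   # 1: every processor hits its own bank
print(serialized_passes([0] * 8))                     # 1: same word for all, broadcast
print(serialized_passes([32 * i for i in range(8)]))  # 8: different words, all in bank 0
```

The third case is the slow one from the question: eight different words that all live in bank 0 have to be served one after another.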

Now it is an open question whether it is a bus or a ring. I would presume it is a bus.