store global parameters in shared memory?

Hi :)

I just migrated some code to CUDA and have a big problem: it's extremely slow!

The reason is probably that I have many, many parameters stored in global memory which I have to use in every thread.

The CUDA Programming Guide says that a global memory access takes about 400-600 cycles, which is of course not good.

I will give an example:

__global__ void CUDA_CalcBahn(long foo, float bar, float* foobar, float* result)

Neither foo, nor bar, nor foobar gets changed inside the kernel. "foobar" and "result" have been allocated on the device with cudaMalloc.

Where and how is the use of shared memory practical here? Can I assign the value of the parameter "foo" or "bar" to a shared memory variable once per block, and then read it from all the other threads of the block?
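For what it's worth, what I have in mind is roughly this (just a sketch; the real kernel body is of course more complicated, and the multiply is a made-up placeholder):

```cuda
__global__ void CUDA_CalcBahn(long foo, float bar, float* foobar, float* result)
{
    __shared__ float s_bar;           // one copy per block

    if (threadIdx.x == 0)
        s_bar = bar;                  // thread 0 writes it once
    __syncthreads();                  // all threads wait for the write

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    result[i] = foobar[i] * s_bar;    // every thread reads the shared copy
}
```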

yours sincerely, confused snowball :)


foo and bar are already in shared memory, as this is how parameters are passed into kernels.

As for why it's slow: you're probably not reading data in a coalesced way, and/or not sharing data among threads.
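To illustrate what coalescing means, here is a minimal sketch with two hypothetical kernels. When consecutive threads read consecutive addresses, the hardware combines the reads into a few wide memory transactions; with a large stride, each thread's read can cost its own transaction:

```cuda
// Coalesced: thread i touches element i, so a warp reads one
// contiguous segment of memory.
__global__ void copy_coalesced(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Not coalesced: thread i touches element i*stride, so the reads
// of a warp are scattered across many memory segments.
__global__ void copy_strided(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n)
        out[i] = in[i * stride];
}
```

Both kernels copy the same amount of data, but the strided version can be many times slower purely because of the access pattern.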

The first thing would be to read the relevant sections of the programming guide, and then maybe post the kernel so we can look at it and suggest a few things.


Constant memory, Sir! It was invented just for you ;)
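A sketch of how that could look for your example (parameter and kernel names are made up, and the multiply is a placeholder): values that never change inside the kernel go into __constant__ memory, where reads are cached and broadcast when all threads of a warp read the same address, which is exactly your "same parameters in every thread" case.

```cuda
__constant__ float c_bar;             // device-side constant, set from the host

__global__ void calc(const float* foobar, float* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    result[i] = foobar[i] * c_bar;    // all threads read the same cached value
}

// Host side, once before the kernel launch:
//   cudaMemcpyToSymbol(c_bar, &bar, sizeof(float));
```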

Thanks for your answers :)

The heaviest bottleneck turned out to be calling "cudaMalloc" very often inside a for loop.
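In case someone finds this later, the fix was essentially this (a sketch with made-up names): allocate the device buffer once outside the loop and reuse it, instead of allocating on every iteration.

```cuda
// Before: cudaMalloc/cudaFree inside the loop -> very slow.
// After: allocate once, reuse the buffer, free once at the end.
float* d_buf;
cudaMalloc(&d_buf, n * sizeof(float));       // once, up front

for (int step = 0; step < steps; ++step) {
    // ... copy this iteration's inputs into d_buf ...
    // kernel<<<blocks, threads>>>(d_buf, ...);
}

cudaFree(d_buf);                             // once, at the end
```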

The speed is now at ~50% of what it should be, and I'll try to gain a further speedup with the suggested methods :)