Multiple memory access

Hi, I have this problem.
I have a vector of nine elements containing the coefficients of a polynomial, and I want to evaluate this polynomial at all points of a volume, in parallel. To do this I would assign one thread to each volume element. However, each thread then has to read all nine coefficients. I think the accesses would be serialized; what can I do to avoid this? I mean, how can the threads share the coefficient values?
Would it be a good idea to copy the coefficients into a shared memory array?
Thanks in advance.

If all the threads read each coefficient at the same time, the best place for the data is constant memory, because a single read from constant memory is broadcast to all the threads that request it. Note that all the threads must read the same element at the same time; otherwise the reads are serialized and constant memory will not perform as well as expected. Furthermore, constant memory is cached. Take a look at

Regards.
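To make the constant memory suggestion concrete, here is a minimal sketch. The thread does not say what form the polynomial takes, so this assumes a univariate polynomial of degree 8 evaluated with Horner's rule on one scalar value per volume element. The names `d_coeffs`, `eval_poly_const` and `eval_with_constant_memory` are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define NUM_COEFFS 9  // nine coefficients, as in the question

// Coefficients in constant memory. Every thread reads the same element
// at the same step of the loop, so each read can be broadcast.
__constant__ float d_coeffs[NUM_COEFFS];

// ASSUMPTION: p(x) = c[0] + c[1]*x + ... + c[8]*x^8, one thread per element.
__global__ void eval_poly_const(const float* x, float* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i];
    float acc = d_coeffs[NUM_COEFFS - 1];
    for (int k = NUM_COEFFS - 2; k >= 0; --k)
        acc = acc * xi + d_coeffs[k];  // Horner's rule
    y[i] = acc;
}

// Hypothetical host helper: uploads the coefficients, runs the kernel,
// and returns the results. Error checking omitted.
std::vector<float> eval_with_constant_memory(const std::vector<float>& coeffs,
                                             const std::vector<float>& x)
{
    assert(coeffs.size() == NUM_COEFFS);
    const int n = static_cast<int>(x.size());
    std::vector<float> y(n);
    if (n == 0) return y;

    // Copy the nine coefficients from host memory into the constant symbol.
    cudaMemcpyToSymbol(d_coeffs, coeffs.data(), NUM_COEFFS * sizeof(float));

    const size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    eval_poly_const<<<blocks, threads>>>(d_x, d_y, n);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

Because the loop index `k` is the same for every thread, all threads ask for the same coefficient at each step, which is the access pattern constant memory is designed for.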

Shared memory also broadcasts to multiple threads reading the same address, and with compute capability 2.x “multiple words can be broadcast in a single transaction.”
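For comparison, a shared memory variant might look like the sketch below. It makes the same assumption about the polynomial (univariate, degree 8, Horner's rule), and `eval_poly_shared` and `eval_with_shared_memory` are hypothetical names. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define NUM_COEFFS 9  // nine coefficients, as in the question

// ASSUMPTION: p(x) = c[0] + c[1]*x + ... + c[8]*x^8, one thread per element.
__global__ void eval_poly_shared(const float* coeffs, const float* x,
                                 float* y, int n)
{
    __shared__ float s_coeffs[NUM_COEFFS];

    // The first threads of each block stage the coefficients in shared
    // memory. The strided loop also works if the block has fewer than
    // NUM_COEFFS threads.
    for (int k = threadIdx.x; k < NUM_COEFFS; k += blockDim.x)
        s_coeffs[k] = coeffs[k];

    // Every thread of the block must reach the barrier, so the bounds
    // check on i comes after it rather than as an early return.
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float xi = x[i];
        float acc = s_coeffs[NUM_COEFFS - 1];
        for (int k = NUM_COEFFS - 2; k >= 0; --k)
            acc = acc * xi + s_coeffs[k];  // same address for all threads
        y[i] = acc;
    }
}

// Hypothetical host helper. Error checking omitted.
std::vector<float> eval_with_shared_memory(const std::vector<float>& coeffs,
                                           const std::vector<float>& x)
{
    assert(coeffs.size() == NUM_COEFFS);
    const int n = static_cast<int>(x.size());
    std::vector<float> y(n);
    if (n == 0) return y;

    const size_t bytes = n * sizeof(float);
    float* d_c = nullptr;
    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_c, NUM_COEFFS * sizeof(float));
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_c, coeffs.data(), NUM_COEFFS * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    eval_poly_shared<<<blocks, threads>>>(d_c, d_x, d_y, n);

    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_c);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

Note that with 256 threads per block only nine of them perform the load, while the rest wait at the `__syncthreads()` barrier.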

On Fermi everything is cached.

That’s true, but for only a few coefficients, loading the data from global to shared memory would leave too many threads idle, while reading the data from constant memory is, from my point of view, more efficient.

That said, as RezaRob3 commented, on the Fermi architecture everything is cached, so if you do not reuse the coefficients, reading them directly from global memory could be good enough, as the data will be cached.

Regards.
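The plain global memory option could be sketched as follows, again assuming a univariate polynomial of degree 8 evaluated with Horner's rule. `eval_poly_global` and `eval_with_global_memory` are hypothetical names, and error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define NUM_COEFFS 9  // nine coefficients, as in the question

// ASSUMPTION: p(x) = c[0] + c[1]*x + ... + c[8]*x^8, one thread per element.
// The coefficients are read straight from global memory. The kernel only
// reads them, so no synchronization between threads is needed.
__global__ void eval_poly_global(const float* __restrict__ coeffs,
                                 const float* __restrict__ x,
                                 float* __restrict__ y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float xi = x[i];
    float acc = coeffs[NUM_COEFFS - 1];
    for (int k = NUM_COEFFS - 2; k >= 0; --k)
        acc = acc * xi + coeffs[k];  // Horner's rule
    y[i] = acc;
}

// Hypothetical host helper. Error checking omitted.
std::vector<float> eval_with_global_memory(const std::vector<float>& coeffs,
                                           const std::vector<float>& x)
{
    assert(coeffs.size() == NUM_COEFFS);
    const int n = static_cast<int>(x.size());
    std::vector<float> y(n);
    if (n == 0) return y;

    const size_t bytes = n * sizeof(float);
    float* d_c = nullptr;
    float* d_x = nullptr;
    float* d_y = nullptr;
    cudaMalloc(&d_c, NUM_COEFFS * sizeof(float));
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_c, coeffs.data(), NUM_COEFFS * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    eval_poly_global<<<blocks, threads>>>(d_c, d_x, d_y, n);

    cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_c);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

This is the simplest of the options to write, since it needs neither a constant symbol nor a staging step. Whether it is fast enough depends on the hardware and is best checked by timing it against the alternatives.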

Hi, thank you so much.

Yes, I know that everything is cached on the Fermi architecture, and actually my first solution used global memory directly. I was wondering whether using global memory could cause synchronization problems that could be avoided by using registers or something else. Currently I’m trying to use constant memory for my coefficients, since I agree with you, pQB, that for so few elements shared memory would be less efficient.

Thanks.