Global memory performance

Hello forum,

I'm new to CUDA, and I have a question.

I have seen many examples where the kernel uses an array a, an array b, and an array c for the results:

c[tid] = a[tid] * b[tid];

So far, so good.

But can I write the kernel like this and still get high performance?

__global__ void kernel(int a, int b, int *c)
{
    int tid = ....;

    c[tid] = a * b + some random value;
}

When I use ‘a’ and ‘b’ as pointers, every ‘tid’ has its own position in memory that is not shared with the other threads. But if ‘a’ and ‘b’ are integers (or objects instead of integers), every thread has to access the same memory address.

Can this make my app slower?

I ask because if I use an array with N elements where every position holds the same value, I would rather use a single value than an array of N identical elements.

So, is it better to use an array in which all elements are equal, or can I use a single value for all threads?
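To make the comparison concrete, here is a minimal CUDA sketch of both options, assuming a 1D launch and int data, with the "some random value" term left out. The kernel names mul_arrays and mul_scalars and the helpers time_launch and compare_variants are hypothetical, chosen for this example. The array version needs two n-element arrays of identical values and two global loads per thread; the scalar version needs neither, because kernel arguments are passed to the device through constant memory and are the same for every thread.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Array version: each thread loads its own a[tid] and b[tid] from global memory.
__global__ void mul_arrays(const int *a, const int *b, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)  // guard: the last block may extend past the data
        c[tid] = a[tid] * b[tid];
}

// Scalar version: a and b are kernel arguments, identical for every thread,
// so no per-thread global loads are needed for them.
__global__ void mul_scalars(int a, int b, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        c[tid] = a * b;
}

// Times one call of `launch` in milliseconds using CUDA events.
template <typename Launch>
static float time_launch(Launch launch)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Hypothetical helper: runs both kernels on n elements (arrays filled with the
// same a and b everywhere), reports their times, and returns true if both
// outputs equal a * b at every position.
bool compare_variants(int n, int a, int b, float *ms_arrays, float *ms_scalars)
{
    assert(n > 0);  // a grid of 0 blocks is an invalid launch configuration
    size_t bytes = n * sizeof(int);
    std::vector<int> ha(n, a), hb(n, b), out_arrays(n), out_scalars(n);

    int *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    auto run_arrays  = [&] { mul_arrays<<<blocks, threads>>>(da, db, dc, n); };
    auto run_scalars = [&] { mul_scalars<<<blocks, threads>>>(a, b, dc, n); };

    // Warm-up launches so one-time setup cost is not charged to either kernel.
    run_arrays();
    run_scalars();

    *ms_arrays = time_launch(run_arrays);
    cudaMemcpy(out_arrays.data(), dc, bytes, cudaMemcpyDeviceToHost);
    *ms_scalars = time_launch(run_scalars);
    cudaMemcpy(out_scalars.data(), dc, bytes, cudaMemcpyDeviceToHost);

    bool ok = cudaGetLastError() == cudaSuccess && out_arrays == out_scalars;
    for (int i = 0; ok && i < n; ++i)
        ok = out_arrays[i] == a * b;

    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return ok;
}
```

Calling compare_variants(1 << 20, 3, 7, &ms_arrays, &ms_scalars) from main and printing the two times gives a rough comparison on your own GPU; the numbers depend on the device and on n, so a single run is only indicative.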

PS: I cannot copy this object to constant memory because sometimes the variable is too big for it (constant memory is limited to 64 KB).
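For the case in the PS, one option is to keep a single copy of the object in ordinary global memory and pass a pointer to it. When all threads of a warp read the same address, the load typically needs a single memory transaction whose value is broadcast to the warp, so it does not cost one transaction per thread. The sketch below assumes a hypothetical BigParams struct of 32768 ints (128 KB, larger than constant memory) and hypothetical names scale and run_scale; marking the pointer const __restrict__ may let the compiler use the read-only data cache, depending on the architecture.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical large object: 32768 ints = 128 KB, more than the 64 KB of
// constant memory, so it lives in ordinary global memory instead.
constexpr int kCoeffs = 32768;
struct BigParams {
    int coeff[kCoeffs];
};

// Every thread reads the same element p->coeff[k]. Within a warp this is one
// address, so the load is typically served once and broadcast to all threads.
// const + __restrict__ tells the compiler the data is read-only and not
// aliased, which may allow it to use the read-only data cache.
__global__ void scale(const BigParams *__restrict__ p, int k, int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        out[tid] = p->coeff[k] * tid;
}

// Hypothetical helper: copies one BigParams to the device, runs scale,
// and returns true if out[i] == value * i for every i.
bool run_scale(int n, int k, int value)
{
    assert(n > 0 && k >= 0 && k < kCoeffs);
    BigParams *hp = new BigParams();  // value-initialised: all zeros
    hp->coeff[k] = value;

    BigParams *dp;
    int *dout;
    cudaMalloc(&dp, sizeof(BigParams));
    cudaMalloc(&dout, n * sizeof(int));
    // One copy of the object, shared by all threads through the pointer.
    cudaMemcpy(dp, hp, sizeof(BigParams), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(dp, k, dout, n);

    std::vector<int> out(n);
    cudaMemcpy(out.data(), dout, n * sizeof(int), cudaMemcpyDeviceToHost);
    bool ok = cudaGetLastError() == cudaSuccess;
    for (int i = 0; ok && i < n; ++i)
        ok = out[i] == value * i;

    cudaFree(dp);
    cudaFree(dout);
    delete hp;
    return ok;
}
```

Passing the struct by value as a kernel argument is another option, but kernel parameters have their own size limit, so a pointer to one device copy is the more general approach for large objects.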


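If you’re asking whether you can use scalar variables in your kernel: yes, you can. And no, it won’t make your code run slower; quite the opposite. Scalar kernel arguments reach every thread without a global memory load, whereas the array version makes each thread read a[tid] and b[tid] from global memory.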

Unless I’m misunderstanding, this is one of those cases where it would be easier to just write the kernel and try it out for yourself.