Hello Forum
I'm new to CUDA, and I have a question.
I have seen many examples where the kernel uses an array a, an array b, and an array c for the results:
c[tid] = a[tid] * b[tid];
Up to here, everything is clear.
But can I write a kernel like this and still get high performance?
__global__ void kernel(int a, int b, int *c)
{
    int tid = ....;
    c[tid] = a * b + some random value;
}
When ‘a’ and ‘b’ are pointers, each thread (each ‘tid’) reads its own location in memory, which is not shared with the other threads. But if ‘a’ and ‘b’ are plain integers, or objects instead of integers, every thread has to access the same memory address.
Can this make my app slower?
I ask because if I use an array with N elements and every position holds the same value, I would rather pass a single value than an array of N identical elements.
So, is it better to use an array whose elements are all equal, or can I use a single value for all threads?
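To make the comparison concrete, here is a minimal sketch of the two options I am asking about. Several things in it are my own assumptions and not taken from any real code: the 1D index calculation for tid, the extra size parameter n, the use of tid as a stand-in for "some random value", and the names scalar_kernel, array_kernel and run_both. As far as I understand, the array version costs two extra global loads per thread plus the memory for the arrays, while the scalar version reads a and b from the kernel arguments, but I have not measured either.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Option 1: a and b are passed by value, so every thread reads the same arguments.
__global__ void scalar_kernel(int a, int b, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // assumed 1D indexing
    if (tid < n)
        c[tid] = a * b + tid;  // tid stands in for "some random value"
}

// Option 2: a and b are arrays in which every element holds the same value.
__global__ void array_kernel(const int *a, const int *b, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // assumed 1D indexing
    if (tid < n)
        c[tid] = a[tid] * b[tid] + tid;
}

// Hypothetical host helper: runs both kernels and copies each result back.
// Returns true only if every CUDA call succeeded.
bool run_both(int a, int b, int n, int *out_scalar, int *out_array)
{
    assert(n > 0);
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    std::vector<int> h_a(n, a), h_b(n, b);  // arrays of identical elements

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    bool ok = cudaMalloc((void **)&d_a, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_b, bytes) == cudaSuccess
           && cudaMalloc((void **)&d_c, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements

        scalar_kernel<<<blocks, threads>>>(a, b, d_c, n);
        ok = cudaMemcpy(out_scalar, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

        if (ok) {
            array_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
            ok = cudaMemcpy(out_array, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        }
    }

    // cudaFree on a null pointer performs no operation, so this is safe on failure paths.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

Both kernels should produce the same output, so timing the two launches (for example with CUDA events) would answer the performance part of my question directly.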
PS: I cannot copy this object to constant memory because sometimes the variable is too big for that kind of memory.
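For the large-object case, the alternative I can think of is to keep a single copy of the object in global memory and pass a pointer to it, so that every thread reads the same addresses. The sketch below is only an illustration under my own assumptions: BigParams, its 128 KB table (chosen to exceed the 64 KB constant memory limit), big_kernel and run_big are all hypothetical names, and the const and __restrict__ qualifiers are only a hint that may let the compiler use the read-only data path on devices that have one.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical "big object": 32768 ints = 128 KB, larger than the 64 KB of constant memory.
struct BigParams {
    int table[32768];
};

// Every thread reads the same two elements of the single shared object.
__global__ void big_kernel(const BigParams *__restrict__ p, int *c, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;  // assumed 1D indexing
    if (tid < n)
        c[tid] = p->table[0] * p->table[1] + tid;  // tid stands in for "some random value"
}

// Hypothetical host helper: uploads one BigParams, runs the kernel, copies the result back.
// Returns true only if every CUDA call succeeded.
bool run_big(int a, int b, int n, int *out)
{
    assert(n > 0);
    size_t bytes = static_cast<size_t>(n) * sizeof(int);

    std::vector<BigParams> h_p(1);  // one zero-initialized object on the host heap
    h_p[0].table[0] = a;
    h_p[0].table[1] = b;

    BigParams *d_p = nullptr;
    int *d_c = nullptr;
    bool ok = cudaMalloc((void **)&d_p, sizeof(BigParams)) == cudaSuccess
           && cudaMalloc((void **)&d_c, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_p, h_p.data(), sizeof(BigParams),
                          cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        big_kernel<<<blocks, threads>>>(d_p, d_c, n);
        ok = cudaMemcpy(out, d_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer performs no operation, so this is safe on failure paths.
    cudaFree(d_p);
    cudaFree(d_c);
    return ok;
}
```

Here only one copy of the object exists on the device regardless of the number of threads, which is the layout I would prefer if it does not hurt performance.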
Thanks.