Is it possible to execute this during kernel execution?
int foo[2];
for (int i = 0; i < 2; ++i)
{
    foo[i] = i;
}
This does not work… WTF?
I have run a lot of tests, and any access to foo[i] where i changes dynamically doesn’t work.
Is this caused by some strange compiler optimization? Or does the documentation say it is not possible (but in that case, how can anyone program anything?!)
Thank you for your help.
– Pium, desperate at losing so much time on problems like this…
When you access foo[0], or any other element with a compile-time constant index, the compiler will map the elements of foo to registers.
If you access foo[i] where i is dynamic, it will be mapped into local memory (registers are not indexable). It will be significantly slower, but should still work.
To get additional help, you will need to post a minimal case that we can compile and run that demonstrates your problem. Saying that it just “doesn’t work” doesn’t tell us anything.
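As an illustration of the kind of minimal case being asked for, here is a sketch that turns the original snippet into a complete kernel plus a host-side check. The names `fillPrivate` and `runFillTest` are hypothetical. The loop bound `n` is passed at run time, so `foo[i]` is a genuinely dynamic index and the compiler may place `foo` in local memory; each thread should still read back its own values. With the original constant bound of 2, the compiler will typically unroll the loop, turning every index into a constant so `foo` can stay in registers.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread fills a small private array. Because the loop bound n is a
// run-time value, foo[i] uses a dynamic index, so the compiler may place
// foo in local memory instead of registers. The result must still be correct.
// Assumption of this sketch: n <= 2 (the size of foo).
__global__ void fillPrivate(int *out, int n)
{
    int foo[2];
    int tid = threadIdx.x;              // single-block launch in this sketch
    for (int i = 0; i < n; ++i)
        foo[i] = i + tid;               // per-thread value, dynamic index
    for (int i = 0; i < n; ++i)
        out[tid * n + i] = foo[i];      // copy out so the host can check it
}

// Hypothetical host helper: launches one block of `threads` threads and
// returns true if every thread wrote back its own private values.
bool runFillTest(int threads)
{
    const int n = 2;
    int *d_out = nullptr;
    size_t bytes = static_cast<size_t>(threads) * n * sizeof(int);
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;

    fillPrivate<<<1, threads>>>(d_out, n);

    std::vector<int> h(static_cast<size_t>(threads) * n);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);

    for (int t = 0; ok && t < threads; ++t)
        for (int i = 0; i < n; ++i)
            if (h[t * n + i] != i + t) { ok = false; break; }
    if (!ok) std::printf("mismatch or CUDA error\n");
    return ok;
}
```

On a working setup, `runFillTest(64)` should return true; if it does not, the problem is more likely in the launch or the memory copies than in the array indexing itself.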
I am having the same problem, so I’d like to know whether you were ever able to solve it.
I want each thread to build a vector, compute the 2-norm of that vector, and then find which of the vectors has the minimal norm. The problem is that I cannot build the vector in the normal way:
float ss[m];
And I don’t want it to be shared (each thread has a different vector). So how could I do it? Any ideas?
If you need more information, I can provide it.
I cannot use it because the size of the vector would be dynamic and much larger than 3.
I guess I could allocate it in global memory with size m*numberofthreads, where m is the size of each vector, and then have each thread operate only on the i-th column of the matrix, but I think that would waste memory.
Sorry for my bad English; I hope somebody can help me. Thanks.
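A sketch of the global-memory layout proposed above (an m x numberofthreads matrix, one column per thread), assuming each thread can build its vector element by element. Storing element k of thread t at `k * numThreads + t` means that, for each k, consecutive threads touch consecutive addresses, so the accesses coalesce. On the memory concern: per-thread local memory also resides in device memory, so an explicit scratch buffer of this size is not necessarily more wasteful than a large per-thread array would be. The names `columnNorms` and `runColumnNorms`, and the fill rule `v[k] = t + 1`, are hypothetical placeholders.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// scratch is an m x numThreads matrix in global memory: element k of
// thread t lives at scratch[k * numThreads + t], so thread t owns column t.
__global__ void columnNorms(float *scratch, float *norms, int m, int numThreads)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numThreads) return;

    // Build this thread's vector. v[k] = t + 1 is a hypothetical
    // placeholder for whatever the real algorithm computes.
    for (int k = 0; k < m; ++k)
        scratch[k * numThreads + t] = static_cast<float>(t + 1);

    // 2-norm of column t.
    float sum = 0.0f;
    for (int k = 0; k < m; ++k) {
        float v = scratch[k * numThreads + t];
        sum += v * v;
    }
    norms[t] = sqrtf(sum);
}

// Hypothetical host helper: allocates the scratch matrix, runs the kernel
// and copies the per-thread norms into normsOut. Returns false on CUDA error.
bool runColumnNorms(int m, int numThreads, std::vector<float> &normsOut)
{
    float *d_scratch = nullptr, *d_norms = nullptr;
    size_t scratchBytes = static_cast<size_t>(m) * numThreads * sizeof(float);
    size_t normBytes = static_cast<size_t>(numThreads) * sizeof(float);
    if (cudaMalloc(&d_scratch, scratchBytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_norms, normBytes) != cudaSuccess) { cudaFree(d_scratch); return false; }

    const int block = 128;
    int grid = (numThreads + block - 1) / block;
    columnNorms<<<grid, block>>>(d_scratch, d_norms, m, numThreads);

    normsOut.assign(numThreads, 0.0f);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(normsOut.data(), d_norms, normBytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_scratch);
    cudaFree(d_norms);
    return ok;
}
```

With m = 4 and the placeholder fill rule, thread t's norm is 2(t + 1), which makes the result easy to check.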
But to find the vector with the minimal norm, the vectors will have to be shared. How else would the threads communicate to agree on a minimal result? If each thread only knows its own vector, then it cannot compare its norm against its neighbors’.
The algorithm for finding the minimum is a parallel reduction and is described in the SDK and
Yes: each thread computes the 2-norm of its vector and stores the result in a shared array. Then I use a parallel reduction to find the minimum.
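The block-level step of such a reduction can be sketched as a shared-memory tree reduction that also tracks the index of the minimum (an argmin), so the winning vector is known, not just its norm. This assumes the norms are already in a global array with one entry per thread; `BLOCK`, `blockArgMin` and `argMinOnDevice` are hypothetical names. The few per-block results are combined on the host here, which is one simple option; the SDK's reduction sample covers more optimized variants.

```cuda
#include <cassert>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; must be a power of two for this loop

// First pass of a tree reduction: each block finds the minimum of its slice
// of norms[] and the global index where it occurs.
__global__ void blockArgMin(const float *norms, int n, float *blockMin, int *blockArg)
{
    __shared__ float sVal[BLOCK];
    __shared__ int sIdx[BLOCK];

    int tid = threadIdx.x;
    int g = blockIdx.x * BLOCK + tid;
    sVal[tid] = (g < n) ? norms[g] : FLT_MAX;  // pad out-of-range slots
    sIdx[tid] = (g < n) ? g : -1;
    __syncthreads();

    // Halve the number of active threads at each step.
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (tid < s && sVal[tid + s] < sVal[tid]) {
            sVal[tid] = sVal[tid + s];
            sIdx[tid] = sIdx[tid + s];
        }
        __syncthreads();  // outside the if so every thread reaches it
    }
    if (tid == 0) {
        blockMin[blockIdx.x] = sVal[0];
        blockArg[blockIdx.x] = sIdx[0];
    }
}

// Hypothetical host helper: returns the index of the smallest norm,
// or -1 on error or empty input.
int argMinOnDevice(const std::vector<float> &h_norms)
{
    int n = static_cast<int>(h_norms.size());
    if (n == 0) return -1;
    int blocks = (n + BLOCK - 1) / BLOCK;

    float *d_norms = nullptr, *d_min = nullptr;
    int *d_arg = nullptr;
    bool ok = cudaMalloc(&d_norms, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_min, blocks * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_arg, blocks * sizeof(int)) == cudaSuccess &&
              cudaMemcpy(d_norms, h_norms.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    std::vector<float> h_min(blocks);
    std::vector<int> h_arg(blocks);
    if (ok) {
        blockArgMin<<<blocks, BLOCK>>>(d_norms, n, d_min, d_arg);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_min.data(), d_min, blocks * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(h_arg.data(), d_arg, blocks * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_norms);  // cudaFree(nullptr) is a no-op
    cudaFree(d_min);
    cudaFree(d_arg);
    if (!ok) return -1;

    // Second pass on the host over one candidate per block.
    int best = 0;
    for (int b = 1; b < blocks; ++b)
        if (h_min[b] < h_min[best]) best = b;
    return h_arg[best];
}
```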
I guess there is no way to store each vector in each thread’s own memory, so I will build the matrix. But then I have another question: if it’s shared, would it be possible for threads to modify the matrix at the same time, as long as each thread alters only a different column?
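Writes by different threads to different addresses do not race, so each thread filling only its own column should be safe. What is required is a `__syncthreads()` barrier before any thread reads data written by another thread. Below is a sketch using dynamically sized shared memory; `sharedColumns`, `runSharedColumns` and the fill rule are hypothetical. Shared memory per block is limited (see `cudaDeviceProp::sharedMemPerBlock` for your device), so `m * blockDim.x * sizeof(float)` must fit; for large m the global-memory layout may be the only option. The layout `mat[k * w + t]` also has consecutive threads hitting consecutive 32-bit words, which avoids shared-memory bank conflicts.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// One block holds an m x blockDim.x matrix in dynamic shared memory.
// Element k of column t is at mat[k * w + t]; thread t owns column t.
__global__ void sharedColumns(int m, float *out)
{
    extern __shared__ float mat[];  // size set by the third launch parameter
    int t = threadIdx.x;
    int w = blockDim.x;

    // Concurrent writes are safe: every thread writes distinct addresses.
    for (int k = 0; k < m; ++k)
        mat[k * w + t] = static_cast<float>(t + 1);  // hypothetical fill rule

    // Barrier: required before any thread reads a column written by another.
    __syncthreads();

    // Example cross-thread read: thread t takes the 2-norm of its
    // neighbour's column.
    int other = (t + 1) % w;
    float sum = 0.0f;
    for (int k = 0; k < m; ++k) {
        float v = mat[k * w + other];
        sum += v * v;
    }
    out[t] = sqrtf(sum);
}

// Hypothetical host helper: launches one block of w threads with
// m * w floats of dynamic shared memory. Returns false on CUDA error.
bool runSharedColumns(int m, int w, std::vector<float> &out)
{
    float *d_out = nullptr;
    size_t bytes = static_cast<size_t>(w) * sizeof(float);
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) return false;

    size_t shmem = static_cast<size_t>(m) * w * sizeof(float);
    sharedColumns<<<1, w, shmem>>>(m, d_out);

    out.assign(w, 0.0f);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    return ok;
}
```

If the `__syncthreads()` call were removed, a thread could read its neighbour's column before the neighbour had written it, and the results would be unreliable.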