Accessing the i-th element of a static array in the kernel: I am going crazy

Hi,

Is it possible to execute this inside a kernel?

int foo[2];
for( int i=0;i<2;++i)
{
foo[i] = i;
}

This does not work… WTF?

I have run a lot of tests, and any access to foo[i] where i changes at run time doesn’t work.
Is this caused by some strange compiler optimization? Or does the documentation say it is not possible (but in that case, how can anyone program anything)?

Thank you for your help.

– Pium, desperate at losing so much time on a problem like this…

The way you have declared ‘foo’, it looks like a local variable. It’s not in global memory… What do you really want it to do?

It’s possible that the compiler will optimize it out… Use volatile…

I want to declare this variable in thread memory.
I want one ‘foo’ per thread.
Is that possible?

I am having a look at volatile.

thanks

How do you intend to see what’s in thread memory? Unless you copy it to global memory and then copy that back to host memory, you are going to see nothing.

Just adding volatile or register does not work.

I just want to declare the variable inside my kernel function. I would like it to be located in the “local memory” of the thread.

What is very strange is that if I do

foo[0]=0;
foo[1]=1;

it works without a problem. It fails only when foo[i] is accessed with a dynamic ‘i’;
with a static ‘i’, it works.

int i=0;
foo[i++]=i;
foo[i]=i;

is OK

Strange, no?

Maybe it is not in thread memory.

I just want to create it in a thread, use it in the thread, and forget it,

like any local variable.

The only difference is that this one is an array. What is the big deal?

PS: I tried a few hacks like

*(foo+i) = i;

but that does not work with a dynamic ‘i’ either (it works with a static ‘i’).

int j;
for(int i=0;i<2;++i)
j=foo[i];

works.

I don’t understand anything.

So foo[i] works when reading but not when writing…

When you access foo[0] or with any other compile time static index, the compiler will map the elements of foo to registers.

If you access foo[i] where i is dynamic, it will be mapped into local memory (registers are not indexable). It will be significantly slower, but should still work.

To get additional help, you will need to post a minimal case that we can compile and run that demonstrates your problem. Saying that it just “doesn’t work” doesn’t tell us anything.
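As an illustration of the kind of minimal case being asked for, here is a sketch that writes a per-thread array with a run-time index, copies it out to global memory, and checks it on the host. The names fill_private_array, run_fill_check, N and THREADS are made up for this example, and the values written are arbitrary; it is not the original poster's code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 2;        // size of each thread's private array (assumed)
constexpr int THREADS = 4;  // threads in the single block (arbitrary choice)

// Each thread fills its own private array using a run-time index, then
// copies it to global memory so the host can look at the result.
__global__ void fill_private_array(int *out, int n)
{
    int foo[N];                          // one private copy per thread
    for (int i = 0; i < n; ++i) {        // n is a kernel argument, not a constant
        foo[i] = i + threadIdx.x;
    }
    for (int i = 0; i < n; ++i) {
        out[threadIdx.x * N + i] = foo[i];
    }
}

// Hypothetical host-side helper: returns true if every value came back as written.
bool run_fill_check()
{
    int host[THREADS * N];
    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;

    fill_private_array<<<1, THREADS>>>(dev, N);
    cudaError_t err = cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int t = 0; t < THREADS; ++t)
        for (int i = 0; i < N; ++i)
            if (host[t * N + i] != i + t) return false;
    return true;
}

int main()
{
    printf("%s\n", run_fill_check() ? "values match" : "mismatch or CUDA error");
    return 0;
}
```

If a case this small reports a mismatch, posting it together with the compiler version and the card gives others something concrete to reproduce.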

Man… You repeat the same thing…

I am trying to give information to help you understand the problem, because I am lost.

If you intend to use this code in a kernel (on the device), you have to use the thread indices: threadIdx.x, threadIdx.y, threadIdx.z.

Hope it helps.

Vince

Hi there,

I am having the same problem, so I’d like to know whether you finally solved it.

I want each thread to build a vector, compute the 2-norm of that vector, and then find which of the vectors has the minimal norm. The problem is that I cannot build the vector in the normal way:

float ss[m];

And I don’t want it to be shared (each thread has a different vector). So how could I do it? Any ideas?

If you need more information, I can provide it.

I cannot use it because the size of the vector would be determined dynamically and would be much larger than 3.

I guess I could allocate it in global memory with size (m*numberofthreads), where m is the size of each vector, and then make each thread operate only on the i-th column of the matrix, but I think that would be a waste of memory.
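A rough sketch of that global-memory layout, for illustration only: thread t owns column t of an m-by-numThreads matrix stored row by row, so element k of its vector sits at scratch[k * numThreads + t]. The kernel name build_and_norm, the helper run_norm_check, and the placeholder values (t + k) are assumptions made up for this example; the real vector contents depend on the application.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Thread t writes its vector into column t of the scratch matrix,
// then reads it back to compute the 2-norm.
__global__ void build_and_norm(float *scratch, float *norms, int m, int numThreads)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numThreads) return;

    for (int k = 0; k < m; ++k) {
        scratch[k * numThreads + t] = (float)(t + k);  // placeholder values
    }

    float sumSq = 0.0f;
    for (int k = 0; k < m; ++k) {
        float v = scratch[k * numThreads + t];
        sumSq += v * v;
    }
    norms[t] = sqrtf(sumSq);
}

// Hypothetical host-side helper: compares the device norms with a host computation.
bool run_norm_check()
{
    const int m = 3, numThreads = 8;  // m is an ordinary run-time value here
    float *scratch = nullptr, *norms = nullptr;
    if (cudaMalloc(&scratch, m * numThreads * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&norms, numThreads * sizeof(float)) != cudaSuccess) {
        cudaFree(scratch);
        return false;
    }

    build_and_norm<<<1, numThreads>>>(scratch, norms, m, numThreads);

    std::vector<float> host(numThreads);
    cudaError_t err = cudaMemcpy(host.data(), norms, numThreads * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(scratch);
    cudaFree(norms);
    if (err != cudaSuccess) return false;

    for (int t = 0; t < numThreads; ++t) {
        float sumSq = 0.0f;
        for (int k = 0; k < m; ++k) sumSq += (float)((t + k) * (t + k));
        if (std::fabs(host[t] - std::sqrt(sumSq)) > 1e-3f) return false;
    }
    return true;
}

int main()
{
    printf("%s\n", run_norm_check() ? "norms match" : "mismatch or CUDA error");
    return 0;
}
```

With this indexing, neighbouring threads touch neighbouring addresses for each k, which is the usual reason to prefer columns over giving each thread a contiguous row.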

Sorry for my bad English, and I hope somebody can help me. Thanks.

But to find the vector with the minimal norm the vectors will have to be shared. How else would the threads communicate to agree on a minimal result? If each thread only knows its vector then it cannot compare its norm against the neighbors.

The algorithm for finding the minimum is a parallel reduction and is described in the SDK and accompanying documentation.
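For readers without the SDK at hand, here is a simplified single-block sketch of a minimum reduction in shared memory. It is not the SDK's implementation; the names block_min, run_min_check and BLOCK are invented for this example, and it assumes the number of norms fits in one block whose size is a power of two.

```cuda
#include <cfloat>
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block; must be a power of two (assumed)

// Finds the smallest of n values (n <= BLOCK) and the index where it occurs.
__global__ void block_min(const float *norms, int n, float *minVal, int *minIdx)
{
    __shared__ float sVal[BLOCK];
    __shared__ int   sIdx[BLOCK];

    int t = threadIdx.x;
    sVal[t] = (t < n) ? norms[t] : FLT_MAX;  // pad unused slots with a large value
    sIdx[t] = t;
    __syncthreads();

    // Halve the number of active threads each step, keeping the smaller value.
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (t < stride && sVal[t + stride] < sVal[t]) {
            sVal[t] = sVal[t + stride];
            sIdx[t] = sIdx[t + stride];
        }
        __syncthreads();
    }

    if (t == 0) {
        *minVal = sVal[0];
        *minIdx = sIdx[0];
    }
}

// Hypothetical host-side helper using made-up sample norms.
bool run_min_check()
{
    const int n = 5;
    const float hostNorms[n] = {5.0f, 3.0f, 9.0f, 1.0f, 7.0f};
    float *dNorms = nullptr, *dVal = nullptr;
    int *dIdx = nullptr;
    if (cudaMalloc(&dNorms, sizeof(hostNorms)) != cudaSuccess) return false;
    if (cudaMalloc(&dVal, sizeof(float)) != cudaSuccess) { cudaFree(dNorms); return false; }
    if (cudaMalloc(&dIdx, sizeof(int)) != cudaSuccess) { cudaFree(dNorms); cudaFree(dVal); return false; }

    cudaMemcpy(dNorms, hostNorms, sizeof(hostNorms), cudaMemcpyHostToDevice);
    block_min<<<1, BLOCK>>>(dNorms, n, dVal, dIdx);

    float val = 0.0f;
    int idx = -1;
    cudaError_t e1 = cudaMemcpy(&val, dVal, sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(&idx, dIdx, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dNorms);
    cudaFree(dVal);
    cudaFree(dIdx);
    if (e1 != cudaSuccess || e2 != cudaSuccess) return false;

    return val == 1.0f && idx == 3;  // smallest sample value sits at index 3
}

int main()
{
    printf("%s\n", run_min_check() ? "minimum found" : "mismatch or CUDA error");
    return 0;
}
```

Carrying the index alongside the value is what lets the result say which thread's vector had the minimal norm, not just what that norm was.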

Yes, each thread computes the 2-norm of its vector and then stores the result in a shared vector. Then I use the parallel reduction to find the minimum.

I guess there is no way to store each vector in each thread’s own memory, so I will build the matrix. But then I have another question: if the matrix is shared, would it be possible for the threads to modify it at the same time, provided each thread alters only a different column of the matrix?

Thanks for the quick reply.