kernel params const?

Suppose my kernel function looks like this:

__global__ void mykernel( float param )
{
   const float pi = 3.14f;
   float foo = sqrtf(param + pi);
}

int main()
{
   // grid, block: launch configuration (not shown in the original post)
   mykernel<<<grid, block>>>(param);
}

Then my assumption is that “param” is actually stored in shared memory for the threads. Is that correct?

Secondly, I notice that it is not necessary for each thread to compute “sqrt” because they would all produce the same value. I thought it might be better to move this into shared memory, like so…

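__shared__ float foo;

__global__ void mykernel( float param )
{
}

int main()
{
   const float pi = 3.14f;
   foo = sqrtf(param + pi);   // rejected: host code cannot assign a __shared__ variable
   // grid, block: launch configuration (not shown in the original post)
   mykernel<<<grid, block>>>(param);
}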

However, it won’t allow me to assign to the shared variable from outside of the kernel like this. Hmm. I suppose I could accomplish this by sending it as an additional parameter,

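__global__ void mykernel( float param, float foo )
{
}

int main()
{
   const float pi = 3.14f;
   float foo = sqrtf(param + pi);   // computed once on the host
   // grid, block: launch configuration (not shown in the original post)
   mykernel<<<grid, block>>>(param, foo);
}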

So I’m just wondering what the most efficient practice is.

No, param will be stored in a register of each thread.

Remember that shared memory can be read by all threads in the same block; in your case, bank conflicts will occur.

Concretely, in my opinion:

Your first option has two problems.

The accuracy of the sqrt() function when it is computed on the GPU.

It wastes registers.

Your second option has two problems.

The accuracy of the sqrt() function when it is computed on the GPU.

Bank conflicts will occur.

Your third option is better.
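The third option can be sketched as a complete program. This is only an illustration, assuming a hypothetical output buffer `out`, a hypothetical helper `computeFoo`, and an arbitrary launch configuration, none of which appear in the original posts. As far as I know, the programming guide for compute capability 1.x devices says that `__global__` function arguments are passed via shared memory, while 2.x and later devices pass them via constant memory; in both cases every thread sees the same value without recomputing it.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Host-side helper (hypothetical name): evaluates the uniform value once.
float computeFoo(float param)
{
    const float pi = 3.14f;
    return sqrtf(param + pi);
}

// Every thread receives the precomputed foo as a kernel argument.
__global__ void mykernel(float param, float foo, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = param + foo * i;   // hypothetical use of foo, for illustration only
}

int main()
{
    const int blocks = 2, threads = 64;   // assumed launch configuration
    const float param = 2.0f;             // assumed input value
    const float foo = computeFoo(param);  // computed once, on the host

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, blocks * threads * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    mykernel<<<blocks, threads>>>(param, foo, d_out);

    float h_out[blocks * threads];
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("out[1] = %f\n", h_out[1]);
    cudaFree(d_out);
    return 0;
}
```

Computing `foo` on the host also means its precision is that of the host's `sqrtf`, which sidesteps the accuracy concern raised above.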

Did I misinterpret the programming guide? I also thought it would be stored in shared memory…

However, if all threads read the variable at the same time…wouldn’t the broadcast mechanism take over so there wouldn’t really be a bank conflict?

From the Programming Guide…

I doubt that… but I am unsure how the broadcast works. I tried using it but was never able to get any advantage from it (maybe no broadcast was happening), hence I am unsure how it works. But theoretically it should work.
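For reference, the shared-memory variant discussed here can be written so that it compiles: the `__shared__` variable is declared inside the kernel and filled by one thread per block. This is a minimal sketch, assuming a hypothetical output buffer `out` and an arbitrary launch configuration. After the barrier, all threads read the same 32-bit word, which is the access pattern the programming guide describes as being served by the broadcast mechanism; whether that gives a measurable benefit on a particular device is not something this sketch demonstrates.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void mykernel(float param, float *out)
{
    __shared__ float foo;          // one copy per block

    if (threadIdx.x == 0) {
        const float pi = 3.14f;
        foo = sqrtf(param + pi);   // computed once per block, not once per thread
    }
    __syncthreads();               // make the write visible to the whole block

    // Every thread of the block now reads the same shared-memory word.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = foo;
}

int main()
{
    const int blocks = 2, threads = 32;   // assumed launch configuration
    const float param = 0.86f;            // assumed input value

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, blocks * threads * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    mykernel<<<blocks, threads>>>(param, d_out);

    float h_out[blocks * threads];
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("first = %f, last = %f\n", h_out[0], h_out[blocks * threads - 1]);
    cudaFree(d_out);
    return 0;
}
```

Note that each block still computes the square root once, so passing the precomputed value as a kernel argument remains the simpler choice when the value is known on the host.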

[codebox]__global__ void mykernel( float param )
{
   const float pi = 3.14f;
   float foo = sqrtf(param + pi);
}

int main()
{
   // grid, block: launch configuration (not shown in the original post)
   mykernel<<<grid, block>>>(param);
}

Then my assumption is that “param” is actually stored in shared memory for the threads. Is that correct?[/codebox]

In this code, the compiler will choose registers or local memory for storing the “foo” variable. The kernel is very simple and the variable does not take much memory, so the compiler will use a register for it. You can check in the Visual Profiler whether the threads use local memory or not.

Shared memory will be used only if he declares the “foo” variable in shared memory. That is why I said that “foo” will be stored in a register.
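Besides the profiler, the per-kernel resource usage can be queried at run time with `cudaFuncGetAttributes`, or printed at compile time with `nvcc --ptxas-options=-v`. The sketch below assumes a hypothetical output buffer `out`, added only so the compiler does not optimise the computation away; the actual numbers depend on the compiler version and target architecture, so none are shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void mykernel(float param, float *out)
{
    const float pi = 3.14f;
    float foo = sqrtf(param + pi);
    out[threadIdx.x] = foo;   // store the result so the computation is kept
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, mykernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Resources the compiler assigned to this kernel.
    printf("registers per thread   : %d\n", attr.numRegs);
    printf("local memory (bytes)   : %zu\n", attr.localSizeBytes);
    printf("static shared (bytes)  : %zu\n", attr.sharedSizeBytes);
    printf("constant memory (bytes): %zu\n", attr.constSizeBytes);
    return 0;
}
```

A non-zero local memory size would indicate that some per-thread variables were placed in local memory rather than registers.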

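[codebox]__shared__ float foo;

__global__ void mykernel( float param )
{
}

int main()
{
   const float pi = 3.14f;
   foo = sqrtf(param + pi);   // rejected: host code cannot assign a __shared__ variable
   // grid, block: launch configuration (not shown in the original post)
   mykernel<<<grid, block>>>(param);
}[/codebox]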

__shared__ float foo;

He wants to store this variable in shared memory, and it will be read by every thread in the block, so in my experience bank conflicts will occur.

A bank conflict occurs when multiple threads in the same half-warp access the same bank.

Just to clarify…I agree that the local variable foo will be stored in registers/local memory. However, I believe the original question was regarding the function parameter named param. It is my understanding that this parameter will be passed via shared memory.

Thanks for the reminder, I made a mistake; I didn’t read it carefully. But I think the param will be stored in a register.