Hey guys, I'm building a CUDA C program for a GeForce GTS 450 card, and I run into a problem when I allocate shared memory in a kernel function.
The kernel function looks like this:
__global__ void PSO_next_particle_position(int *d_a, int *d_b, int *d_c)
{
// compute the thread ID
…
// allocate shared memory
__shared__ float p[4800];
__shared__ int v[1600];
__shared__ int count[256];
// computing
}
and I get this error:
1>ptxas error : Entry function ‘_Z26PSO_next_particle_positionPfS_S_S_S_PiS0_S0_S0_’ uses too much shared data (0x6824 bytes + 0x10 bytes system, 0x4000 max)
The GTS 450 is said to have 48 KB of shared memory per block, so why does ptxas report a limit of 16 KB (0x4000) here?
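To put the numbers from the error next to the two limits, here is a small host-only sketch that totals the three arrays. It needs no GPU. The helper names shared_array_bytes and report_shared_usage are made up for illustration, and it assumes float and int are 4 bytes each, as on the usual CUDA targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Element counts copied from the kernel's __shared__ declarations. */
enum { P_ELEMS = 4800, V_ELEMS = 1600, COUNT_ELEMS = 256 };

/* Limits as quoted in the post: 0x4000 from ptxas, 48 KB from the card's specs. */
enum { PTXAS_LIMIT_BYTES = 0x4000, SPEC_LIMIT_BYTES = 48 * 1024 };

/* Total bytes requested by the three arrays (hypothetical helper). */
size_t shared_array_bytes(void)
{
    /* Assumption: 4-byte float and int. */
    assert(sizeof(float) == 4 && sizeof(int) == 4);
    return P_ELEMS * sizeof(float)
         + V_ELEMS * sizeof(int)
         + COUNT_ELEMS * sizeof(int);
}

/* Prints the total and whether it fits under each limit (hypothetical helper). */
void report_shared_usage(void)
{
    size_t total = shared_array_bytes();
    printf("arrays: %lu bytes (0x%lx)\n",
           (unsigned long)total, (unsigned long)total);
    printf("fits in 0x4000 (16 KB): %s\n",
           total <= (size_t)PTXAS_LIMIT_BYTES ? "yes" : "no");
    printf("fits in 48 KB: %s\n",
           total <= (size_t)SPEC_LIMIT_BYTES ? "yes" : "no");
}
```

Calling report_shared_usage() from main gives a total of 26624 bytes (0x6800) for the arrays, which is over the 0x4000 limit in the error but well under 48 KB. The other 0x24 bytes in the ptxas figure do not come from these three arrays.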
Has anyone encountered a problem like this before?
Any advice will be appreciated. Thanks in advance.
CoolLife2011