Can I use shared memory of size 3600*sizeof(double) with a <<<1,1>>> config?

Hello,

Can I use 3600*sizeof(double) bytes of shared memory inside a kernel that runs with only one block and one thread?

So the code is:

__global__ void testKernel()
{
    __shared__ real shCoordsY[1800];
    __shared__ unsigned short int shNumPtnY[1800];

    ..........
}

testKernel<<<1,1>>>();

I tested the above code on an 8400 GS card and a Quadro CX card using the CUDA 2.2 drivers and SDK.

The shared memory used in my code is
sizeof(unsigned short int) * 1800 = 2 * 1800 = 3600 bytes and
sizeof(double) * 1800 = 8 * 1800 = 14400 bytes.

So the total shared memory is 3.6k + 14.4k = 18k.
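
The same arithmetic can be checked with a few lines of host code. This is only a sketch: it assumes `real` is a typedef for `double` (its definition is not shown in this thread) and it relies on `unsigned short int` being 2 bytes, which is the case on the platforms CUDA targets.

```cuda
#include <cstdio>
#include <cstddef>

typedef double real;   // assumption: the thread never shows how `real` is defined
const int N = 1800;    // array length used in the kernel above

// Bytes taken by the two statically declared shared arrays.
size_t sharedBytes()
{
    return N * sizeof(real) + N * sizeof(unsigned short int);
}

int main()
{
    printf("shCoordsY: %zu bytes\n", N * sizeof(real));
    printf("shNumPtnY: %zu bytes\n", N * sizeof(unsigned short int));
    printf("total    : %zu bytes\n", sharedBytes());
    return 0;
}
```

With an 8-byte double and a 2-byte unsigned short the total comes to 18000 bytes.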

But the shared memory available on the 8400 GS card is 8k, and on the Quadro CX it is 16k.

My code exceeds the available shared memory on the card, so I thought it should give a compilation error. But
when I compile the code, it gives no error and does not crash. It runs fine.

So is this code acceptable?
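
A failed kernel launch does not crash the host program by itself; the error has to be read back explicitly. The sketch below shows one way to do that for a kernel shaped like the one above. The kernel body, the `out` parameter and the `real` typedef are assumptions made for illustration, and `cudaDeviceSynchronize` is the current name of the call (toolkits of the CUDA 2.2 era used `cudaThreadSynchronize`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef double real;   // assumption: `real` is not defined anywhere in the thread

// Same two arrays as in the question, but read and written so that they are
// less likely to be optimized away as unused storage.
__global__ void testKernel(real *out)
{
    __shared__ real shCoordsY[1800];
    __shared__ unsigned short int shNumPtnY[1800];

    for (int i = 0; i < 1800; ++i) {
        shCoordsY[i] = (real)i;
        shNumPtnY[i] = (unsigned short int)i;
    }
    real sum = 0;
    for (int i = 0; i < 1800; ++i)
        sum += shCoordsY[i] + shNumPtnY[i];
    *out = sum;
}

int main()
{
    real *dOut = 0;
    cudaError_t allocErr = cudaMalloc((void **)&dOut, sizeof(real));
    if (allocErr != cudaSuccess) {
        printf("cudaMalloc: %s\n", cudaGetErrorString(allocErr));
        return 1;
    }

    testKernel<<<1, 1>>>(dOut);

    // Errors detected at launch time, such as a configuration the device cannot satisfy.
    cudaError_t launchErr = cudaGetLastError();
    // Errors that occur while the kernel is running.
    cudaError_t syncErr = cudaDeviceSynchronize();

    printf("launch: %s\n", cudaGetErrorString(launchErr));
    printf("sync  : %s\n", cudaGetErrorString(syncErr));

    cudaFree(dOut);
    return 0;
}
```

If neither line reports an error, the launch went through on that device; if the arrays were never touched in the real kernel, it is worth checking whether they were allocated at all.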

[quote]But the shared memory available on the 8400 GS card is 8k, and on the Quadro CX it is 16k.[/quote]
As far as I know, the total shared memory for each multiprocessor is 16 KB (whatever your device), so you cannot use more than 16 KB of shared memory per block.
Take a look at the “CUDA Programming Guide” for more information.
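
The limit can also be read from the card itself instead of being looked up. A minimal sketch using the runtime's device query follows; only the printed wording is made up here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;   // skip devices that cannot be queried
        // sharedMemPerBlock is the per-block shared memory limit in bytes.
        printf("%s (compute %d.%d): %zu bytes of shared memory per block\n",
               prop.name, prop.major, prop.minor, prop.sharedMemPerBlock);
    }
    return 0;
}
```

Running this on both cards would settle whether the 8400 GS really offers less than the Quadro CX.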

Without double support enabled, “__shared__ real …” would be compiled as float.

So your shared memory usage is about 10.8k (4 * 1800 + 2 * 1800 bytes) < 16k.

Hence your kernel runs fine.

Use the “-arch=sm_13” option when compiling to enable double support, and declare shared memory like this:

__shared__ double shCoordsY[1800];

And see what happens :)
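
One way to see what the compiler actually allocated, rather than inferring it from whether the kernel runs, is to ask the runtime for the kernel's static shared memory size. This is a sketch only: the kernel body and the `out` parameter are invented so that the arrays are used, and recent toolkits no longer accept sm_13, so the build flag has to match whatever toolkit is in use.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void testKernel(double *out)
{
    __shared__ double shCoordsY[1800];
    __shared__ unsigned short int shNumPtnY[1800];

    // Touch both arrays through a runtime index so they are less likely
    // to be removed as dead storage.
    int i = threadIdx.x;
    shCoordsY[i] = 1.0;
    shNumPtnY[i] = 2;
    __syncthreads();
    *out = shCoordsY[i] + shNumPtnY[i];
}

int main()
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, testKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // sharedSizeBytes is the statically allocated shared memory per block.
    printf("static shared memory: %zu bytes\n", attr.sharedSizeBytes);
    return 0;
}
```

A figure near 18000 bytes would mean both arrays were kept as declared; a noticeably smaller one would suggest the doubles were demoted or the arrays were dropped. Compiling with "nvcc --ptxas-options=-v" prints the per-kernel shared memory usage at build time as well.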

[quote name=‘nitin.life’ post=‘567975’ date=‘Jul 21 2009, 01:11 PM’]
Use the “-arch=sm_13” option when compiling to enable double support, and declare shared memory like this
[/quote]

[codebox]__global__ void testKernel()
{
    __shared__ double shCoordsX[1800];
    __shared__ unsigned short int shPtnX[1800];
}[/codebox]

I tested the above on the 1.3 architecture, and it runs fine: no compilation error and no crash.

Can you paste a more elaborate version of your host and kernel code?