Issues with GTX580 3.0GB card: kernel function is not executed only on the GTX580 3.0GB card

Hi guys,

I have run into a problem when using a GeForce GTX 580 3.0GB card.

[environment]
CPU: Intel Core i7 870
memory: 4GB
OS: Windows 7 32-bit
GPU: ELSA GeForce GTX580 (1.5GB & 3.0GB)
DevTool: Microsoft Visual Studio 2008 SP2, NVIDIA Parallel Nsight 1.51
CUDA SDK: CUDA SDK 3.2.16, CUDA Toolkit 3.2.16
CUDA runtime: cudart32_32_16.dll(Version 3.2.16)

[problem]
When the kernel function is launched on the 3.0GB card, it is not executed and ‘cudaErrorMemoryAllocation’ is returned. Changing the data size or the number of threads executed at the same time does not solve the problem. However, the same operation has no such problem on the 1.5GB card.

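For reference, this is roughly how the error can be observed. It is only a minimal sketch: `myKernel` and `launchAndCheck` are hypothetical names standing in for my real code, and the kernel body here is a trivial placeholder.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; the real kernel is much larger.
__global__ void myKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * i;
}

// Launches the kernel and returns the first error seen.
cudaError_t launchAndCheck(float *d_out, int n)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    myKernel<<<blocks, threads>>>(d_out, n);

    // Errors detected at launch time are reported here.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel runs are reported when the host waits.
    // cudaThreadSynchronize is the CUDA 3.2 name; later toolkits call it
    // cudaDeviceSynchronize.
    err = cudaThreadSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

Calling `launchAndCheck` with a buffer obtained from `cudaMalloc` returns `cudaSuccess` when the launch works, and the error code otherwise.
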
[kernel function]
In the code where the problem occurs, about 7000KB of local memory is used in the kernel function.
If this memory area is instead allocated in device memory beforehand and then referenced from the kernel function, the operation executes normally on the 3.0GB card.
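
The two variants look roughly like the sketch below. All names are hypothetical, and `SCRATCH_FLOATS` is a small placeholder size, not the real figure from my code. `kernelLocal` shows the pattern that fails on the 3.0GB card (a large per-thread array, which the compiler is expected to place in local memory); `kernelPrealloc` shows the pattern that works (scratch space allocated with `cudaMalloc` before the launch).

```cuda
#include <cuda_runtime.h>

// Hypothetical per-thread scratch size (placeholder, not the real value).
#define SCRATCH_FLOATS 1024

// Variant A: large per-thread array declared inside the kernel.
__global__ void kernelLocal(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float scratch[SCRATCH_FLOATS];
    for (int k = 0; k < SCRATCH_FLOATS; ++k) scratch[k] = (float)(i + k);
    float sum = 0.0f;
    for (int k = 0; k < SCRATCH_FLOATS; ++k) sum += scratch[(k * 7) % SCRATCH_FLOATS];
    out[i] = sum;
}

// Variant B: each thread uses its own slice of a preallocated device buffer.
__global__ void kernelPrealloc(float *out, float *scratchAll, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float *scratch = scratchAll + (size_t)i * SCRATCH_FLOATS;
    for (int k = 0; k < SCRATCH_FLOATS; ++k) scratch[k] = (float)(i + k);
    float sum = 0.0f;
    for (int k = 0; k < SCRATCH_FLOATS; ++k) sum += scratch[(k * 7) % SCRATCH_FLOATS];
    out[i] = sum;
}

// Host side for variant B: allocate the scratch buffer, launch, copy back.
cudaError_t runPreallocated(int n, float *h_out)
{
    float *d_out = 0, *d_scratch = 0;
    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&d_scratch, (size_t)n * SCRATCH_FLOATS * sizeof(float));
    if (err != cudaSuccess) { cudaFree(d_out); return err; }

    int threads = 64;
    int blocks = (n + threads - 1) / threads;
    kernelPrealloc<<<blocks, threads>>>(d_out, d_scratch, n);
    err = cudaGetLastError();
    if (err == cudaSuccess)  // cudaMemcpy also waits for the kernel to finish
        err = cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_scratch);
    cudaFree(d_out);
    return err;
}
```

The only difference between the two kernels is where the scratch array lives; the computation is the same.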

[confirmation items]
1) The only difference between the 1.5GB and 3.0GB cards seems to be the amount of memory. Why does the behavior differ?
2) When local memory is allocated inside the kernel function, is there any way to avoid ‘cudaErrorMemoryAllocation’ on the 3.0GB card?
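
In case it helps with item 1, a small query like the one below could be run on each card to compare what the runtime reports. This is only a sketch; `printMemoryInfo` is a hypothetical helper, not part of my actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints memory-related information for one device.
// Returns the first error seen, or cudaSuccess.
cudaError_t printMemoryInfo(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;

    err = cudaSetDevice(device);
    if (err != cudaSuccess) return err;

    // Free and total device memory as seen by the current context.
    size_t freeBytes = 0, totalBytes = 0;
    err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) return err;

    printf("%s: %d multiprocessors, total global memory %lu MB\n",
           prop.name, prop.multiProcessorCount,
           (unsigned long)(prop.totalGlobalMem >> 20));
    printf("free %lu MB of %lu MB\n",
           (unsigned long)(freeBytes >> 20),
           (unsigned long)(totalBytes >> 20));
    return cudaSuccess;
}
```

Calling `printMemoryInfo(0)` on the machine with each card installed would show whether the reported free memory differs from what is expected.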

Thank you for your help,
qazokm1146