"Out of memory" compiler error when the kernel function is very long

Hello everybody.

My environment is CUDA 4.0, a GeForce 9800 GT, Visual Studio 2008, and 32-bit Windows XP.

I need to compute a value at each discrete point of a 3D space.

Because the calculation has many steps, I have written about 600-700 lines of code to compute each value in the kernel. (The kernel function itself is not long, because I encapsulate the calculation in other functions and call them from the kernel to get the final result. Once those calls are expanded inline, though, the kernel becomes very long.)
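To make that structure concrete, here is a minimal sketch of a short kernel that chains small helper functions. Everything in it except the kernel name and its `float*` parameter (both taken from the mangled name `_Z13ComputeKernelPf` in the log) is a hypothetical placeholder: the helper names, the arithmetic, the `HD` macro, and the 64x64x64 grid size are assumptions for illustration, not the real code.

```cpp
#include <math.h>

// Assumption: this macro lets the helpers build both under nvcc and as plain
// C++, so the per-point logic can be checked on the host.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical grid size: 64*64*64 floats is 1 MB, similar to the array
// size mentioned in the post.
#define NX 64
#define NY 64
#define NZ 64

// Hypothetical helper steps standing in for the real multi-step calculation.
HD inline float squaredRadius(float x, float y, float z) { return x * x + y * y + z * z; }
HD inline float radius(float r2) { return sqrtf(r2); }
HD inline float falloff(float r) { return 1.0f / (1.0f + r); }

// Value at one grid point, built by chaining the helpers.
HD inline float pointValue(int ix, int iy, int iz) {
    float r2 = squaredRadius((float)ix, (float)iy, (float)iz);
    return falloff(radius(r2));
}

#ifdef __CUDACC__
// The kernel body is short, but each helper call may be expanded inline.
__global__ void ComputeKernel(float *out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= NX * NY * NZ) return;  // guard threads past the end of the grid
    int ix = idx % NX;                // recover 3D indices from the flat index
    int iy = (idx / NX) % NY;
    int iz = idx / (NX * NY);
    out[idx] = pointValue(ix, iy, iz);
}
#endif
```

As far as I know, on compute capability 1.x devices such as the 9800 GT, `__device__` functions are inlined by default, so the compiler ends up optimizing the whole call chain as a single large function.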

By the way, the array used in the kernel is only about 1 MB, so I don't think cudaMalloc is the cause of the “out of memory” error.

When I compile the code, the compiler (nvopencc) gives the following “Out of memory” error:

C:/DOCUME~1/ADMINI~1/LOCALS~1/Temp/tmpxft_000005e8_00000000-9_ITM.cpp3.i(0): Warning: Optimizing huge function _Z13ComputeKernelPf because Olimit has been overridden;

 compiler may run out of memory or run very slowly

C:/DOCUME~1/ADMINI~1/LOCALS~1/Temp/tmpxft_000005e8_00000000-9_ITM.cpp3.i(0): ### Compiler Error (user routine '_Z13ComputeKernelPf') during Code_Expansion phase:

### Out of memory in MEM_POOL_Realloc

nvopencc ERROR: D:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin/../open64/lib//be.exe returned non-zero status 1

How can I solve this problem?

I would greatly appreciate it if anyone could give me a hint. Thanks a lot!