PTXAS on 32-bit causes ptxas Memory allocation failure

Has anyone come across a problem with the CUDA Toolkit 3.0 PTXAS on 32-bit Windows/Linux where compiling a large kernel fails with the following error?
ptxas fatal : Memory allocation failure (Error 255)

The kernel (which is quite large) compiles successfully for compute capability 1.0, 1.3 and 2.0 on 64-bit operating systems. On 32-bit, however, the compilation uses an excessive amount of RAM and then fails while compiling for compute capability 2.0.

On 32-bit, RAM usage is approximately 900MB for the 1.0 and 1.3 targets, but exceeds 2GB for the 2.0 target before the compilation crashes.

The 64-bit compilation for 2.0 peaks at 1.3GB for the same file.

Unfortunately no example can be provided.
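
Even without the kernel itself, the per-target comparison above can be reproduced by compiling one compute capability at a time, so that a failure or a memory peak can be attributed to a single target. The following is only a minimal sketch in Python that builds the nvcc command lines. The file name `kernel.cu`, the output naming scheme and the helper `nvcc_command` are hypothetical and not taken from the actual build; the flags used (`-m32`/`-m64`, `-arch`, `--ptxas-options=-v`, `-cubin`, `-o`) are standard nvcc options.

```python
import shlex

# Compute capabilities mentioned in the post, mapped to nvcc -arch values.
TARGETS = {"1.0": "sm_10", "1.3": "sm_13", "2.0": "sm_20"}


def nvcc_command(source, capability, bits=32):
    """Return an nvcc command line that compiles `source` for one target only.

    `source` and the output file name are placeholders (assumptions).
    """
    arch = TARGETS[capability]
    return [
        "nvcc",
        f"-m{bits}",            # select a 32- or 64-bit build
        f"-arch={arch}",        # one target per invocation
        "--ptxas-options=-v",   # have ptxas print its resource usage report
        "-cubin",               # stop once ptxas has produced the cubin
        "-o", f"kernel_{arch}_{bits}.cubin",
        source,
    ]


if __name__ == "__main__":
    # Print the three 32-bit commands; run them one by one while watching
    # the memory usage of the ptxas process.
    for capability in TARGETS:
        print(shlex.join(nvcc_command("kernel.cu", capability)))
```

Running each printed command separately, once with `bits=32` and once with `bits=64`, should show which target pushes the process past the 32-bit limit.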