CUDA 3.1 hangs in 32-bit mode on 64-bit Ubuntu 9.10

I’m having a lot of trouble getting my code to build and run as a 32-bit binary on 64-bit Ubuntu 9.10 with a GTX 480. The motivation is to save registers: a 64-bit pointer takes two 32-bit registers, which eats into Fermi’s limit of 63 registers per thread.

I am appending “-m32” to g++ and to nvcc, I’m linking against the 32-bit cuda libraries, and the binary is successfully built. However, when I run said binary, it simply hangs when it starts to execute a kernel.
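
A stripped-down test along the following lines might help narrow this down. It is only a sketch, not my actual code: the kernel name `report_ptr_size` and the file name `ptrcheck.cu` are made up for illustration. It launches a trivial kernel that records the pointer size seen by device code, checks every runtime call for errors, and prints the host and device pointer sizes so they can be compared.

```cuda
// ptrcheck.cu (hypothetical file name)
// Assumed build line: nvcc -m32 ptrcheck.cu -o ptrcheck
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stores the pointer size that device code was compiled with.
__global__ void report_ptr_size(unsigned int *out)
{
    *out = (unsigned int)sizeof(void *);
}

// Print a message and return nonzero if a runtime call failed.
static int failed(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    unsigned int *d_out = 0;
    unsigned int h_out = 0;

    if (failed(cudaMalloc((void **)&d_out, sizeof(unsigned int)), "cudaMalloc"))
        return 1;

    report_ptr_size<<<1, 1>>>(d_out);

    // Catches launch errors such as an invalid configuration.
    if (failed(cudaGetLastError(), "kernel launch"))
        return 1;

    // Blocks until the kernel finishes. cudaThreadSynchronize is the CUDA 3.x
    // name; later toolkits call it cudaDeviceSynchronize.
    if (failed(cudaThreadSynchronize(), "cudaThreadSynchronize"))
        return 1;

    if (failed(cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost),
               "cudaMemcpy"))
        return 1;

    printf("host pointer: %u bytes, device pointer: %u bytes\n",
           (unsigned int)sizeof(void *), h_out);

    cudaFree(d_out);
    return 0;
}
```

If this also hangs at the synchronize call, the problem is probably in the 32-bit toolchain or driver setup rather than in the application code. If it runs and the two sizes differ, the host and device code were not built for the same pointer width.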

Am I missing something here? The code builds and runs absolutely fine in 64-bit mode.