Fermi (2.0) CUDA device on 64-bit Linux with 32-bit device code

FROM THE NVIDIA “Tuning CUDA Applications for Fermi” MANUAL:
“32-bit versus 64-bit Device Code
If you build your application in 64-bit mode (either by passing -m64 to nvcc or by specifying neither -m64 nor -m32 when compiling on a 64-bit machine), e.g., to gain access to more than 4GB of system memory, be aware that nvcc will compile both the host code and the device code in 64-bit mode for devices of compute capability 2.0. While this works, the larger pointers in the device code incur a performance penalty for the device (because of the extra space those pointers occupy in the register file, among other reasons). If you are not targeting GPUs with large amounts of video memory that can take advantage of a 64-bit address space, then this performance penalty is unnecessary. To avoid it, you should separate out the compilation of your host code from your device code and compile the device code in 32-bit mode.”

How do I do the above? Is there a simple example?
In my experience, the above does not work.
The .cu files are included as if they were headers, and the resulting objects appear to be intermediate in nature. How can they be built with -m32, kept, and later linked with the CPU code?

Thank you for your help

Build the CUDA files with -Xopencc=-m32 to set their width separately from the host files. Remember that you now have to be very careful, as variable sizes will differ between host and GPU.
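To illustrate the size mismatch: on 64-bit Linux, long, size_t, and pointers are 8 bytes in host code but 4 bytes in 32-bit code, so any struct shared between the two sides can end up with two different layouts. One way to guard against that is to use only fixed-width types in shared data. The sketch below is just an assumption about how that might look; KernelArgs and its fields are made-up names, not something from the CUDA toolkit or this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct shared between host code and device code.
// Every member has a fixed width, so the layout does not depend on
// whether the translation unit is compiled with -m32 or -m64.
struct KernelArgs {
    std::uint64_t byte_offset;    // fixed 8 bytes (size_t would be 4 or 8)
    std::uint32_t element_count;  // fixed 4 bytes
    std::int32_t  stride;         // fixed 4 bytes (long would be 4 or 8)
};

// Compile-time checks: the build fails if either side sees another layout.
static_assert(sizeof(KernelArgs) == 16,
              "KernelArgs must be 16 bytes on both host and device");
static_assert(offsetof(KernelArgs, element_count) == 8,
              "element_count must start at byte 8");
static_assert(offsetof(KernelArgs, stride) == 12,
              "stride must start at byte 12");
```

The 8-byte member comes first so that no padding is needed under either 4-byte or 8-byte alignment rules. Raw pointers stored inside such a struct would still change size between the two modes, so this only covers plain data.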

I also vaguely remember that Tim Murray once posted on the forum that this will not work anymore in a future release (which might well be 3.2 already).

Thanks for the reply. For now I’ll just run with 64-bit pointers. I was able to generate object files and link them statically, but the program would not run. I also don’t want to do something that will not be supported in the future. Nonetheless, any other comments on this are appreciated. We need all the performance we can get…

I came across the same issue for testing purposes. I simply built two versions of the code on my 64-bit Ubuntu machine: one fully 64-bit (on both the CPU and GPU side) and one fully 32-bit. Obviously, to build 32-bit code on a 64-bit machine, you need to install the 32-bit compatibility packages.