My post on 64-bit was a bit tongue in cheek, but I really do have a problem with building CUDA apps on 64-bit. I thought it was my makefiles, but the standard ones do it too.

The problem is that all device code is compiled as 64-bit by default, meaning all device memory pointers are 8 bytes, even those held in shared memory, which is not accessible from the host! Even if ptxas partially converts back (it always assumes the top 32 bits == 0 to save registers), there is a significant waste of shared memory (the cubin definitely has both shared memory and device memory pointer arrays at 8 bytes per entry), and also of device memory and bandwidth, given that it makes no sense to hold host pointers on the device. That is a high price to pay for 64-bit longs implemented as 32-bit long long routines.

Is the only way out to turn on -m32 for everything? Then 32-bit versions of all the CUDA libs should be provided as well, along with a switch facility. The 64-bit video driver should not have a problem with a 32-bit CUDA app.
Or is there a way to build for, and communicate sensibly within, a mixed environment?
I guess this was a major design decision. Was there any information about it in the tools release notes (which were left out of the distro)? There is nothing about 64-bit at all in the 0.9 nvcc manual either.
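
To put a number on the pointer-table waste described above, here is a rough sketch in plain C++ (host side only, nothing CUDA-specific in it). The table size of 256 entries and the helper names `table_bytes`, `to_offset` and `from_offset` are made up for illustration and are not part of the toolkit. It also shows one possible workaround, assuming all the pointed-to data lives in a single buffer small enough to index with 32 bits: store 32-bit element offsets from a base pointer instead of full pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical table size, chosen only for illustration.
constexpr std::size_t kEntries = 256;

// Bytes occupied by a table of kEntries entries of the given entry size.
constexpr std::size_t table_bytes(std::size_t entry_size) {
    return kEntries * entry_size;
}

// Convert a pointer into a 32-bit element offset from a known base.
// Assumption: p points into the same buffer as base, at or after it,
// and the distance fits in 32 bits.
inline std::uint32_t to_offset(const float* base, const float* p) {
    const std::ptrdiff_t diff = p - base;
    assert(diff >= 0);
    assert(static_cast<std::uint64_t>(diff) <= UINT32_MAX);
    return static_cast<std::uint32_t>(diff);
}

// Rebuild the pointer from the base and the stored 32-bit offset.
inline const float* from_offset(const float* base, std::uint32_t off) {
    return base + off;
}
```

With 256 entries, `table_bytes(8)` comes to 2048 bytes against 1024 bytes for `table_bytes(4)`, so the offset form halves the table. The cost is an extra add on every dereference, and I have not measured whether that is a net win on the device.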