I am currently developing an application in a 64-bit Windows 7 environment.
When I generate 64-bit code, .cpp files are processed by the Visual C++ compiler and .cu files go to nvcc (with the -m64 option).
Everything works fine, but in the FAQ (and some other sources) we can read that:
“Current GPUs are essentially 32-bit machines, but they do support 64 bit integers (long longs). Operations on these types compile to multiple instruction sequences.”
This suggests the following question:
Q1: Is it true that 64-bit code produced by nvcc will be needlessly slower and will consume more resources (registers in particular) than code built with the -m32 option, for example when dealing with pointers?
That is why I tried compiling the .cpp files into 64-bit code while running nvcc with the -m32 option. That led to an obvious error from the linker:
“fatal error LNK1112: module machine type ‘X86’ conflicts with target machine type ‘x64’”
I would rather not convert my whole application to a 32-bit build, which is an obvious yet unsatisfactory workaround.
Q2: Any ideas on how to launch 32-bit kernels from a 64-bit host program?
One idea of mine is to split the code into two parts:
a) Pure device code (__global__ and __device__ functions)
b) Pure host code which invokes the kernels using the runtime API.
Code (a) would be compiled into 32-bit code, and pointers to the global functions would be made visible to the linker as 64-bit unsigned integers, with the high part of each pointer set to 0.
Code (b) would be compiled into 64-bit code. I can allocate device memory there and prepare for the launch. At launch, all pointer parameters would be demoted to 32-bit pointers. This obviously assumes that the high part of every pointer is always 0.
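The demotion step in (b) could at least be guarded by an explicit check instead of a silent truncation. Below is a minimal host-side sketch in plain C++ (it should also compile inside a .cu file). The name demote_pointer is hypothetical and not part of the CUDA runtime, and the sketch does not assume the high half is zero, since that is exactly what Q3 asks.

```cpp
#include <cstdint>

// Hypothetical helper (not a CUDA API): tries to express a pointer as a
// 32-bit value. Returns false, leaving *out untouched, if any of the
// upper 32 bits are set, so the caller can fail loudly instead of
// launching a kernel with a truncated address.
inline bool demote_pointer(const void* p, std::uint32_t* out) {
    // Widen to 64 bits first so the shift is well defined on 32-bit builds too.
    const std::uint64_t value =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
    if ((value >> 32) != 0) {
        return false;  // high part is not 0: demotion would lose information
    }
    *out = static_cast<std::uint32_t>(value);
    return true;
}
```

The host code would call this on every pointer argument before the launch and abort if it returns false.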
Q3: Is it the case that all device pointers (data and function pointers) can be expressed as 32-bit integers, even if created by a 64-bit application?
Finally, a small question. I need an integer type (“native int”) whose size matches the kind of code being produced: an int for 32-bit code, a long long for 64-bit code. A simple way is to typedef it somewhere, but:
Q4: Is there a predefined macro in nvcc that tells me whether the code currently being produced is 32-bit or 64-bit?
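For reference, here is a sketch of the typedef approach that does not depend on an nvcc-specific macro. It relies on std::intptr_t, which is pointer-sized on the usual 32-bit and 64-bit targets, and on the host-compiler macros _WIN64 (Visual C++) and __LP64__ (GCC/Clang). That these macros are visible while nvcc processes a .cu file is my assumption, not something I have confirmed. The names native_int and NATIVE_INT_BITS are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Pointer-sized signed integer: 32 bits wide in a 32-bit build,
// 64 bits wide in a 64-bit build.
typedef std::intptr_t native_int;

// Preprocessor-visible bitness, derived from host-compiler macros.
// Assumption: these macros are also defined during nvcc compilation.
#if defined(_WIN64) || defined(__LP64__)
#define NATIVE_INT_BITS 64
#else
#define NATIVE_INT_BITS 32
#endif

// Compile-time sanity checks (C++11): the build stops if the typedef or
// the macro disagrees with the actual pointer size.
static_assert(sizeof(native_int) == sizeof(void*),
              "native_int must be pointer-sized");
static_assert(NATIVE_INT_BITS == 8 * sizeof(void*),
              "NATIVE_INT_BITS must match the pointer width");
```

The static_assert lines mean a wrong guess about the macros shows up as a compile error rather than as a silent size mismatch.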