I was wondering if it is possible to compile my host code so that it passes 32-bit addresses to my GPU code. nvcc has an option to compile for a 32-bit machine, but that compiles all of my host code for a 32-bit architecture instead of simply using a 32-bit address space, so it does not solve my problem.
Ultimately I could install a 32-bit OS, but I figured there should be an easier way.
The reason I am interested is that I’m going to end up using several more registers than necessary for my address pointers, without coming anywhere near addressing 4 GB of space. Also, although not a major concern, the 64-bit addresses are probably going to cost a little more in terms of address arithmetic.
To allow tight integration of host and device code and the portability of data types, CUDA makes the sizes of device data types equal to the corresponding host data types. This affects the sizes of pointers, “long”, and size_t in particular. When you build for a 64-bit platform, all pointers will be 64-bit pointers, on the host and on the device.
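To see this for yourself, a small test program along the following lines prints the sizes on both sides. This is only a sketch: the names TypeSizes, device_sizes, and query_device_sizes are made up for illustration, and error handling is kept to a minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sizes of the types whose width depends on the platform being built for.
struct TypeSizes {
    unsigned long long pointer;
    unsigned long long long_int;
    unsigned long long size_type;
};

// Record the sizes as seen by device code.
__global__ void device_sizes(TypeSizes *out) {
    out->pointer   = sizeof(void *);
    out->long_int  = sizeof(long);
    out->size_type = sizeof(size_t);
}

// Hypothetical helper: run the kernel and copy the result back to the host.
// Returns all zeros if a CUDA call fails (for example, no usable GPU).
TypeSizes query_device_sizes() {
    TypeSizes zero = {0, 0, 0};
    TypeSizes result = zero;
    TypeSizes *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(TypeSizes)) != cudaSuccess) return zero;
    device_sizes<<<1, 1>>>(dev);
    cudaError_t err = cudaMemcpy(&result, dev, sizeof(TypeSizes),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return (err == cudaSuccess) ? result : zero;
}

int main() {
    TypeSizes d = query_device_sizes();
    printf("host:   void*=%zu long=%zu size_t=%zu\n",
           sizeof(void *), sizeof(long), sizeof(size_t));
    printf("device: void*=%llu long=%llu size_t=%llu\n",
           d.pointer, d.long_int, d.size_type);
    return 0;
}
```

In a given build, the two lines should report the same sizes, whatever those are for the host platform.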
The compiler may be able to optimize out some of the 64-bit operations (and free the corresponding registers) in the device code. This mostly helps on sm_1x platforms where the GPU’s device memory is known not to exceed 4 GB. For sm_2x platforms, GPUs with more than 4 GB of device memory exist, and thus there is comparatively little the compiler can do in the way of optimizing the pointer operations.
You are correct that, due to the fundamentally 32-bit nature of the architecture, the use of 64-bit address arithmetic costs additional registers and additional instructions. In my experience the increased register usage typically has a larger impact on performance than the increased dynamic instruction count (on sm_2x relatively few applications are strictly bound by instruction throughput), although it will depend on the specifics of the application.
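If you want to measure the register cost for your own kernels, one approach is to query the per-thread register count and compare a 64-bit build against a 32-bit one (where your toolkit still supports the latter). The sketch below is only an illustration: the kernel fma_kernel and the helper registers_per_thread are made-up stand-ins for your own code. Compiling with nvcc -Xptxas -v reports the same register counts at build time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: four pointer arguments, so several addresses
// have to be formed per thread.
__global__ void fma_kernel(const float *a, const float *b, const float *c,
                           float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] * b[i] + c[i];
    }
}

// Hypothetical helper: registers per thread used by fma_kernel in this
// build, or -1 if the query fails.
int registers_per_thread() {
    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, fma_kernel) != cudaSuccess) {
        return -1;
    }
    return attr.numRegs;
}

int main() {
    printf("pointer size in this build: %zu bytes\n", sizeof(void *));
    printf("fma_kernel registers per thread: %d\n", registers_per_thread());
    return 0;
}
```

Building the same file once per pointer width and comparing the two printed counts shows how much of the register budget goes to addressing for that particular kernel.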
Alright, looks like I will be compiling for a 32-bit architecture for now to keep the less expensive 32-bit addressing.
It would be nice if a future version gave us an nvcc option to disable the 64-bit unified addressing (I’m sure the to-do list isn’t long enough as it is…)
Also, it would be nice if there were a note in the API reference indicating that the data type of CUdeviceptr is either unsigned int or unsigned long long, depending on which architecture you are compiling for. As it stands, it makes no mention of the 64-bit CUdeviceptr.
Anyway, thank you for your reply.