cudaPointerGetAttributes returns cudaErrorInvalidValue for host-pinned memory in a 32-bit Windows build

In my code, I check whether a pointer points to host-pinned memory using the cudaPointerGetAttributes() API function. On Linux and 64-bit Windows this works fine (it returns cudaSuccess, with cudaMemoryTypeHost in attributes.type), but the 32-bit Windows build (cross-compiled for x86_32 on x86_64) always returns 1, i.e. cudaErrorInvalidValue.

My PC: Windows 10 1909 (build 18363.1256), GeForce RTX 3070, Visual Studio 2019, CUDA 11.0.3, driver version 460.89.
I also tried other machines with multiple CUDA versions (11.0, 10.2, 9.2), multiple Visual Studio versions (VS2013, VS2015, VS2019), multiple GPUs (GeForce RTX 3070, Quadro RTX 4000, Pascal-based Titan Xp) and multiple driver versions (460.89, 432.00).

Then I thought maybe I was using it incorrectly, so I modified the official CUDA 11.0 sample 1_Utilities\bandwidthTest by adding these debug prints

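cudaPointerAttributes attributes = {};  // zero-init so the print below is well-defined even if the call fails
cudaError_t status = cudaPointerGetAttributes(&attributes, h_idata);
printf("cudaPointerGetAttributes status: %d (%s) attributes.type %d\n",
       (int)status, cudaGetErrorName(status), (int)attributes.type);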

after the cudaHostAlloc() call in the sample, but the result was the same: the 32-bit Windows build again returned cudaErrorInvalidValue, even though the sample finished fine and measured the same bandwidth as the 64-bit build (so host pinning itself works in the 32-bit build).

Is this a bug, or an undocumented limitation of the Win32 CUDA runtime? Any other ideas? Thanks.

That API call is in the section of the CUDA Runtime API documentation called Unified Addressing.

These API functions are only expected to work “normally” in a unified addressing environment. The preamble of that section explains how to determine whether you are in such an environment:

Whether or not a device supports unified addressing may be queried by calling cudaGetDeviceProperties() with the device property cudaDeviceProp::unifiedAddressing.

My expectation is that if you make that call in your 32-bit environment, you will find that unified addressing is not supported. In that case, using the API function you describe is not valid.
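A minimal sketch of that query, assuming CUDA 10.0 or later and a C++ host compiler. The helper name deviceSupportsUnifiedAddressing is hypothetical and not part of the CUDA API. It reads cudaDeviceProp::unifiedAddressing through cudaGetDeviceProperties(), as the documentation describes, and treats any runtime error (for example an invalid device ordinal or no CUDA-capable device) as "not supported".

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true only if `device` reports unified addressing (UVA).
// Any runtime error from the property query is treated as "not supported".
bool deviceSupportsUnifiedAddressing(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return false;
    }
    return prop.unifiedAddressing != 0;
}
```

A caller would typically evaluate this once, e.g. `bool uva = deviceSupportsUnifiedAddressing(0);`, and skip any cudaPointerGetAttributes() check when it is false. If only this one field is needed, cudaDeviceGetAttribute() with cudaDevAttrUnifiedAddressing returns the same information without filling in the whole property struct.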

32-bit Windows is not a UVA (unified virtual addressing) environment. I’m fairly sure what you are reporting is expected behavior. In a 32-bit Windows or other non-UVA environment, addresses are not guaranteed to be unique: the host and device address spaces may overlap. Given that, it is impossible in the general case to take a pointer value and determine whether it belongs to the host or the device address space.

Many thanks for your explanation. You were right about cudaDeviceProp::unifiedAddressing: it returns false on my system for 32-bit builds. I’ve changed my code to use this as a condition.
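A guard like the one described can be sketched as a single helper. This is an illustration under stated assumptions, not the poster's actual code: the name isPinnedHostPointer is hypothetical, it assumes CUDA 10.0 or later (for the attributes.type field), and it checks the current device via cudaDeviceGetAttribute() with cudaDevAttrUnifiedAddressing. When unified addressing is unavailable, as in the 32-bit Windows build, it never calls cudaPointerGetAttributes() and conservatively returns false.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true if `ptr` is page-locked host memory known to the
// CUDA runtime. Pointer introspection is only attempted when the current
// device supports unified addressing; otherwise the pointer value alone
// cannot be classified, so the helper returns false.
bool isPinnedHostPointer(const void* ptr)
{
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) {
        return false;
    }

    int uva = 0;
    if (cudaDeviceGetAttribute(&uva, cudaDevAttrUnifiedAddressing, device) != cudaSuccess
        || uva == 0) {
        return false;  // non-UVA: cudaPointerGetAttributes() is not valid here
    }

    cudaPointerAttributes attributes = {};
    if (cudaPointerGetAttributes(&attributes, ptr) != cudaSuccess) {
        // Reset the runtime's last-error state so this failure does not
        // surface in a later cudaGetLastError() check.
        cudaGetLastError();
        return false;
    }
    return attributes.type == cudaMemoryTypeHost;
}
```

In the non-UVA case, false means "unknown" rather than "not pinned". If that distinction matters, the application would have to keep track of which buffers it allocated with cudaHostAlloc() itself, since the pointer value cannot tell.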
Thank you!