Vulkan memory import size truncated on Windows

Hi!
I have posted about this on the Khronos forums, but wanted to ask here as well: is the memory size for vkAllocateMemory forcibly truncated to 32 bits on Windows, and if so, why?

We use Vulkan to process large amounts of data for scientific applications. The data is usually much larger than the available VRAM. We typically work with systems that have 64-128 GB of RAM and 8-12 GB of VRAM, and the data volume to process is usually about 32-64 GB. Some algorithms require all the data to be accessible to the GPU at the same time. Obviously, such access goes over the PCIe bus, which is not very fast, but it is acceptable for our purposes.

To achieve this in CUDA we formerly used pinned memory, and everything worked fine. Now we have moved to Vulkan and tried more or less the same approach: allocating aligned memory in RAM with OS functions and then using the VK_KHR_external_memory extension to make it accessible to the GPU via the PCIe bus. To access this memory from the compute shader we use uint64_t addresses (provided by the VK_KHR_buffer_device_address extension), so the buffer range limitations are not a problem for us.

On Linux this approach actually works fine and we can access any amount of memory up to the total amount of RAM available, although the validation layers give the following message:

vkAllocateMemory(): pAllocateInfo->allocationSize (7516192768) is larger than maxMemoryAllocationSize (4292870144). While this might work locally on your machine, there are many external factors each platform has that is used to determine this limit. You should receive VK_ERROR_OUT_OF_DEVICE_MEMORY from this call, but even if you do not, it is highly advised from all hardware vendors to not ignore this limit.

It is our opinion that, logically, the allocation size limit should not apply in this case (that is, the validation layers should not report anything), since we’ve already allocated the memory and merely want to map it for access from the GPU.

On Windows we observed rather strange behavior. A maximum of 4 GB of memory can be imported, and the allocation size seems to be truncated to 32 bits. For example, if 7 GB of memory is allocated and then imported, the vkAllocateMemory call that performs the import returns no error. However, only 3 GB can then be accessed from the GPU. If 5 GB of memory is allocated, only 1 GB can be accessed, etc. That makes us think that only the lower 32 bits of the requested buffer size are used as the import size and the upper 32 bits are ignored, even though allocationSize is a 64-bit value. In our opinion it is unrelated to maxMemoryAllocationSize (although maxMemoryAllocationSize is also 4 GB in our case). We are aware that on Windows only 50% of RAM can be mapped to the GPU, but the amount of memory we’re trying to allocate in our tests does not exceed this limit, so the 50% limitation is not related to our problem either.

On Linux (Ubuntu 22.04) we use the NVIDIA 570 proprietary drivers. On Windows we use the latest Game drivers installed by NVIDIA Center (as of 29.04.2025, version 576.02). Do you know the reason for this behavior? Can anything be done about it?

Without using the pageable memory extension, you can only allocate up to the largest available contiguous block of video memory (usually heap 1) because of fragmentation (i.e., allocate 5 GB, use 3 GB, deallocate 4.5 GB, and you might only have 2.5 GB contiguous to use), and no Vulkan driver implements automatic overflow of heap 1 into heap 2.

Further, only 214 MB is provided for the CPU-addressable heap 3.

From my experience, this is a different situation. I’ll also look into pageable memory, but the problem described here is different. On Windows, the NVIDIA driver (I believe it is the driver itself, not the Vulkan SDK) apparently stores the buffer size internally as a uint32_t somewhere, even though the API accepts a uint64_t. That is why @CoffeeExterminator sees such weird behavior: map 7 GB and only 3 GB can be accessed, map 6 GB and only 2 GB, map 5 GB and only 1 GB, and 4 GB cannot be mapped at all (because it becomes 0 as a uint32_t). This bug exists only on Windows and only for Vulkan; I have checked it myself. Linux + Vulkan, Linux + CUDA, and Windows + CUDA all allow mapping much more memory than the available VRAM. Only Windows + Vulkan wraps the size of the mapped (external) buffer by keeping only the lower 32 bits.

Remember that we are talking about mapping RAM to the GPU through PCIe. This is quite often required by applications that have extreme memory requirements, and CUDA is often used for that with success.

That’s a valuable point, I’ll make sure to look into pageable memory. However, I believe it’s not related to the problem I’m currently facing, since my problem is with allocating shared memory (that is, memory that is host visible and host coherent in Vulkan terms). The memory I’m allocating and then trying to import is not located in video memory (heap 0 in my case), but rather in RAM (heap 1), so it doesn’t seem like VK_EXT_pageable_device_local_memory can change anything about that.