Memory alignment when using cudaMalloc

Hi there,

I’m currently testing memory alignment when using cudaMalloc on Windows 11.

-Driver Version: 572.16

-CUDA Version: 12.0

Here’s what I did:

1. I used cudaMalloc to allocate 8.25 MB of memory on the device (mem1).

2. Then, I used cudaMalloc to allocate 1 byte of memory on the device (mem2).

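#include <cstdint>
#include <iostream>
#include <cuda_runtime.h>

int main()
{
	// First allocation: 8.25 MB.
	uint8_t* mem1 = nullptr;
	size_t size = (size_t)(8.25 * 1024 * 1024) * sizeof(uint8_t); // 8,650,752 bytes = 8.25 MB
	cudaError_t err = cudaMalloc(&mem1, size);
	if (err != cudaSuccess) { std::cerr << "cudaMalloc(mem1) failed: " << cudaGetErrorString(err) << std::endl; return 1; }
	std::cout << std::hex << "mem1=" << (uint64_t)mem1 << " - " << (uint64_t)(mem1 + size) << std::dec << " (size=" << size << ")" << std::endl;

	// Second allocation: 1 byte.
	uint8_t* mem2 = nullptr;
	size = (size_t)1; // 1 byte
	err = cudaMalloc(&mem2, size);
	if (err != cudaSuccess) { std::cerr << "cudaMalloc(mem2) failed: " << cudaGetErrorString(err) << std::endl; cudaFree(mem1); return 1; }
	std::cout << std::hex << "mem2=" << (uint64_t)mem2 << " - " << (uint64_t)(mem2 + size) << std::dec << " (size=" << size << ")" << std::endl;

	cudaFree(mem1);
	cudaFree(mem2);

	return 0;
}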

Output:

mem1=130f000000 - 130f840000 (size=8650752)
mem2=130fa00000 - 130fa00001 (size=1)

I noticed that mem1 seems to be aligned to 10 MB (0x130fa00000 - 0x130f000000 = 0xa00000 bytes = 10 MB), which leaves a 1.75 MB gap between the end of mem1 (0x130f840000) and the start of mem2. I checked deviceProp.textureAlignment, and it’s 512 bytes, so I’m not sure why mem1 takes up so much more space than a 512-byte alignment would require.

Could anyone help clear this up?

Thanks!

First of all, there is no guarantee that consecutive memory allocations are placed in consecutive memory addresses. It is not meaningful to compute the gap between mem1 and mem2.

The memory alignment of cudaMalloc is specified in the programming guide as “at least 256 bytes”, i.e. the starting address of the allocation is divisible by 256.

However, I believe you are asking about allocation granularity, which typically is 2 MB. You can query the granularity using cuMemGetAllocationGranularity from the driver API.


Thanks a lot striker159!
