Hi there,
I’m currently testing memory alignment when using cudaMalloc on Windows 11.
- Driver version: 572.16
- CUDA version: 12.0
Here’s what I did:
1. I used cudaMalloc to allocate 8.25 MB of memory on the device (mem1).
2. Then, I used cudaMalloc to allocate 1 byte of memory on the device (mem2).
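#include <cstdint>
#include <iostream>
#include <cuda_runtime.h>

int main()
{
    // First allocation: 8.25 MB
    uint8_t* mem1 = nullptr;
    size_t size = (size_t)(8.25 * 1024 * 1024) * sizeof(uint8_t); // 8,650,752 bytes = 8.25 MB
    cudaError_t err = cudaMalloc(&mem1, size);
    if (err != cudaSuccess) {
        std::cerr << "cudaMalloc(mem1) failed: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    std::cout << std::hex << "mem1=" << (uint64_t)mem1 << " - " << (uint64_t)(mem1 + size) << std::dec << " (size=" << size << ")" << std::endl;

    // Second allocation: 1 byte
    uint8_t* mem2 = nullptr;
    size = (size_t)1; // 1 byte
    err = cudaMalloc(&mem2, size);
    if (err != cudaSuccess) {
        std::cerr << "cudaMalloc(mem2) failed: " << cudaGetErrorString(err) << std::endl;
        cudaFree(mem1);
        return 1;
    }
    std::cout << std::hex << "mem2=" << (uint64_t)mem2 << " - " << (uint64_t)(mem2 + size) << std::dec << " (size=" << size << ")" << std::endl;

    cudaFree(mem1);
    cudaFree(mem2);
    return 0;
}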
Output:
mem1=130f000000 - 130f840000 (size=8650752)
mem2=130fa00000 - 130fa00001 (size=1)
I noticed that mem2 starts exactly 10 MB after mem1 (0x130fa00000 - 0x130f000000 = 0xa00000), so mem1 seems to occupy a 10 MB region even though I only requested 8.25 MB. That leaves a 1.75 MB gap between the end of mem1 and the start of mem2. I checked deviceProp.textureAlignment, and it’s 512 bytes, so I’m not sure why the allocation is padded so far beyond 512 bytes.
Could anyone explain why this happens?
Thanks!