I have a .cu program with several variables allocated in device memory:
__device__ float gpu_v[N];
__device__ float gpu_sd[N];
__device__ float c[N];
__device__ float d[N];
The CUDA programming manual says “Any address BaseAddress of a variable residing in global memory or returned by one of the memory allocation routines from Sections D.5 or E.8 is always aligned to at least 256 bytes, so to satisfy the memory alignment constraint”
But I found that the addresses of these four variables were not aligned to 256 bytes (or, for that matter, even to 64 bytes). If I use a small value for N, the compiler sometimes seems to generate aligned addresses.
Is there an option that needs to be set so that the addresses are always aligned to 256 bytes?
To illustrate the problem, here are the .cubin entries for the gpu_v and gpu_sd variables:
reloc {
name = gpu_sd
segname = reloc
segnum = 14
offset = 32
bytes = 13440000
}
reloc {
name = gpu_v
segname = reloc
segnum = 14
offset = 0
bytes = 40000
}
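For anyone who wants to check this at run time rather than from the .cubin, here is a minimal sketch. It assumes a toolkit where cudaGetSymbolAddress accepts the symbol directly, and it assumes that the __align__(256) specifier is honored for __device__ arrays, which I have not confirmed on every toolchain. The value of N and the helper names is_aligned and report are made up for illustration; only gpu_v and gpu_sd come from my program.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

#define N 10000  // assumed size for illustration (gpu_v is 40000 bytes in the .cubin above)

// Assumption: __align__(256) explicitly requests 256-byte alignment for these
// statically allocated device arrays. Remove it to see the default placement.
__device__ __align__(256) float gpu_v[N];
__device__ __align__(256) float gpu_sd[N];

// Hypothetical helper: true if pointer p is a multiple of `alignment` bytes.
static bool is_aligned(const void *p, std::uintptr_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: print a device address and whether it is 256-byte aligned.
static void report(const char *name, const void *p) {
    std::printf("%s at %p: %s\n", name, p,
                is_aligned(p, 256) ? "256-byte aligned" : "NOT 256-byte aligned");
}

int main() {
    void *p_v = nullptr;
    void *p_sd = nullptr;

    // Look up the device addresses of the two __device__ symbols.
    cudaError_t err = cudaGetSymbolAddress(&p_v, gpu_v);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "gpu_v: %s\n", cudaGetErrorString(err));
        return 1;
    }
    err = cudaGetSymbolAddress(&p_sd, gpu_sd);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "gpu_sd: %s\n", cudaGetErrorString(err));
        return 1;
    }

    report("gpu_v", p_v);
    report("gpu_sd", p_sd);
    return 0;
}
```

This should build with plain nvcc. Note that the offsets in the .cubin are relative to a segment, so the final addresses reported at run time are the ones that matter for the alignment guarantee.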