Hello guys,
I’m getting weird results from the CUDA memory checker. It reports misaligned accesses to shared memory, even though the address it reports is clearly not misaligned. The errors also show up on seemingly random lines and in random iterations of the code, while all inputs are the same. And if I move that array out of shared memory, the CUDA memory checker no longer reports any errors.
The code is too big to post here, and I don’t really think the error is related to the code itself, but basically it does something like this:
__shared__ uint32_t test[256];
for(uint32_t i=0; i<256; i++) test[i] = someDataFromConstantMemory;
uint32_t aVal = test[aPseudoRandomVal & 0xFF];// here the error happens
And if I move it out of shared memory, the CUDA memory checker doesn’t report any errors:
uint32_t test[256]; // not in shared memory, no errors anymore
for(uint32_t i=0; i<256; i++) test[i] = someDataFromConstantMemory;
uint32_t aVal = test[aPseudoRandomVal & 0xFF];// everything works fine
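For anyone who wants to try reproducing this, here is a minimal standalone sketch of the shared-memory version. It is not my actual kernel: `kTable`, `lookupKernel` and `runLookup` are hypothetical names, and the table contents and keys are made-up test data. Unlike the snippet above, it fills the shared array cooperatively (each thread writes a strided subset) and puts a `__syncthreads()` before the lookup, which is the usual idiom for a shared lookup table.

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for "someDataFromConstantMemory".
__constant__ uint32_t kTable[256];

// Each thread looks up one pseudo-random key through a __shared__ copy of kTable.
__global__ void lookupKernel(const uint32_t* keys, uint32_t* out, int n)
{
    __shared__ uint32_t test[256];

    // Cooperative fill: the block's threads stride over the 256 entries
    // instead of every thread writing the whole array.
    for (uint32_t i = threadIdx.x; i < 256; i += blockDim.x)
        test[i] = kTable[i];

    // Every thread must reach this barrier, so it comes before the bounds check.
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = test[keys[idx] & 0xFF];  // index is always in [0, 255]
}

// Hypothetical host driver: returns the number of wrong results,
// or -1 if any CUDA call fails.
int runLookup(int n)
{
    std::vector<uint32_t> table(256), keys(n), out(n);
    for (int i = 0; i < 256; ++i) table[i] = 3u * i + 1u;  // arbitrary data
    for (int i = 0; i < n; ++i)
        keys[i] = static_cast<uint32_t>(i) * 2654435761u;  // pseudo-random keys

    uint32_t *dKeys = nullptr, *dOut = nullptr;
    size_t bytes = n * sizeof(uint32_t);
    bool ok = cudaMemcpyToSymbol(kTable, table.data(), 256 * sizeof(uint32_t)) == cudaSuccess
           && cudaMalloc(&dKeys, bytes) == cudaSuccess
           && cudaMalloc(&dOut, bytes) == cudaSuccess
           && cudaMemcpy(dKeys, keys.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        lookupKernel<<<(n + 255) / 256, 256>>>(dKeys, dOut, n);
        // The blocking device-to-host copy also surfaces errors from the kernel run.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dKeys);  // cudaFree(nullptr) is a no-op
    cudaFree(dOut);
    if (!ok) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (out[i] != table[keys[i] & 0xFF]) ++mismatches;
    return mismatches;
}
```

Calling `runLookup(1024)` from `main` should return 0 if the lookup works. Running that binary under the command-line `cuda-memcheck` tool that ships with CUDA 8 may help tell whether the same misaligned reports appear with a different checker.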
And the error is:
GPU State:
Address   Size  Type    Mem  Block  Thread  blockIdx  threadIdx  PC                 Source
000001bc  4     mis ld  s    0      0       {0,0,0}   {0,0,0}    _Z14TestPj+0076f0  f:\kernel.cu:136
000001e8  4     mis ld  s    0      1       {0,0,0}   {1,0,0}    _Z14TestPj+0076f0  f:\kernel.cu:136
Summary of access violations:
f:\kernel.cu(136): error MemoryChecker: #misaligned=2 #invalidAddress=0
Memory Checker detected 2 access violations.
error = misaligned load (shared memory)
gridid = 1
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x000001bc
accessSize = 4
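So address = 0x000001bc with accessSize = 4, and 0x1bc = 444 = 4 * 111 (likewise 0x1e8 = 488 = 4 * 122), so both reported addresses are 4-byte aligned. Seems legit, doesn’t it?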
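A quick way to double-check that arithmetic: an address is aligned for a power-of-two access size when its low bits under that size are all zero, i.e. `address & (size - 1) == 0`. `isAligned` below is a hypothetical helper written for this post, not part of any CUDA API.

```cuda
#include <cstdint>

// True if 'address' is a multiple of 'accessSize' (accessSize must be a power of two).
__host__ __device__ constexpr bool isAligned(uint32_t address, uint32_t accessSize)
{
    return (address & (accessSize - 1u)) == 0u;
}

// The two addresses from the memory checker report, checked at compile time.
static_assert(isAligned(0x000001bcu, 4u), "0x1bc is 4-byte aligned");
static_assert(isAligned(0x000001e8u, 4u), "0x1e8 is 4-byte aligned");
```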
Does anyone have any idea what might cause this problem?
Windows 10 x64, CUDA 8, GTX1070.
Thanks.