Unaligned memory access not supported -- driving me batty! This error appears to have nothing to

I have the following kernel code, truncated to just the relevant bits.

[codebox]259: __global__ void md5_search(ulong base, ulong left, uint *success)
260: {
274:     char passwd[MAX_PASSWD_LEN];
294:     passwd[56] = passwd_len * 8;
313:     *success = linidx;
321: }[/codebox]

I get the following error on compile.

[codebox]/usr/local/cuda/bin/nvcc -c -I. -I/usr/local/cuda/include -I/home/btimby/NVIDIA_CUDA_SDK/common/inc -o md5_gpu.cu_o md5_gpu.cu

./md5_gpu.cu(294): Error: Unaligned memory accesses not supported[/codebox]

At this point, I can almost believe that there is a problem with my code… However, if I compile with --device-emulation the code works unmodified. And the kicker is, by commenting out line 313, the error on 294 magically goes away, even without --device-emulation! That’s right, the following compiles and runs…

[codebox]259: __global__ void md5_search(ulong base, ulong left, uint *success)
260: {
274:     char passwd[MAX_PASSWD_LEN];
294:     passwd[56] = passwd_len * 8;
313:     //*success = linidx;
321: }[/codebox]


Nobody? I searched the forums before I posted; there are three other reports of this issue with no resolution.

Do NVIDIA developers read this forum? I am guessing this problem is NOT with my code, but more likely with the compiler.

I am blocked on this project until I can figure out a solution to this issue, so help is greatly appreciated.

If you declare a (non-shared) array inside a kernel, it is typically placed in what the manual calls local memory, which physically resides in off-chip device memory, just like global memory. Maybe such accesses have to be multiples of 4 bytes, whereas accessing a char uses a 1-byte address.
Just check it with floats; maybe I am right :).
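The guess above can at least be checked on the host. Below is a minimal C11 sketch, assuming a flat address space. The names is_aligned, count_word_aligned_chars and all_floats_aligned are hypothetical helpers, not CUDA APIs. It does not reproduce the nvcc error. It only shows that consecutive char elements cannot all start on 4-byte boundaries, while every float element does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
   Assumes `alignment` is a power of two (1, 2, 4, 8, ...). */
static bool is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Of any 4 consecutive char elements, exactly one starts on a 4-byte
   boundary, so byte-indexed accesses are not, in general, aligned for
   32-bit words. Only addresses are taken; contents are never read. */
static int count_word_aligned_chars(void)
{
    char buf[8];
    int count = 0;
    for (size_t i = 0; i < 4; ++i)
        count += is_aligned(&buf[i], 4) ? 1 : 0;
    return count;
}

/* By contrast, every element of a float array starts on a multiple
   of _Alignof(float). */
static bool all_floats_aligned(void)
{
    float buf[8];
    for (size_t i = 0; i < 8; ++i)
        if (!is_aligned(&buf[i], _Alignof(float)))
            return false;
    return true;
}
```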

Any solution or help with this lately? I’m getting the “Unaligned memory accesses not supported” error when accessing seemingly perfectly aligned memory.

[codebox]typedef __align__(4) unsigned int uint;
typedef __align__(4) float afloat;

typedef struct __align__(8) {
    uint key;
    uint value;
} KeyValuePair;

myfunction(afloat* row, KeyValuePair* rand)
...
norm1 = row[rand[0].value]; //<-- compiler complains about this step
...[/codebox]
It looks like the “rand[0].value” part is OK; the problem is accessing “row” at that index.

Oh, and it works fine in emulation mode, and also in true device mode if I make the key and value members of KeyValuePair chars (with the struct aligned to 2 bytes in this case) instead of uints.

I should mention that all this memory (row and rand) is in shared memory, which was declared as chars; pointers to different parts of it were then cast to their appropriate types (afloat, KeyValuePair). This kind of thing is done a lot in the Programming Guide and the SDK projects.
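Carving typed arrays out of one char buffer, as described above, only works if each sub-array starts at an offset that is a multiple of its type's alignment. Here is a minimal C11 sketch of that bookkeeping. _Alignas(8) stands in for __align__(8), and the layout (floats first, then key/value pairs) is a hypothetical example, not the poster's actual layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Host stand-in for the poster's KeyValuePair; _Alignas(8) plays the
   role of __align__(8) and raises the struct's alignment to 8. */
typedef struct {
    _Alignas(8) uint32_t key;
    uint32_t value;
} KeyValuePair;

/* Round `offset` up to the next multiple of `alignment`
   (assumed to be a power of two). */
static size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical layout inside one byte buffer: `nrow` floats first,
   then the KeyValuePair array. Returns the byte offset at which the
   pairs can start without breaking their 8-byte alignment. */
static size_t pairs_offset(size_t nrow)
{
    return align_up(nrow * sizeof(float), _Alignof(KeyValuePair));
}
```

For example, three floats occupy 12 bytes, so the pairs must start at offset 16, not 12. These offsets only help if the buffer itself starts on an 8-byte boundary; offset 0 of a plain char array has no such guarantee.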


OK, I think I solved my problem. I had to declare the shared memory array as an aligned type. I used chars aligned to 4 bytes; anything of that size would probably work.
[codebox]typedef __align__(4) char achar;[/codebox]

The problem was accessing index 0 of the shared data memory, which was declared as unaligned chars. Subsequent cells in shared memory are aligned types, so they were OK; only the 0th one was out of alignment.
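A host-side analogue of this fix, written as a C11 sketch, applies _Alignas to the buffer itself instead of going through an aligned char typedef. That guarantees offset 0 is suitable for the 4- and 8-byte types carved out of it. The names pool and pool_base_aligned are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Raise the alignment of the whole byte buffer, so element 0 is
   already suitable for 4- or 8-byte types placed at offset 0. */
static _Alignas(8) unsigned char pool[256];

/* True if the buffer's first byte is a multiple of `alignment`. */
static bool pool_base_aligned(size_t alignment)
{
    return ((uintptr_t)pool % alignment) == 0;
}
```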

This probably isn’t the same problem as the first one posted in this thread. Sorry.

A [programming-style] alternative is to use a union, e.g.:

[codebox]#include <stdint.h>
union MyUnion {
    uint8_t bytes[4];
    uint32_t word;
};[/codebox]
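Applied to the 64-byte block from the first post, the union idea might look like the following minimal C sketch. Md5Block and length_word are hypothetical names. The uint32_t member gives the whole union, and therefore bytes[0], 4-byte alignment, so byte writes like the one on line 294 land inside word-aligned storage. Reading words[] after writing bytes[] is permitted in C, but the resulting value depends on the platform's byte order.

```c
#include <stdint.h>

/* Hypothetical MD5 block type: 64 bytes viewed either as chars or as
   sixteen 32-bit words. The uint32_t member makes the whole union
   (and thus bytes[0]) aligned for 32-bit access. */
typedef union {
    char bytes[64];
    uint32_t words[16];
} Md5Block;

/* Write the low byte of the bit length at byte 56, as line 294 of the
   original kernel does, and return the 32-bit word containing it
   (words[14] covers bytes 56..59). */
static uint32_t length_word(int passwd_len)
{
    Md5Block blk = {0};
    blk.bytes[56] = (char)(passwd_len * 8);
    return blk.words[14];
}
```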


I also found this.

The code

[codebox]char block[64];

block[63] = 0;[/codebox]

causes the “Unaligned memory accesses not supported” error,

but if I write it this way:

[codebox]long data[16];

char *block = (char *)data;

block[63] = 0;[/codebox]

it works!
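Why the cast version works can be sketched in C: the storage is declared with a wider element type, so its first byte inherits that type's alignment, and the char pointer only views it. The sketch swaps in uint32_t for long (an assumption, not the poster's code) so the buffer is exactly 64 bytes. On 64-bit Linux, long is 8 bytes, so long data[16] reserves 128. char_view_is_word_aligned is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the workaround: back the 64-byte block with an array of a
   wider type and index it through a char pointer. Returns whether the
   char view starts on a boundary suitable for 32-bit words. */
static bool char_view_is_word_aligned(void)
{
    uint32_t data[16];            /* 64 bytes, aligned for uint32_t */
    char *block = (char *)data;   /* byte view of the same storage */
    block[63] = 0;                /* same byte write as in the post */
    return ((uintptr_t)block % _Alignof(uint32_t)) == 0;
}
```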

Change line 294 into this:
[codebox]passwd[56] = (char)passwd_len << 3;[/codebox]
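The suggested rewrite stores the same byte as the original expression. The minimal C sketch below checks this on the host. It assumes that passwd_len is a non-negative int and, as the passwd[56] index suggests, that passwords fit in one 64-byte MD5 block (at most 55 bytes). The function names are hypothetical. Whether the rewrite actually avoids the nvcc error is not something this sketch can show.

```c
#include <stdbool.h>

/* Original line 294 and the suggested replacement as host functions.
   The conversion to char happens on return, just as it happens on
   assignment to passwd[56]. */
static char original_expr(int passwd_len)  { return passwd_len * 8; }
static char suggested_expr(int passwd_len) { return (char)passwd_len << 3; }

/* For 0 <= n <= 127, (char)n == n, so (char)n << 3 computes the same
   int as n * 8, and both convert to the same char. */
static bool exprs_agree_up_to(int max_len)
{
    for (int n = 0; n <= max_len; ++n)
        if (original_expr(n) != suggested_expr(n))
            return false;
    return true;
}
```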

nameOfDeviceFunction(structurelabel,&vector) produces “Error: Unaligned memory accesses not supported”. OK, I understand that when I declare data, its starting byte may land off a word boundary. Supposedly I could add filler bytes to produce the alignment, although experimentation has shown that doesn’t work. In this case the error is produced by a call to a device function with two arguments: a structure and a character vector. It is unclear what I’m supposed to do about this. And anyway, if pad bytes would cure the problem, why doesn’t the compiler simply put them in? This alignment business is rather mysterious; I’d appreciate clarification.
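On the last question: the compiler does insert pad bytes wherever it controls the layout, both between structure members and at the end of a structure. It cannot do that for memory it did not lay out, such as a pointer obtained by casting an arbitrary offset into a char array to a wider type. In C, converting to such a misaligned pointer is already undefined behavior. Here is a minimal C sketch (struct Tagged and value_offset are hypothetical examples):

```c
#include <stddef.h>

/* A char followed by an int: the compiler inserts padding after `tag`
   so that `value` starts on an int boundary, and pads the end so that
   arrays of struct Tagged stay aligned. */
struct Tagged {
    char tag;
    int value;
};

/* Byte offset of `value` inside the struct, including any padding. */
static size_t value_offset(void)
{
    return offsetof(struct Tagged, value);
}
```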