What is a Warp Misaligned Address error? (kernel debugging)

My Fermi kernel is failing to execute properly, but when I insert a printf() or compile with -G to use cuda-gdb the problem goes away (but the kernel runs too slowly).

When I run without the printf() and without compiling with -G, I get the uninformative ‘unspecified launch failure’ message.

When I run under cuda-gdb (despite not compiling with -G), it bails with the complaint ‘Program received signal CUDA_EXCEPTION_6, Warp Misaligned Address.’ This is probably the root issue, although I don’t understand exactly what the complaint means. Without -G, the debugger gives me only the program counter where the error was encountered, and I know of no way to trace that back to a location in my source code.

Any help is appreciated.

-Steve Fischer

You should simply align your accesses to GPU global memory. An address is aligned when it is a multiple of the size of the object you are reading or writing. For example, to read or write an int, the address must be a multiple of 4. Reading or writing a char is always aligned.
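To make the rule concrete, here is a minimal sketch of an alignment check. The names is_aligned and as_int_ptr are hypothetical helpers of mine, not part of the CUDA API, and the macro guard is only an assumption that lets the same code build as plain C++ on the host as well as under nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: make __host__/__device__ vanish when not compiling with nvcc,
// so the helpers can be tried out on the host with an ordinary C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: true when p is a multiple of `alignment` bytes.
__host__ __device__ inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: reinterpret a byte address as int* only after
// checking that it is a multiple of sizeof(int).
__host__ __device__ inline int* as_int_ptr(char* p) {
    assert(is_aligned(p, sizeof(int)));
    return reinterpret_cast<int*>(p);
}
```

With a buffer whose base is aligned, is_aligned(buf + 5, 4) is false while is_aligned(buf + 5, 1) is true, which matches the point that char accesses are always aligned.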

Suppose you have a big buffer called dummySpace, allocated with cudaMalloc. This pseudo-kernel code would probably result in CUDA_EXCEPTION_6, Warp Misaligned Address:

// Input: a string and an integer. Output: a big buffer holding that string followed by the integer.
__global__ void func (char* stringInput, int stringSize, int* integerInput, char* dummySpace)
{
    // dummySpace is created by cudaMalloc, so its base address is aligned to at least 256 bytes.
    int counter = 0;

    // Copying chars one at a time is always aligned, because sizeof(char) is 1.
    for (int i = 0; i < stringSize; i++)
        dummySpace[counter++] = stringInput[i];

    // Problem: the loop above advanced counter by stringSize, which is unknown, so
    // dummySpace + counter may not be a multiple of 4. The 4-byte store below is then
    // misaligned. (Copying the integer byte by byte would not fault; the word-sized store does.)
    *(int*)(dummySpace + counter) = *integerInput;
}

The fixed one:

__global__ void func (char* stringInput, int stringSize, int* integerInput, char* dummySpace)
{
    int counter = 0;

    for (int i = 0; i < stringSize; i++)
        dummySpace[counter++] = stringInput[i];

    // Round counter up to the next multiple of 4 (or 8, or 16, for wider types).
    // A counter that is already aligned is left unchanged.
    counter = (counter + 3) & ~3;

    // Everything is OK: the integer is stored at an aligned address.
    *(int*)(dummySpace + counter) = *integerInput;
}
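If you pack several fields this way, the padding step can be pulled out into a small helper. The following is only a sketch under my own assumptions: align_up and pack_string_and_int are hypothetical names, the alignment must be a power of two, and the buffer base itself must already be int-aligned (as it is when it comes from cudaMalloc). The macro guard is there so the code can also be tried on the host.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: make __host__/__device__ vanish when not compiling with nvcc.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: round offset up to the next multiple of `alignment`.
// `alignment` must be a power of two (4, 8, 16, ...).
__host__ __device__ inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: copy len bytes of str into buf, then store value at the
// next int-aligned offset. Returns the offset at which the int was written.
// buf itself must be int-aligned.
__host__ __device__ inline std::size_t pack_string_and_int(char* buf, const char* str,
                                                           std::size_t len, int value) {
    for (std::size_t i = 0; i < len; ++i)
        buf[i] = str[i];                          // byte copies are always aligned
    std::size_t offset = align_up(len, sizeof(int));
    assert(offset % sizeof(int) == 0);            // holds by construction
    *reinterpret_cast<int*>(buf + offset) = value; // aligned 4-byte store
    return offset;
}
```

For a 5-byte string, the int lands at offset 8, so bytes 5 to 7 are padding. Whoever reads the buffer back has to apply the same rounding to find the integer.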

Hope it helps.