Unaligned memory access not supported

I have rather big project which compiled and worked perfectly under CUDA 3.0.

Now I am moving forward to CUDA 3.1 but I stumble under the following error:

Error: Unaligned memory accesses not supported

at the line:

thisData[0]=w;

Where

  • w is a register variable of type Data* (it’s a pointer)

  • thisData is a shared variable declared directly, inside the kernel (no externs):

__shared__ Data* thisData[160];

It is compiled with -m32 flag, so pointers are 32bit values.

No tricky pointer movement…

I see no way this could lead to some unaligned memory?

Where can be the problem?

Obviously I don’t expect you to find it in my code, since I didn’t attach it. What I am asking, have you ever faced similar situation? What to look for?

Is there a way to tell the compiler “trust me, everything is aligned”? Some flag maybe?

Have you “if” befor this code line. I have a similar problem with following code:

if(cond){

   data[0]=...;

}

What is the size of (type) Data?

Have you “if” befor this code line. I have a similar problem with following code:

if(cond){

   data[0]=...;

}

What is the size of (type) Data?

Indeed, it is something like this:

if ((threadIdx.x & 31)==0) {

		numHandled=1;

		thisNode[0].index=(threadIdx.x >> 5);

		thisNode[0].load();

		Data* w=thisNode[0].getData();

		thisData[0]=w;						   //<-- this causes the error

		loadIndex=0;

	}

	__threadfence_block();

The size of Data is… hm… it’s a struct of 10 arrays of floats, each of length 32. That makes 1280 bytes in total if I am counting right.

Note however that I am working here only with pointers.

What was your problem and how did you resolve it, Deus?

Indeed, it is something like this:

if ((threadIdx.x & 31)==0) {

		numHandled=1;

		thisNode[0].index=(threadIdx.x >> 5);

		thisNode[0].load();

		Data* w=thisNode[0].getData();

		thisData[0]=w;						   //<-- this causes the error

		loadIndex=0;

	}

	__threadfence_block();

The size of Data is… hm… it’s a struct of 10 arrays of floats, each of length 32. That makes 1280 bytes in total if I am counting right.

Note however that I am working here only with pointers.

What was your problem and how did you resolve it, Deus?

The problem is that the shared and textur memory access need to be aligned.

Try following:

Data* temp =  thisData[0];//or only thisData[0];

 if ((threadIdx.x & 31)==0) {

		numHandled=1;

		thisNode[0].index=(threadIdx.x >> 5);

		thisNode[0].load();

		Data* w=thisNode[0].getData();

		thisData[0]=w;	

		loadIndex=0;

   }

or try to aligne you struct e.g.

struct __align__(16) {

  float a;

  float b;

  float c;

  float d;

  float e;

};

The problem is that the shared and textur memory access need to be aligned.

Try following:

Data* temp =  thisData[0];//or only thisData[0];

 if ((threadIdx.x & 31)==0) {

		numHandled=1;

		thisNode[0].index=(threadIdx.x >> 5);

		thisNode[0].load();

		Data* w=thisNode[0].getData();

		thisData[0]=w;	

		loadIndex=0;

   }

or try to aligne you struct e.g.

struct __align__(16) {

  float a;

  float b;

  float c;

  float d;

  float e;

};

Yes I know. But what I am storing in shared memory here are mere 32-bit pointers!

When I declare an static array, at least the first element is guaranteed to be aligned to something, isn’t it the case?

Anyway, I have tried adding this temp variable declaration, although I am not sure what for since I am never using it.

I also tried (note the reference)

Data* &temp =  thisData[0];

 if ((threadIdx.x & 31)==0) {

		numHandled=1;

		thisNode[0].index=(threadIdx.x >> 5);

		thisNode[0].load();

		Data* w=thisNode[0].getData();

		temp=w;	

		loadIndex=0;

   }

With exactly the same faulty result.

I see no point in aligning the struct I am pointing to, since I never access it in this piece of code.

Yes I know. But what I am storing in shared memory here are mere 32-bit pointers!

When I declare an static array, at least the first element is guaranteed to be aligned to something, isn’t it the case?

Anyway, I have tried adding this temp variable declaration, although I am not sure what for since I am never using it.

I also tried (note the reference)

Data* &temp =  thisData[0];

 if ((threadIdx.x & 31)==0) {

		numHandled=1;

		thisNode[0].index=(threadIdx.x >> 5);

		thisNode[0].load();

		Data* w=thisNode[0].getData();

		temp=w;	

		loadIndex=0;

   }

With exactly the same faulty result.

I see no point in aligning the struct I am pointing to, since I never access it in this piece of code.

Is the error in a same code line?
I am not sure, but I think all threads in a warp(or half warp) need to access a shared memory in a same code line. And with the if condition is it not possible.
Therefore helped the temp variable in my case…

Is the error in a same code line?
I am not sure, but I think all threads in a warp(or half warp) need to access a shared memory in a same code line. And with the if condition is it not possible.
Therefore helped the temp variable in my case…

Yes

I don’t think so. I did similar construction many times in the past and it worked.

What I thought it could be is that the first usage of a shared variable must appear together… but obviously it was not to case either.

I think for some reason the compiler is not certain it is aligned, hence the error.

I am constantly trying to reproduce the error in some smaller project…

I think I will copy whole thing and try removing stuff piece by piece.

Yes

I don’t think so. I did similar construction many times in the past and it worked.

What I thought it could be is that the first usage of a shared variable must appear together… but obviously it was not to case either.

I think for some reason the compiler is not certain it is aligned, hence the error.

I am constantly trying to reproduce the error in some smaller project…

I think I will copy whole thing and try removing stuff piece by piece.