I have rather big project which compiled and worked perfectly under CUDA 3.0.
Now I am moving forward to CUDA 3.1 but I stumble under the following error:
Error: Unaligned memory accesses not supported
at the line:
thisData[0]=w;
Where
w is a register variable of type Data* (it’s a pointer)
thisData is a shared variable declared directly, inside the kernel (no externs):
__shared__ Data* thisData[160];
It is compiled with -m32 flag, so pointers are 32bit values.
No tricky pointer movement…
I see no way this could lead to some unaligned memory?
Where can be the problem?
Obviously I don’t expect you to find it in my code, since I didn’t attach it. What I am asking, have you ever faced similar situation? What to look for?
Is there a way to tell the compiler “trust me, everything is aligned”? Some flag maybe?
Deus
September 2, 2010, 10:15pm
2
Have you “if” befor this code line. I have a similar problem with following code:
if(cond){
data[0]=...;
}
What is the size of (type) Data?
Deus
September 2, 2010, 10:15pm
3
Have you “if” befor this code line. I have a similar problem with following code:
if(cond){
data[0]=...;
}
What is the size of (type) Data?
Indeed, it is something like this:
if ((threadIdx.x & 31)==0) {
numHandled=1;
thisNode[0].index=(threadIdx.x >> 5);
thisNode[0].load();
Data* w=thisNode[0].getData();
thisData[0]=w; //<-- this causes the error
loadIndex=0;
}
__threadfence_block();
The size of Data is… hm… it’s a struct of 10 arrays of floats, each of length 32. That makes 1280 bytes in total if I am counting right.
Note however that I am working here only with pointers.
What was your problem and how did you resolve it, Deus?
Indeed, it is something like this:
if ((threadIdx.x & 31)==0) {
numHandled=1;
thisNode[0].index=(threadIdx.x >> 5);
thisNode[0].load();
Data* w=thisNode[0].getData();
thisData[0]=w; //<-- this causes the error
loadIndex=0;
}
__threadfence_block();
The size of Data is… hm… it’s a struct of 10 arrays of floats, each of length 32. That makes 1280 bytes in total if I am counting right.
Note however that I am working here only with pointers.
What was your problem and how did you resolve it, Deus?
Deus
September 3, 2010, 3:10pm
6
The problem is that the shared and textur memory access need to be aligned.
Try following:
Data* temp = thisData[0];//or only thisData[0];
if ((threadIdx.x & 31)==0) {
numHandled=1;
thisNode[0].index=(threadIdx.x >> 5);
thisNode[0].load();
Data* w=thisNode[0].getData();
thisData[0]=w;
loadIndex=0;
}
or try to aligne you struct e.g.
struct __align__(16) {
float a;
float b;
float c;
float d;
float e;
};
Deus
September 3, 2010, 3:10pm
7
The problem is that the shared and textur memory access need to be aligned.
Try following:
Data* temp = thisData[0];//or only thisData[0];
if ((threadIdx.x & 31)==0) {
numHandled=1;
thisNode[0].index=(threadIdx.x >> 5);
thisNode[0].load();
Data* w=thisNode[0].getData();
thisData[0]=w;
loadIndex=0;
}
or try to aligne you struct e.g.
struct __align__(16) {
float a;
float b;
float c;
float d;
float e;
};
Yes I know. But what I am storing in shared memory here are mere 32-bit pointers!
When I declare an static array, at least the first element is guaranteed to be aligned to something, isn’t it the case?
Anyway, I have tried adding this temp variable declaration, although I am not sure what for since I am never using it.
I also tried (note the reference)
Data* &temp = thisData[0];
if ((threadIdx.x & 31)==0) {
numHandled=1;
thisNode[0].index=(threadIdx.x >> 5);
thisNode[0].load();
Data* w=thisNode[0].getData();
temp=w;
loadIndex=0;
}
With exactly the same faulty result.
I see no point in aligning the struct I am pointing to, since I never access it in this piece of code.
Yes I know. But what I am storing in shared memory here are mere 32-bit pointers!
When I declare an static array, at least the first element is guaranteed to be aligned to something, isn’t it the case?
Anyway, I have tried adding this temp variable declaration, although I am not sure what for since I am never using it.
I also tried (note the reference)
Data* &temp = thisData[0];
if ((threadIdx.x & 31)==0) {
numHandled=1;
thisNode[0].index=(threadIdx.x >> 5);
thisNode[0].load();
Data* w=thisNode[0].getData();
temp=w;
loadIndex=0;
}
With exactly the same faulty result.
I see no point in aligning the struct I am pointing to, since I never access it in this piece of code.
Deus
September 8, 2010, 10:49am
10
Is the error in a same code line?
I am not sure, but I think all threads in a warp(or half warp) need to access a shared memory in a same code line. And with the if condition is it not possible.
Therefore helped the temp variable in my case…
Deus
September 8, 2010, 10:49am
11
Is the error in a same code line?
I am not sure, but I think all threads in a warp(or half warp) need to access a shared memory in a same code line. And with the if condition is it not possible.
Therefore helped the temp variable in my case…
Yes
I don’t think so. I did similar construction many times in the past and it worked.
What I thought it could be is that the first usage of a shared variable must appear together… but obviously it was not to case either.
I think for some reason the compiler is not certain it is aligned, hence the error.
I am constantly trying to reproduce the error in some smaller project…
I think I will copy whole thing and try removing stuff piece by piece.
Yes
I don’t think so. I did similar construction many times in the past and it worked.
What I thought it could be is that the first usage of a shared variable must appear together… but obviously it was not to case either.
I think for some reason the compiler is not certain it is aligned, hence the error.
I am constantly trying to reproduce the error in some smaller project…
I think I will copy whole thing and try removing stuff piece by piece.