Error: expected a declaration (CUDA 2.0 only)

Sorry for starting a new topic, but nobody gave a satisfactory answer in the previous one.

The following code gives an error only on the CUDA 2.0 beta, not on CUDA 1.1:

#define SSE_ALIGNED __align__(16)

struct SSE_ALIGNED SSE_Aligned_Object {}; // error: expected a declaration 

template<int n,class T>
class Auto_Align {};

template<>
class Auto_Align<4,float> : public SSE_Aligned_Object {};

class AABB4f : public Auto_Align<4,float>
{
    float vmin[4];
    float vmax[4];
};

class Ray3f : public Auto_Align<3,float>
{
    float pos[3];
    float dir[3];
};

__device__ bool IntersectRayBox(const Ray3f& r,const AABB4f& box)
{
    return false;
}

What should I do in this case? I really need code like this.

CUDA developers, please give me some kind of answer.

I don’t quite see what you’re trying to achieve with this alignment trick. Could you first elaborate a little on why you need to set the __align__(16) option for class AABB4f at all?

(On the one hand, cudaMalloc/cuMemAlloc always return base addresses that meet your alignment requirement. On the other hand, there is no way to make reads of a 32-byte, i.e. 256-bit, type like AABB4f coalesce. It is preferable to use separate arrays of floats, float2’s or float4’s instead of arrays of structures that combine them.)
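To make the "separate arrays" suggestion concrete, here is a minimal host-side sketch of splitting an array of boxes into one array per field. Float4, BoxAoS, BoxesSoA and toSoA are names made up for this illustration; Float4 only stands in for CUDA's built-in float4, and alignas assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for CUDA's built-in float4 (16 bytes, 16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };

// Array-of-structures element: 32 bytes per box, like the AABB4f in the question.
struct BoxAoS {
    float vmin[4];
    float vmax[4];
};

// Structure-of-arrays layout: one Float4 array per field.
struct BoxesSoA {
    std::vector<Float4> vmin;
    std::vector<Float4> vmax;
};

// Copy every box into the two per-field arrays, keeping the box index.
BoxesSoA toSoA(const std::vector<BoxAoS>& boxes) {
    BoxesSoA out;
    out.vmin.reserve(boxes.size());
    out.vmax.reserve(boxes.size());
    for (const BoxAoS& b : boxes) {
        out.vmin.push_back(Float4{b.vmin[0], b.vmin[1], b.vmin[2], b.vmin[3]});
        out.vmax.push_back(Float4{b.vmax[0], b.vmax[1], b.vmax[2], b.vmax[3]});
    }
    return out;
}
```

With this layout, thread i would read vmin[i] and vmax[i] as two separate 16-byte loads, so neighbouring threads touch neighbouring addresses in each array.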



In my code I use Auto_Align like this:

template<int n,class T>
class SPHERE: public Auto_Align<n,T> {...};

template<int _n,class T>
class PLANE: public Auto_Align<_n,T> {...};

template<int n,class T>
class LINE: public Auto_Align<n,T> {...};

I use my library not only on the GPU but also on the CPU.

And when I have SPHERE<4,float>, the SSE optimizations are switched on automatically on the CPU.

That’s why I really need this trick. The goal is to use the same code on the CPU and on the GPU; that is a real revolution when you compare CUDA with other GPGPU techniques.
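For readers following along, this is roughly how the trick selects alignment at compile time. It is only a sketch: alignas (C++11) stands in for the __align__(16) macro, and the SPHERE members center and radius are invented for the illustration, since the real class body is elided above.

```cpp
#include <cstddef>

// Assumption: C++11 alignas stands in for the __align__(16) macro from the question.
struct alignas(16) SSE_Aligned_Object {};

// Primary template: no extra alignment requirement.
template <int n, class T>
class Auto_Align {};

// Full specialization for 4 floats: inherits the 16-byte alignment.
template <>
class Auto_Align<4, float> : public SSE_Aligned_Object {};

// Hypothetical body for SPHERE; the real members are not shown in the thread.
template <int n, class T>
class SPHERE : public Auto_Align<n, T> {
public:
    T center[n];
    T radius;
};

// The 4-float variant picks up SSE alignment, the 3-float variant does not.
static_assert(alignof(SPHERE<4, float>) == 16, "4-float sphere is 16-byte aligned");
static_assert(alignof(SPHERE<3, float>) == alignof(float), "3-float sphere keeps natural alignment");
```

The point of the design is that user code never mentions alignment; it only chooses the template arguments.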

Anyway, it works on CUDA 1.1. I already have code that relies on this trick, and it is not so simple to remove it now.
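One direction that might be worth trying, not verified against the CUDA 2.0 beta: keep the compiler-specific spelling behind a single macro and apply it to a non-empty struct instead of an empty base class. Whether this avoids the "expected a declaration" error is an assumption. GEOM_DEVICE is a made-up macro name, and the structs here are simplified copies of the ones in the question.

```cpp
// Pick the alignment spelling per compiler. __CUDACC__ is defined by nvcc,
// _MSC_VER by MSVC, __GNUC__ by GCC and Clang.
#if defined(__CUDACC__)
  #define SSE_ALIGNED __align__(16)
#elif defined(_MSC_VER)
  #define SSE_ALIGNED __declspec(align(16))
#elif defined(__GNUC__)
  #define SSE_ALIGNED __attribute__((aligned(16)))
#else
  #define SSE_ALIGNED alignas(16)
#endif

// Hypothetical macro: expands to __device__ under nvcc and to nothing on the host.
#if defined(__CUDACC__)
  #define GEOM_DEVICE __device__
#else
  #define GEOM_DEVICE
#endif

// Alignment applied directly to the data-carrying struct (assumption, see above).
struct SSE_ALIGNED AABB4f {
    float vmin[4];
    float vmax[4];
};

// Three floats cannot use 16-byte alignment usefully, so no macro here.
struct Ray3f {
    float pos[3];
    float dir[3];
};

// Same stub as in the question; the real intersection test is not shown in the thread.
GEOM_DEVICE inline bool IntersectRayBox(const Ray3f& r, const AABB4f& box) {
    (void)r;
    (void)box;
    return false;
}
```

The same header then compiles with a plain host compiler and, in principle, with nvcc, which is the CPU/GPU sharing described above.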