CUDA 2.0 beta: 100 errors with C++ code

Hi. I have too many errors to post them all here, so I am attaching a project that causes them.
It compiles perfectly with CUDA 1.1 but not with the 2.0 beta.

I am using C++ template metaprogramming in my library. I know the documentation says “only C”, but I was very happy when my C++ code compiled for the device. My library is CPU/GPU portable and has some optimisations (like automatic SSE on the CPU).

So I did not expect C++ features to cause any problems in future versions.

1) My question is: will CUDA 2.0 support all the C++ features that CUDA 1.1 supported? In particular, what about templates?

If not, this is very, very bad :crying:, because I have a lot of code, and plans, built on the MGML_MATH library.

And what should I do in that case?

Have you tried compiling with the “-host-compilation=c++” compiler option (see the nvcc documentation)?

No, I haven't. I will try it.
But I use those features not only on the host;
I use them on the device too.

Well, you are lucky then, because for device code only C++ is supported, not C (though I do not know how standards-compliant the C++ compiler is).

--host-compilation=c++ does not help.

One of the errors is:

#ifndef __CUDACC__
	#define __align__(N) __declspec(align(N))
#endif

#define SSE_ALIGNED __align__(16)

struct SSE_ALIGNED SSE_Aligned_Object {}; // error: expected a declaration




While I doubt this is the cause of the problem, since that code should never do anything when compiling with nvcc:

I do not know about C++, but in C, identifiers starting with __, or with _ followed by an uppercase letter, are reserved. If you use (or, as in this case, even redefine) any of those, there is no reason to expect your program to work at all.
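One way to avoid redefining a reserved name like __align__ on the host is to route alignment through a project-specific macro instead. A minimal sketch, assuming the hypothetical macro name MGML_ALIGN (not part of the original library) and a host compiler with C++11 alignas as the fallback:

```cpp
// Hypothetical project-specific alignment macro. It avoids (re)defining
// __align__, since names beginning with a double underscore are reserved.
#if defined(__CUDACC__)
  #define MGML_ALIGN(N) __align__(N)            // nvcc: use CUDA's own macro
#elif defined(_MSC_VER)
  #define MGML_ALIGN(N) __declspec(align(N))    // MSVC host compiler
#else
  #define MGML_ALIGN(N) alignas(N)              // any C++11 host compiler
#endif

#define SSE_ALIGNED MGML_ALIGN(16)

// Empty base whose only purpose is to give derived types 16-byte alignment.
struct SSE_ALIGNED SSE_Aligned_Object {};
```

Because the size of a type is always a multiple of its alignment, this empty struct ends up 16 bytes in size on such compilers.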

The problem is that

struct __align__(16) SSE_Aligned_Object {};

gives this error, but it shouldn't. And this is only one of them; I have a lot of errors.

Bump. :nukeclap: I still cannot compile my project with the CUDA 2.0 beta.

Help, I am stuck!

Can you post a small snippet of code that doesn’t work? Your initial post has a big project, and it’s annoying to get all the paths and such right just to start compiling it.

I tried the code below in 2.0B2, and it compiled just fine:

__global__ void FROLtest()
{
  struct __align__(16) SSE_Aligned_Object {float x;}  foo;
}



It also works as an unnamed struct (removing the SSE_Aligned_Object tag).

This is on Windows XP, which may matter.

I am also using template metaprogramming with kernel code. It can be finicky (no static constant definitions, since the “static” keyword is not allowed), but it works.
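Where static constant members are rejected, one common workaround (not one stated in this thread) is the enum trick: a compile-time value is exposed as an enumerator rather than a static const member. A sketch, using the hypothetical template name Pow2:

```cpp
// Compile-time 2^N by template recursion. The result is an enumerator,
// so no 'static const' member definition is required.
template <int N>
struct Pow2
{
    enum { value = 2 * Pow2<N - 1>::value };
};

// Explicit specialization terminates the recursion.
template <>
struct Pow2<0>
{
    enum { value = 1 };
};

// Usable wherever a constant expression is required, e.g. an array bound.
typedef float Vec16[Pow2<4>::value];
```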

With or without static, this code gives an error:

#define universal_call __device__ __host__

template <int n,class T>
struct MulVec
{
  inline static universal_call void exec(T* c,const T* a,const T* b)
  {
    *c = (*a)*(*b);
  }

  inline static universal_call void exec(T* c,const T* a,const T b)
  {
    *c = (*a)*b;
  }

  inline static universal_call void exec(T* c,const T* b)
  {
    *c *= *b;
  }

  inline static universal_call void exec(T* c,const T b)
  {
    *c *= b;
  }
};

error: cannot determine which instance of overloaded function “MulVec<n,T>::exec” is intended

error: type name is not allowed

So a type name is not allowed when instantiating templates, and it compiles perfectly with CUDA 1.1.

By the way, I work under Windows XP too.


How about replacing this line in the first function:

with this:


…as the error message suggests?

Yes, it works! Thanks a lot!
I'm even a bit embarrassed. Now I will try to fix my mistakes by myself.
Thanks once again.

There is no disgrace in compiler errors, except the disgrace of the error messages themselves. Thanks for posting; a trouble shared is a trouble halved.
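The replacement line suggested above is not preserved in this thread. Purely as an illustration of the same CPU/GPU-portable style, here is a sketch of a compile-time-unrolled element-wise multiply in which each operation has its own name, so no call depends on choosing between several exec overloads. MulVecUnrolled, vecvec and vecscalar are hypothetical names, and universal_call is assumed to expand to nothing when the file is not compiled by nvcc:

```cpp
#include <cassert>  // assert(), used when checking results on the host

// Hypothetical portability macro: CUDA qualifiers under nvcc, nothing otherwise.
#ifdef __CUDACC__
  #define universal_call __device__ __host__
#else
  #define universal_call
#endif

// Element-wise multiply of n-element arrays, unrolled at compile time.
template <int n, class T>
struct MulVecUnrolled
{
  // c[i] = a[i] * b[i] for i in [0, n)
  static universal_call inline void vecvec(T* c, const T* a, const T* b)
  {
    c[0] = a[0] * b[0];
    MulVecUnrolled<n - 1, T>::vecvec(c + 1, a + 1, b + 1);
  }

  // c[i] = a[i] * s for i in [0, n)
  static universal_call inline void vecscalar(T* c, const T* a, T s)
  {
    c[0] = a[0] * s;
    MulVecUnrolled<n - 1, T>::vecscalar(c + 1, a + 1, s);
  }
};

// Partial specialization for n == 0 ends the recursion.
template <class T>
struct MulVecUnrolled<0, T>
{
  static universal_call inline void vecvec(T*, const T*, const T*) {}
  static universal_call inline void vecscalar(T*, const T*, T) {}
};
```

For example, with a = {1, 2, 3} and b = {4, 5, 6}, MulVecUnrolled<3, float>::vecvec(c, a, b) leaves c holding {4, 10, 18}.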

So, another error is:

__device__ int getSomeFromTexture(const texture<uint3 , 1, cudaReadModeElementType>& texture,unsigned int offset)
{
  uint3 node = tex1Dfetch(texture,offset);
  return  node.x;
}


error: no instance of overloaded function “tex1Dfetch” matches the argument list

argument types are: (const texture<uint3, 1, cudaReadModeNormalizedFloat>, unsigned int)

(You may need to rebuild the solution for that error to appear.)

This is strange; maybe I have the SDK installed incorrectly.

That’s because only 1-, 2- and 4-component texture fetches are supported (single, pair and quad types such as uint, uint2 and uint4). You’re trying to fetch a uint3, which does not match any of those layouts.

If you pad everything to a uint4, it should work.

Thanks, it works now.

Finally, I have isolated my old error with SSE_Aligned_Object.

The following code
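The padding fix can be sketched on the host side as a conversion from 3-component to 4-component records before the data is copied to the device. HostUint3 and HostUint4 are hypothetical stand-ins for CUDA's uint3 and uint4 so the sketch builds without the CUDA headers, and padToUint4 is likewise a hypothetical helper name:

```cpp
#include <cassert>  // assert(), used when checking results on the host
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-ins for CUDA's uint3/uint4 vector types.
struct HostUint3 { unsigned int x, y, z; };
struct HostUint4 { unsigned int x, y, z, w; };

// Widen each 3-component node to a 4-component record; w is unused padding.
std::vector<HostUint4> padToUint4(const std::vector<HostUint3>& in)
{
    std::vector<HostUint4> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        HostUint4 v = { in[i].x, in[i].y, in[i].z, 0u };
        out.push_back(v);
    }
    return out;
}
```

In real code the padded array would hold actual uint4 values, be bound with cudaBindTexture to a texture<uint4, 1, cudaReadModeElementType>, and the device function would read it with uint4 node = tex1Dfetch(tex, offset); and use only node.x, node.y and node.z.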

#define SSE_ALIGNED __align__(16)

struct SSE_ALIGNED SSE_Aligned_Object {}; // error: expected a declaration

template<int n,class T>
class Auto_Align {};

template<>
class Auto_Align<4,float> : public SSE_Aligned_Object {};

class AABB4f : public Auto_Align<4,float>
{
	float vmin[4];
	float vmax[4];
};

class Ray3f : public Auto_Align<3,float>
{
	float pos[3];
	float dir[3];
};

__device__ bool IntersectRayBox(const Ray3f& r,const AABB4f& box)
{
	return false;
}

And then, in some kernel:

AABB4f box;
Ray3f ray;
bool result = IntersectRayBox(ray,box);

gives this error:

error: expected a declaration

I do not know about C++, but at least in C empty structs are not allowed, so this would simply not be valid code. You could try hacks like adding “int a[0];” or “int a;”, though I doubt it will help.

Even if it is valid in C++, both CUDA and Brook are Frankenstein mixtures of C and C++, neither one nor the other. For example, I still have to cast all malloc return values, even when I set nvcc to compile C host code, and likewise with Brook, which claims to be C; to my knowledge that is a clear violation of every C standard in existence. That doesn’t stop anyone from advertising them as C, which is a clear lie: even CUDA is at best C++, though possibly some very early version.

(Sorry for the rant ;-) )
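For comparison (this does not diagnose the nvcc 2.0 beta error), the same Auto_Align pattern can be written in standard C++11, where empty classes are legal, unlike in C, and an explicit specialization needs the template<> prefix. This sketch assumes a C++11 host compiler and uses alignas(16) in place of __align__(16); the public: labels are added only so the members are accessible:

```cpp
// Empty, 16-byte-aligned base class (valid in C++, not in C).
struct alignas(16) SSE_Aligned_Object {};

// Primary template: no alignment requirement beyond the members' own.
template <int n, class T>
class Auto_Align {};

// Explicit specialization: 4-float types inherit the 16-byte alignment.
template <>
class Auto_Align<4, float> : public SSE_Aligned_Object {};

class AABB4f : public Auto_Align<4, float>
{
public:
    float vmin[4];
    float vmax[4];
};

class Ray3f : public Auto_Align<3, float>
{
public:
    float pos[3];
    float dir[3];
};
```

Here AABB4f picks up 16-byte alignment through its empty base, while Ray3f keeps the plain alignment of float.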

It is normal C++ code, and it works perfectly on CUDA 1.1!

By the way, the code

template<>
class Auto_Align<4,float> : public SSE_Aligned_Object {};

is not an error in CUDA 2.0 on its own. Only the combination I posted causes this error.

So it looks like a compiler bug.

Your hack with int a[0] does not work:

error C2503: ‘MGML_MATH::SSE_Aligned_Object’ : base classes cannot contain zero-sized arrays