C++ and NVCC: What parts of C++ can we use with NVCC?


I have a question about the part of C++ that is supported by nvcc. According to the documentation:

However, in the thread:


tomschi successfully used a class with an overloaded operator. I’m porting a C++ application to CUDA and I don’t want to rewrite all the C++ code in C. Can anyone tell me which parts of C++ are actually supported by nvcc, and whether it is a good and safe idea to use overloaded operators and other C++ features?

Thanks a lot



CUDA indeed supports not only operator overloading but also basic inheritance. Anything “virtual” (derived classes or virtual methods) didn’t work with 1.0, and I think that, due to the lack of true function pointers on the device, it doesn’t work with 1.1 either. I think most of C++ works, apart from everything that needs function pointers.

Using C++ instead of C is basically nice, but be warned: multiprocessor resources are really scarce, and to achieve good performance you have to break OO encapsulation.

I’m currently writing a ray tracer with CUDA, and I have the following ray class, for example:

class Ray
{
  //ray origin
  float3 o;
  //ray direction
  float3 d;
  //ray tmin
  float tmin;
  //ray tmax
  float tmax;
};
For performance’s sake you should follow some memory-alignment requirements when loading a ray from device memory, so I did:

Ray ray;
ray.o.x = dpRayOrigins[KERNEL_TID];
ray.o.y = dpRayOrigins[KERNEL_TID + KERNEL_SIZE];
ray.o.z = dpRayOrigins[KERNEL_TID + KERNEL_SIZE*2];
ray.tmin = dpRayOrigins[KERNEL_TID + KERNEL_SIZE*3];
ray.d.x = dpRayDirections[KERNEL_TID];
ray.d.y = dpRayDirections[KERNEL_TID + KERNEL_SIZE];
ray.d.z = dpRayDirections[KERNEL_TID + KERNEL_SIZE*2];
ray.tmax = dpRayDirections[KERNEL_TID + KERNEL_SIZE*3];

Although one might think of a clean and elegant OO way to wrap these lines, I guess you would have to spend a lot of time rewriting and restructuring your application to make things work well with CUDA. What kind of application/program are you dealing with?


The way I understand it, the code may work, but it’s not really supported right now.

Most things, other than inheritance, seem to work, most of the time. You will have to add __host__ __device__ qualifiers to the C++ functions if you expect to use them both on the device and from host code.


Thanks for the reply.
Btw. I’m working on the same thing as you… a ray tracer :). I’m using a kd-tree to partition the scene and I want to see what CUDA and parallelization over the rays can bring. The traversal sequences for rays with similar directions should also be similar, so I hope the threads will execute the same code most of the time.
Well, even adding the __device__ qualifier would prevent me from using exactly the same classes, so I guess I will rewrite them anyway. That raises another question: what is the advantage of using the built-in vector types? Is it better to use float3 instead of three floats? I see that you use them, but I would like to know the advantage. CUDA doesn’t provide any vector instructions (if I’m correct), so do you think there is any benefit to using them?


Hello Jan,

I see an advantage in using them for arithmetic calculations, e.g. in ray-object intersection routines. I wrote some operators for float3 and so on, just very basic things like vector subtraction, addition and the dot product. These simple ops don’t hurt performance and make the code more readable, so it’s mostly a software-aesthetics thing… and sticking to the standard types might make that vector operator code useful for other things and people, too.
good luck,