How to separate declaration (header) and implementation files in CUDA programming?

Before I begin, I apologize for my poor English.
In C++ programming, when I declare a class, I create a header file (.h) that declares the class and a source file (.cpp) that implements it.
Following this coding style, I tried to create a class that also works on a CUDA device. However, the build failed with error MSB3721 in this case:

  1. Declare the class in a .h file.
  2. Implement the class member functions in a .cpp (.cu) file.

However, the following case builds fine:

  1. Declare the class and implement its member functions in the .h file.

Why can't nvcc build the first case, and why can it build the second?

My example code:
First case:

// Vec3.h
#pragma once
#include <cuda_runtime.h>

// A simple 3D vector class with operator overloading
class Vec3 {

public:
    float elements[3];

    // Constructor
    __host__ __device__ Vec3(float x = 0, float y = 0, float z = 0);

    // Overload the [] operator for read-only access
    __host__ __device__ float operator[](int index) const;

    // Overload the [] operator for read-write access
    __host__ __device__ float& operator[](int index);
};

// Vec3.cpp
#include "Vec3.h"

// Constructor (callable from host and device)
__host__ __device__ Vec3::Vec3(float x, float y, float z) {
    elements[0] = x;
    elements[1] = y;
    elements[2] = z;
}

// [] operator for read-only access (callable from host and device)
__host__ __device__ float Vec3::operator[](int index) const {
    return elements[index];
}

// [] operator for read-write access (callable from host and device)
__host__ __device__ float& Vec3::operator[](int index) {
    return elements[index];
}

Second case:

// Vec3.h
#pragma once
#include <cuda_runtime.h>

// A simple 3D vector class with operator overloading
class Vec3 {

public:
    float elements[3];

    // Constructor
    __host__ __device__ Vec3(float x = 0, float y = 0, float z = 0) {
        elements[0] = x;
        elements[1] = y;
        elements[2] = z;
    }

    // Overload the [] operator for read-only access
    __host__ __device__ float operator[](int index) const {
        return elements[index];
    }

    // Overload the [] operator for read-write access
    __host__ __device__ float& operator[](int index) {
        return elements[index];
    }
};

A .cpp file in a CUDA project in Visual Studio (VS) is by default compiled with the host compiler (cl.exe, which ships with MSVC++/VS).

The host compiler knows nothing about syntax such as __host__ __device__ and will give a syntax error of some sort. (MSB3721 in VS is a very generic error; it does not have enough specificity to indicate “syntax error” or any other type of error).

The general recommendation is that any code that uses CUDA syntax (e.g. __host__ __device__) should be placed in a .cu file, so that it is compiled by nvcc.

The second case likely builds correctly because you are including that header only in .cu files and not in any .cpp files.
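If the header also has to be included from .cpp files, one common workaround is to hide the qualifiers behind a macro that expands to nothing unless nvcc is compiling. The following is only a sketch under that assumption: the macro name VEC3_HD and the helper vec3HostCheck are made up for illustration, and the file split is shown as comments inside one listing.

```cpp
// Vec3.h (sketch): intended to be safe to include from both .cpp and .cu files.
#include <cassert>

#ifdef __CUDACC__                       // defined only while nvcc compiles the file
#define VEC3_HD __host__ __device__
#else
#define VEC3_HD                         // host compiler (cl.exe, g++) sees no CUDA qualifiers
#endif

class Vec3 {
public:
    float elements[3];

    VEC3_HD Vec3(float x = 0, float y = 0, float z = 0);
    VEC3_HD float operator[](int index) const;   // read-only access
    VEC3_HD float& operator[](int index);        // read-write access
};

// Vec3.cu (sketch): the definitions live in a file that nvcc compiles.
VEC3_HD Vec3::Vec3(float x, float y, float z) {
    elements[0] = x;
    elements[1] = y;
    elements[2] = z;
}

VEC3_HD float Vec3::operator[](int index) const {
    return elements[index];
}

VEC3_HD float& Vec3::operator[](int index) {
    return elements[index];
}

// Hypothetical host-side use, e.g. from a plain .cpp file that includes Vec3.h.
inline float vec3HostCheck() {
    Vec3 v(1.0f, 2.0f, 3.0f);
    v[1] = 5.0f;                        // uses the read-write overload
    assert(v[1] == 5.0f);
    const Vec3& cv = v;                 // uses the read-only overload
    return cv[0] + cv[1] + cv[2];
}
```

With this layout the host compiler never sees __host__ __device__. Calling the member functions from device code in a different .cu file would still need relocatable device code.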


Also have a look at the Separate Compilation chapter of the NVIDIA CUDA Compiler Driver (nvcc) documentation, and at the -dc option described there.

With it you can also call __device__ functions across different translation units (.cu files in this case). Without it, you can make such calls only within the same translation unit (which includes all headers you include from there). Separate compilation can have a small performance cost.
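To make the translation-unit point concrete, here is a rough sketch of the layout that needs separate compilation. The file names, the HD macro, the helper sum and the kernel sumKernel are assumptions for illustration. The two .cu files are shown in one listing, and the nvcc commands in the comments follow the pattern from the nvcc documentation.

```cpp
// Assumed command-line build when the parts below are split into real files:
//   nvcc -dc Vec3.cu main.cu      compile each file to relocatable device code
//   nvcc Vec3.o main.o -o app     device-link and host-link (.obj on Windows)
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD                              // lets this sketch build with a host compiler too
#endif

// ---- Vec3.h: declarations only ----
struct Vec3 {
    float elements[3];
    HD float operator[](int index) const;
    HD float& operator[](int index);
};

// ---- Vec3.cu: definitions, a separate translation unit in a real project ----
HD float Vec3::operator[](int index) const { return elements[index]; }
HD float& Vec3::operator[](int index) { return elements[index]; }

// ---- main.cu: device code here calls functions defined in Vec3.cu ----
HD float sum(const Vec3& v) {
    return v[0] + v[1] + v[2];          // resolved at device-link time when split
}

#ifdef __CUDACC__
// The kernel reaches Vec3::operator[] through sum(); without -dc / -rdc=true
// the device linker has no definition to resolve from another .cu file.
__global__ void sumKernel(const Vec3* in, float* out) {
    *out = sum(*in);
}
#endif

// Host-side sanity check of the same functions.
inline void hostCheck() {
    Vec3 v{{1.0f, 2.0f, 3.0f}};
    assert(sum(v) == 6.0f);
}
```

When everything sits in one .cu file, as in this listing, ordinary whole-program compilation is enough. The -dc step only matters once Vec3.cu and main.cu are really separate files.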


Thank you for your reply! I read the NVIDIA CUDA Compiler Driver documentation. My case looks like the example code in chapter 6.4. According to chapter 6, I should use the “nvcc --device-c” command to compile the .cu files into .obj/.o files and then link these device object files with the host object files. However, my environment is Windows 10 / VS 2022 17.8.5, so to add this option I have to change the VS configuration. Where do I add it? Is it under Project properties → C/C++, CUDA C/C++, Linker, or CUDA Linker → Command Line? If you know, please let me know.


Thank you for your fast reply! I read the Stack Overflow solution and set Project properties → CUDA C/C++ → Common → Generate Relocatable Device Code = Yes (-rdc=true), with the build verbosity level set to Diagnostic. When I build the project, a new error occurs: “nvlink error: Undefined reference to '_ZN4Vec3ixEi' in 'C:/Users/dan70/Desktop/cuda-tutorial/x64/Debug/main.cu.obj'”. The “Vec3” in '_ZN4Vec3ixEi' probably refers to my Vec3.cu file, but I don't know what the rest of the name ('_ZN4', 'ixEi') means. If you know, please let me know.

You can use a C++ demangler; several are available online.

It means:

Vec3::operator[](int)

That is the signature of the function whose definition cannot be found at the device-link stage. Somehow your project is still misconfigured (maybe you are not actually including or compiling the Vec3.cu file?).
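If you would rather demangle locally than use an online tool, here is a minimal sketch. It assumes a GCC or Clang toolchain (for example under WSL), because abi::__cxa_demangle comes from <cxxabi.h> and is not available with MSVC. The wrapper name demangle is just for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang only, not MSVC)
#include <string>

// Returns the demangled form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer the function allocates the result with malloc.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : mangled;
    std::free(out);   // free(nullptr) is a no-op, so this is safe on failure
    return result;
}

// Small self-check using the symbol from the nvlink error message.
inline void demangleCheck() {
    assert(demangle("_ZN4Vec3ixEi") == "Vec3::operator[](int)");
}
```

The command-line tool c++filt from binutils gives the same result when you pass it the mangled name.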

It’s difficult to diagnose this kind of thing in a forum setting, because a VS project hides all manner of important information in the project property pages/dialogs.

At this point I usually suggest providing the full VS console output from a build. With verbosity set to “Diagnostic” that is probably going to be a huge amount of output, so if you can use judgment in selecting what to post, that would be helpful.

There are also CUDA sample codes you could study, for example simpleSeparateCompilation.

