Cuda and child parent/class

user137079 · July 12, 2022, 11:20am

Hi everyone!

I don't understand how CUDA handles inheritance. Here is a simple example:

#include <cstdio>
#include <vector>

template <typename T>
__global__ void foo(T f)
{
    printf("From GPU\n");
    f.test();
}

class Animal {
public:
    Animal() {}
    __host__ __device__ virtual void test() { printf("I'm youre dad\n"); }
};

class Dog : public Animal {
public:
    __host__ __device__ virtual void test() override { printf("I'm a dog'\n"); }
};

class Cat : public Animal {
public:
    __host__ __device__ virtual void test() override { printf("I'm a cat\n"); }
};

int main()
{
    std::vector<Animal*> myThings;
    myThings.push_back(new Dog);
    myThings.push_back(new Cat);
    foo<<<1,1>>>(*myThings[0]);
    cudaDeviceSynchronize();
    foo<<<1,1>>>(*myThings[1]);
    cudaDeviceSynchronize();
    myThings[0]->test();
    myThings[1]->test();
    return 0;
}

On the CPU, calling test() gives the derived class's result. On the GPU, it is the parent class that responds.
Why, and how can I fix it?

Robert_Crovella · July 12, 2022, 1:17pm

At least one of the problems in your code is indicated here:

It is not allowed to pass as an argument to a __global__ function an object of a class with virtual functions.
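
A separate, plain C++ effect is likely also at work in the original code: `foo<<<1,1>>>(*myThings[0])` deduces `T` as `Animal`, because `*myThings[0]` has static type `Animal`, so the kernel receives a by-value copy that has been sliced down to the base class. The host-only sketch below reproduces that slicing on the CPU. It is standard C++ with no CUDA calls, and the names `name`, `by_value` and `by_reference` are made up for this illustration.

```cpp
#include <string>

struct Animal {
    virtual ~Animal() = default;
    virtual std::string name() const { return "animal"; }
};

struct Dog : Animal {
    std::string name() const override { return "dog"; }
};

// Mirrors the kernel signature foo(T f): the argument is taken by value.
// Called with an Animal lvalue, T is deduced as Animal and the Dog part
// of the object is sliced away during the copy.
template <typename T>
std::string by_value(T f) { return f.name(); }

// Taking a reference keeps the dynamic type, so virtual dispatch
// still reaches the derived class.
template <typename T>
std::string by_reference(const T& f) { return f.name(); }
```

Given `Dog d; Animal* p = &d;`, `by_value(*p)` returns "animal" while `by_reference(*p)` returns "dog". Switching the kernel parameter to a reference is not a fix on the GPU, since device code cannot dereference ordinary host memory.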

One possible way to address this is to construct objects in device code that have virtual methods that will be used in device code:

$ cat t2059.cu
#include <cstdio>
#include <vector>
#include <new>

class Animal {
   public:
      __host__ __device__ Animal(){};
      __host__ __device__ virtual void test() {printf("I'm youre dad\n");};
};
class Dog : public Animal {
   public:
      __host__ __device__ virtual void test() override {printf("I'm a dog'\n");}
};
class Cat : public Animal {
   public:
      __host__ __device__ virtual void test() override {printf("I'm a cat\n");}

};


__global__ void foo()
{
    Animal **a;
    a = new Animal*[2];
    a[0] = new Dog;
    a[1] = new Cat;
    printf("From GPU\n");
    //f.test();
    a[0]->test();
    a[1]->test();
}

int main()
{
    std::vector<Animal*> myThings;
    myThings.push_back(new Dog);
    myThings.push_back(new Cat);
    foo<<<1,1>>>();
    cudaDeviceSynchronize();
    myThings[0]->test();
    myThings[1]->test();
    return 0;
}
$ nvcc -o t2059 t2059.cu
$ compute-sanitizer ./t2059
========= COMPUTE-SANITIZER
From GPU
I'm a dog'
I'm a cat
I'm a dog'
I'm a cat
========= ERROR SUMMARY: 0 errors
$

This question comes up from time to time, so with a bit of searching you will find other examples.

user137079 · July 21, 2022, 8:52am

I found a way to do this with the Curiously Recurring Template Pattern (CRTP). It is a form of static inheritance that avoids the use of virtual functions.
Here is a basic sample:

#include <cstdio>
#include <variant>
#include <vector>

// CRTP base: each call is forwarded to the derived class at compile time,
// so no virtual table is involved.
template <typename Derived>
class Base {
public:
    double implementation()
    {
        return static_cast<Derived*>(this)->implementation();
    }
    __device__ void test()
    {
        return static_cast<Derived*>(this)->test();
    }
};

class DerivedA : public Base<DerivedA> {
public:
    double implementation() { return 2.0; }
    __device__ void test() { printf("i am A from GPU\n"); }
};

class DerivedB : public Base<DerivedB> {
public:
    double implementation() { return 1.0; }
    __device__ void test() { printf("i am B from GPU\n"); }
};

// The variant lets one container hold both concrete types.
using Child = std::variant<DerivedA, DerivedB>;

// T is the concrete derived type, so the object is passed by value intact.
template <typename T>
__global__ void kernel(T a)
{
    a.test();
}

int main()
{
    DerivedA obj1;
    DerivedB obj2;

    std::vector<Child> forces;
    forces.push_back(obj1);
    forces.push_back(obj2);

    for (auto& f : forces)
    {
        int grid = 1;
        int threads = 1;
        // std::visit calls the lambda with the alternative currently held,
        // which selects the matching kernel instantiation.
        std::visit([&grid, &threads](auto&& e)
        {
            kernel<<<grid, threads>>>(e);
            cudaDeviceSynchronize();
        }, f);
    }
    return 0;
}

Compile with: nvcc file.cu -std=c++17

Since the question of class inheritance in CUDA comes up so often, I hope this example will help!
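
To make the dispatch mechanism easier to see, here is a host-only sketch of the same CRTP plus `std::variant` idea in standard C++17, with no CUDA calls. The function `launch` is a hypothetical stand-in for the templated kernel launch, and `who`, `who_impl` and `run_all` are names invented for this illustration. Unlike the sample above, the derived method gets a different name from the base method, which is an optional design choice: if a derived class forgets to provide it, the build fails instead of the base method calling itself.

```cpp
#include <string>
#include <variant>
#include <vector>

// CRTP base: who() forwards to the derived class at compile time.
template <typename Derived>
struct Base {
    std::string who() { return static_cast<Derived*>(this)->who_impl(); }
};

struct DerivedA : Base<DerivedA> {
    std::string who_impl() { return "A"; }
};

struct DerivedB : Base<DerivedB> {
    std::string who_impl() { return "B"; }
};

using Child = std::variant<DerivedA, DerivedB>;

// Hypothetical stand-in for kernel<<<grid, threads>>>(obj):
// T is the concrete type, so nothing is sliced when passing by value.
template <typename T>
std::string launch(T obj) { return obj.who(); }

// Visit each element; std::visit invokes the generic lambda with the
// alternative currently held, instantiating launch<T> for that type.
std::string run_all(std::vector<Child> items) {
    std::string out;
    for (auto& item : items) {
        out += std::visit([](auto& e) { return launch(e); }, item);
    }
    return out;
}
```

Filling a `std::vector<Child>` with a `DerivedA`, a `DerivedB` and another `DerivedA` and passing it to `run_all` gives "ABA": the concrete type of each element is resolved when it is visited, without any virtual call.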

