How to port a rather complicated C++ code base (Geant4) to the GPU?

ronaldo1018 · November 3, 2015, 3:42am

Hi, I’m pretty new to CUDA programming and I’m having trouble porting part of the Geant4 code to the GPU.

Geant4 is a particle simulation toolkit written in C++.

The main reasons we think this is difficult are as follows:

Geant4 is written in C++ rather than C.
Geant4’s program structure is a multi-level class hierarchy (in other words, classes call methods of other classes to complete the work).
For a kernel to call a class’s member functions, we need to add the __device__ __host__ qualifiers before each function. Could anyone suggest a smart way to add these qualifiers in front of every member function of every class?
Each Geant4 class holds many pointers to objects of other classes. Hence, we need to use the CUDA Unified Memory mechanism to handle them (this is a rather difficult part).
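
One common way to approach the qualifier question in the third point is a single macro that expands to the CUDA qualifiers only when nvcc compiles the file. The sketch below is a minimal illustration of that idea, not Geant4 code: the macro name `HOST_DEVICE`, the class `Box`, and the functions `total_area` and `check_on_host` are all hypothetical. The macro still has to be written in front of each member function (by hand or with a scripted search and replace), but the headers stay valid for an ordinary C++ compiler.

```cpp
#include <cassert>

// Hypothetical macro: CUDA qualifiers under nvcc, nothing for a host compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Illustrative class (not from Geant4) with every member function tagged.
class Box {
public:
    HOST_DEVICE Box(double w, double h) : width_(w), height_(h) {}
    HOST_DEVICE double area() const { return width_ * height_; }
    HOST_DEVICE void scale(double f) { width_ *= f; height_ *= f; }
private:
    double width_;
    double height_;
};

// Free function usable from host code and, under nvcc, from inside a kernel.
HOST_DEVICE inline double total_area(const Box* boxes, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += boxes[i].area();
    return sum;
}

// Host-side check that the tagged code still behaves as ordinary C++.
inline void check_on_host() {
    Box boxes[2] = {Box(2.0, 3.0), Box(1.0, 4.0)};
    boxes[0].scale(2.0);                   // first box becomes 4 x 6
    assert(total_area(boxes, 2) == 28.0);  // 24 + 4
}
```

Note that the qualifiers only help for functions whose bodies avoid host-only facilities (for example most of the standard library), so this does not by itself make an arbitrary class usable in a kernel.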

We have written a simple C++ program to test this approach:

    #include <iostream>
    #include <cuda_runtime.h>
    using namespace std;
    
     // Holds pointers to its two dimensions (no inheritance is involved)
     class Rectangle
     {
        public:
           Rectangle()
          {
           
          }
          int getArea()
          {
             return (*width * *height);
          }
          int* width;
          int* height;
    };
   
    // Holds a pointer to a Rectangle
    class Shape
    {
       public:
        Shape()
        {
        }
        Rectangle* rect;
    };
   
    __global__ void change_width(Shape* sha)
    {
        *(sha->rect->width) = 210;
        *(sha->rect->height) = 10;
        return;
    }
   
    int main(void)
    {
        Shape* sha;
        cudaMallocManaged(&sha,sizeof(Shape));
        cudaMallocManaged(&sha->rect, sizeof(Rectangle));
        cudaMallocManaged(&sha->rect->width,sizeof(int));
        cudaMallocManaged(&sha->rect->height,sizeof(int));
        *(sha->rect->width) = 20;
        *(sha->rect->height) = 10;
        change_width<<<1,1,0>>>(sha);
        cudaDeviceSynchronize();

        // Print the area of the object.
        cout << "Total area: " << sha->rect->getArea() << endl;

        return 0;
    }

With this code we can port a two-level class structure to the GPU, and it works.
But Geant4 is a rather big project, and I am afraid that this simple idea will not scale to it.
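
One pattern sometimes used to avoid a separate cudaMallocManaged call for every pointer is a base class whose operator new allocates managed memory, so that every object created with `new` is reachable from both host and device. The following is only a sketch under stated assumptions: `Managed`, `shared_alloc` and `shared_free` are hypothetical names, `Rectangle` and `Shape` are simplified variants of the test classes above that store plain ints, and the fallback to `std::malloc` exists only so the file also builds without nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: managed memory under nvcc, plain malloc otherwise.
inline void* shared_alloc(std::size_t bytes) {
    assert(bytes > 0);
    void* p = nullptr;
#ifdef __CUDACC__
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) p = nullptr;
#else
    p = std::malloc(bytes);
#endif
    if (p == nullptr) throw std::bad_alloc();
    return p;
}

inline void shared_free(void* p) {
#ifdef __CUDACC__
    cudaDeviceSynchronize();  // wait until no kernel can still be using p
    cudaFree(p);
#else
    std::free(p);
#endif
}

// Hypothetical base class: objects of derived types created with `new`
// are placed in memory obtained from shared_alloc.
struct Managed {
    static void* operator new(std::size_t bytes) { return shared_alloc(bytes); }
    static void operator delete(void* p) noexcept { shared_free(p); }
};

struct Rectangle : Managed {
    int width = 0;
    int height = 0;
    int getArea() const { return width * height; }
};

struct Shape : Managed {
    Rectangle* rect;
    Shape() : rect(new Rectangle) {}   // nested object is also managed
    ~Shape() { delete rect; }
    Shape(const Shape&) = delete;
    Shape& operator=(const Shape&) = delete;
};
```

With this in place, `Shape* sha = new Shape;` replaces the chain of allocation calls in main, and `delete sha;` releases both objects. Objects created on the stack or inside standard containers are not covered, and member functions would still need the device qualifier before a kernel could call them.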

Could anyone with experience porting C++ code to the GPU give me some advice?
I really have no idea how to handle such a big program.
Thanks for all your help.

Sincerely,
KEVIN

cbuchner1 · November 3, 2015, 9:38am

I think you should first work out which parts of the large code base will benefit from execution on a GPU and which parts won’t. Profile the CPU code to see which loops and functions are the critical ones. Focus on these.

However, it is often easier to design a project for the GPU from the ground up than to retroactively fit an existing CPU-based solution onto a CUDA GPU.

A 1:1 mapping of Geant’s C++ classes and objects into CUDA is probably not the best way forward. CUDA benefits greatly from “Struct of Arrays” (SoA) data storage, as opposed to arrays or lists of structs (or objects). SoA storage allows coalesced access to main memory and therefore memory access speeds close to the full hardware capability. Also consider using textures and constant memory for read-only data. Textures make use of the GPU’s texture caches on any localized reads.

What generally does not work well with CUDA is trying to squeeze a gazillion classes and all their code into one mega-kernel. You will run out of registers quickly, and as a result performance suffers so badly that you lose all the benefit of running on the GPU. You will most likely have to split the computation into several smaller kernels.
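
To make the layout difference concrete, here is a small host-side sketch of the two storage schemes. It is an illustration only: the particle fields and the names `ParticleAoS`, `ParticlesSoA`, `to_soa` and `lose_energy` are hypothetical and not taken from Geant4, and the code shows the data layout rather than an actual kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structs: the fields of one particle sit next to each other, so
// consecutive threads reading p[i].energy touch addresses 16 bytes apart.
struct ParticleAoS { float x, y, z, energy; };

// Struct of Arrays: each field is its own contiguous array, so consecutive
// threads reading energy[i] touch consecutive floats.
struct ParticlesSoA {
    std::vector<float> x, y, z, energy;
    explicit ParticlesSoA(std::size_t n) : x(n), y(n), z(n), energy(n) {}
    std::size_t size() const { return energy.size(); }
};

// Convert an existing array of structs into the SoA layout.
inline ParticlesSoA to_soa(const std::vector<ParticleAoS>& aos) {
    ParticlesSoA soa(aos.size());
    for (std::size_t i = 0; i < aos.size(); ++i) {
        soa.x[i] = aos[i].x;
        soa.y[i] = aos[i].y;
        soa.z[i] = aos[i].z;
        soa.energy[i] = aos[i].energy;
    }
    assert(soa.size() == aos.size());
    return soa;
}

// Per-particle update written against raw arrays. In a kernel, i would be
// the global thread index and the pointers would refer to device memory.
inline void lose_energy(float* energy, const float* loss, std::size_t i) {
    energy[i] -= loss[i];
}
```

On the device, each of the SoA arrays would be copied or allocated separately, and a kernel would receive the raw pointers.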

Christian

trf86 · November 9, 2015, 3:04pm

I think Christian is correct regarding first determining which pieces of the code will benefit from being run on the GPU.

As for implementation, you should refer to the C++ example in the CUDA samples. It demonstrates how to call kernels from inside C++. The sample is called “CppIntegration” and is inside the “0_Simple” folder in the CUDA samples.
