Need help with Classes (oop) and CUDA

JohnDoes · November 29, 2012, 10:01am

Hi!
I am trying to make my CUDA code more object-oriented, but I am running into CUDA's limitations and hope you can help me. I want to do something like this (simplified):

//main.cu

__global__ void compute(Swarm *swarm){

	int threadID = threadIdx.x;

	swarm->getParticleOnPositionI(threadID)->computeFitness();
}
int main(){

	Swarm *deviceSwarm, *hostSwarm;

	
	hostSwarm = new Swarm();
	

	size_t sizeOfSwarm = sizeof(Swarm); // I think here is the first mistake?

	cudaMalloc((void**)&deviceSwarm, sizeOfSwarm);

	cudaMemcpy(deviceSwarm, hostSwarm, sizeOfSwarm, cudaMemcpyHostToDevice);

	compute(deviceSwarm);

	cudaMemcpy(hostSwarm, deviceSwarm, sizeOfSwarm, cudaMemcpyDeviceToHost);

	

	
return 0;
}

//Swarm.h

#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER   __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

class Swarm
{
public:
	Swarm(void);
	~Swarm(void);

	

	#ifdef __CUDACC__
	 CUDA_CALLABLE_MEMBER Particle* getParticleOnPositionI(int i){
	return &particles[i];
}
	  #endif

private:

	Particle *particles;

};

//Particle.h
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER   __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

class Particle
{
public:
	Particle(void);
	~Particle(void);

	#ifdef __CUDACC__
	CUDA_CALLABLE_MEMBER void computeFitness(){
	int i = threadIdx.x;
	*fitness = i*i;
	}
	#endif
	

	
private:
	double *fitness;
};

I have only posted the header files for convenience.
This code does not work as intended; perhaps you can help me modify it and get it to work. I have tried to find literature about OOP with CUDA, but nothing fits my problem :(

Ailleur · November 29, 2012, 12:46pm

A few items:
The compute kernel call has no grid/block configuration.
The memory pointed to by “fitness” and “particles” is never allocated (and subsequently no data is ever copied to it).

As I understand it, “swarm” is not an array but a single element. You could pass that element directly to the kernel by value (provided you do not have a host array malloc-ed within the object, as you do now).
Similarly for “fitness”: if it is a scalar value, you can drop the pointer and just give it a value.

I would say that for now your biggest problem is the (array?) of particles that is never cudaMalloc-ed.

Hope this helps.
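
The first and last points above can be sketched as follows. This is only a minimal illustration, not the poster's actual classes: the `Particle` struct, the kernel signature, the helper name `computeFitnessOnDevice`, and the block size of 128 are all assumptions made for the example. It shows a kernel launch with an explicit grid/block configuration and a `fitness` member stored as a plain value instead of a pointer.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the thread's Particle: fitness is a value, not a pointer.
struct Particle {
    double fitness;
    __device__ void computeFitness(int i) { fitness = static_cast<double>(i) * i; }
};

__global__ void compute(Particle* particles, int numParticles)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numParticles) particles[i].computeFitness(i);  // guard the last, partly filled block
}

// Hypothetical helper: returns the fitness values computed on the device,
// or an empty vector if numParticles is not positive or a CUDA call fails.
std::vector<double> computeFitnessOnDevice(int numParticles)
{
    std::vector<double> result;
    if (numParticles <= 0) return result;

    Particle* d_particles = nullptr;
    size_t bytes = sizeof(Particle) * numParticles;
    if (cudaMalloc(&d_particles, bytes) != cudaSuccess) return result;

    // Explicit execution configuration: enough blocks of 128 threads to cover all particles.
    int threadsPerBlock = 128;
    int numBlocks = (numParticles + threadsPerBlock - 1) / threadsPerBlock;
    compute<<<numBlocks, threadsPerBlock>>>(d_particles, numParticles);

    // cudaMemcpy waits for the kernel to finish before copying the results back.
    std::vector<Particle> h_particles(numParticles);
    cudaError_t err = cudaMemcpy(h_particles.data(), d_particles, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_particles);
    if (err != cudaSuccess) return result;

    for (const Particle& p : h_particles) result.push_back(p.fitness);
    return result;
}
```

Calling `computeFitnessOnDevice(4)` on a machine with a working CUDA device should yield the squares of the indices 0 to 3.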

JohnDoes · November 29, 2012, 1:33pm

Yes, thank you very much, that helps a lot!
How can I cudaMalloc the particles array of the deviceSwarm? I don’t know how to allocate an element within an element (the particles are elements of the swarm).

Ailleur · November 29, 2012, 2:07pm

You do it as you would with any device pointer that you want to allocate. In this case you could do it in the constructor of the Swarm class if you already know how many particles you will need, with a matching cudaFree in the destructor of the class. Alternatively, add something like setParticles(const Particle* const h_particles, const size_t numParticles); in which you cudaMalloc the device Particle array and cudaMemcpy the host particles into it.

Unless the particles are populated on the device itself, you will also need a host pointer of particles to memcpy from.
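
One way to allocate "an element within an element" is sketched below, under stated assumptions: `Swarm` and `Particle` are reduced to plain structs with public members, and the helper names `copySwarmToDevice` and `runSwarm` are hypothetical. The idea is to allocate the particle array on the device first, then fill a host-side staging `Swarm` whose pointer holds the device address, and copy that staging struct to the device. The host never dereferences the device pointers; it only passes them to CUDA calls.

```cuda
#include <cuda_runtime.h>
#include <vector>

struct Particle { double fitness; };

struct Swarm {
    Particle* particles;  // in the device copy this holds a device address
    int numParticles;
};

__global__ void computeFitness(Swarm* swarm)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < swarm->numParticles) swarm->particles[i].fitness = static_cast<double>(i) * i;
}

// Hypothetical helper: builds a device-resident Swarm from a host particle array.
// On success, *d_particlesOut receives the device array so the caller can copy back and free it.
Swarm* copySwarmToDevice(const Particle* h_particles, int numParticles, Particle** d_particlesOut)
{
    Particle* d_particles = nullptr;
    Swarm* d_swarm = nullptr;
    size_t bytes = sizeof(Particle) * numParticles;
    if (cudaMalloc(&d_particles, bytes) != cudaSuccess) return nullptr;
    if (cudaMalloc(&d_swarm, sizeof(Swarm)) != cudaSuccess) { cudaFree(d_particles); return nullptr; }

    // The staging struct lives on the host but stores the device address of the array.
    Swarm staging = { d_particles, numParticles };
    bool ok = cudaMemcpy(d_particles, h_particles, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_swarm, &staging, sizeof(Swarm), cudaMemcpyHostToDevice) == cudaSuccess;
    if (!ok) { cudaFree(d_particles); cudaFree(d_swarm); return nullptr; }

    *d_particlesOut = d_particles;
    return d_swarm;
}

// Runs the kernel on a device-resident Swarm and returns the fitness values (empty on error).
std::vector<double> runSwarm(int numParticles)
{
    std::vector<double> result;
    if (numParticles <= 0) return result;

    std::vector<Particle> h_particles(numParticles, Particle{0.0});
    Particle* d_particles = nullptr;
    Swarm* d_swarm = copySwarmToDevice(h_particles.data(), numParticles, &d_particles);
    if (d_swarm == nullptr) return result;

    int threads = 128;
    int blocks = (numParticles + threads - 1) / threads;
    computeFitness<<<blocks, threads>>>(d_swarm);

    // Copy the particle array (not the Swarm struct) back to the host.
    cudaError_t err = cudaMemcpy(h_particles.data(), d_particles,
                                 sizeof(Particle) * numParticles, cudaMemcpyDeviceToHost);
    cudaFree(d_particles);
    cudaFree(d_swarm);
    if (err != cudaSuccess) return result;

    for (const Particle& p : h_particles) result.push_back(p.fitness);
    return result;
}
```

Note that copying the device `Swarm` struct back to the host would only return the two members (a device pointer and a count); the particle data has to be copied back from the device array itself, as `runSwarm` does.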

JohnDoes · November 29, 2012, 2:13pm

Great! I think I have understood the concept now! I will try it on Monday, as I have to do something else at the moment and let you know if it worked for me :)
Thank you very very much for your help!

JohnDoes · December 4, 2012, 12:27pm

So, I tried to do it with some kind of setParticles(…) (as you said), but still no success.
I have problems accessing the members of the deviceSwarm. Doing this in the main.cu file:

Swarm *deviceSwarm, *hostSwarm;

	
	hostSwarm = new Swarm();
	hostSwarm->initializeParticles();

	size_t sizeOfSwarm = sizeof(Swarm);
	size_t sizeOfParticles = sizeof(hostSwarm->getParticles());

	cudaMalloc((void**)&deviceSwarm, sizeOfSwarm);
        deviceSwarm->allocateDeviceMemories(sizeOfParticles);

will not work, as I cannot access device objects like this on the host. My question is: how do I get at the particles of the deviceSwarm, so that I can allocate them and copy the data of the hostParticles to the deviceParticles?

Ailleur · December 4, 2012, 4:05pm

That is certainly quite impossible!

I have not seen the rest of the object, but you most likely do not need to allocate a separate copy of the object on the device yourself. When you launch your __global__ kernel, you pass the host object by value (‘*hostSwarm’) as a parameter, and a copy of the object will be available from within your kernel.

As in regular C++, what you need to worry about when copying objects (as happens here: when you call the kernel, a device copy of the object is created) are shallow copies of objects that own dynamic memory. In this case, if your Swarm object has a pointer to an array of particles, that pointer points either to memory on the host or to memory on the device, never both.

So what you will need in your object are two pointers: one pointing to the array of particles on the host, which you use from host code, and one pointing to the array of particles on the device, which you have allocated with cudaMalloc and filled with cudaMemcpy.

I’ll post some code, but don’t expect it to compile (or even be close to compiling):

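class Swarm
{
public:
    // Allocates the device array and copies the current host particles into it.
    void mallocAndCopyCurrentParticlesOnDevice();

    // Public only so the kernel below can read it; a __device__ accessor would also work.
    Particle* d_particles;   // particles in device memory

private:
    // ...
    Particle* h_particles;   // particles in host memory
    unsigned int numParticles;
};

// This takes your current host array of particles and mallocs/copies it to the device.
// After this function is called, the d_particles pointer can be used from device code to access
// the particles.
void Swarm::mallocAndCopyCurrentParticlesOnDevice()
{
    cudaMalloc(&d_particles, sizeof(Particle) * numParticles);
    cudaMemcpy(d_particles, h_particles, sizeof(Particle) * numParticles, cudaMemcpyHostToDevice);
}

__global__ void myKernelCallThatUsesSwarm(Swarm swarm, OtherStuff otherstuff)
{
    // .... do stuff
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // swarm.d_particles points to particles on the device, so you can use it here.
    Particle myParticle = swarm.d_particles[idx];
}

int main(void)
{
    Swarm* myHostSwarm = new Swarm();
    // whatever code is needed here in order to fill h_particles and numParticles
    myHostSwarm->mallocAndCopyCurrentParticlesOnDevice();

    // The launch needs a grid/block configuration; numBlocks and threadsPerBlock are placeholders.
    myKernelCallThatUsesSwarm<<<numBlocks, threadsPerBlock>>>(*myHostSwarm, otherStuff);
    cudaDeviceSynchronize();
    delete myHostSwarm;
    return 0;
}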

JohnDoes · December 6, 2012, 10:14am

Thanks again very much for your help! Sorry that I have so many questions regarding such a simple problem -.-
I have tried the way you suggested, but there is still something wrong (damn it).
My global function looks like this:

__global__ void computeNewFitness(Swarm *swarm){
	int threadID = threadIdx.x;
	Particle *myParticle = swarm->getParticleOnPositionI(threadID);
	myParticle->computeFitness();	
	
}

while the Particle class looks like the one in my first post. The computeFitness() function does not seem to work (at least, this is one of the problems…), but why?

Ailleur · December 6, 2012, 12:42pm

If it’s exactly like in your first post, then the ‘double *fitness’ points to nothing. It does not need to be a pointer (as far as I understand it, it will only hold a scalar value).

I suggest you use Nsight and debug your code. You’ll be able to see which lines of code do not work, see if you’re able to even get inside the code for getParticleOnPositionI, etc. It’s easy to forget something and, for me anyway, the easiest way to find what has been forgotten is to step through the code line by line and check every variable/pointer value.
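
Alongside a debugger, checking the return codes of the CUDA runtime often narrows down where things go wrong. The following is a minimal sketch; the helper names `checkCuda` and `launchAndCheck` and the toy kernel are assumptions for illustration. `cudaGetLastError()` reports problems with the launch itself, while `cudaDeviceSynchronize()` reports errors that occur while the kernel runs, such as dereferencing a pointer that was never allocated.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: prints a message and returns false if a CUDA call failed.
bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Toy kernel: each thread writes the square of its index.
__global__ void writeThroughPointer(double* out)
{
    out[threadIdx.x] = threadIdx.x * threadIdx.x;
}

// Launches the kernel and reports launch and execution errors separately.
bool launchAndCheck(double* d_out, int numThreads)
{
    writeThroughPointer<<<1, numThreads>>>(d_out);
    // Launch-time errors (invalid configuration, no usable device) show up here.
    if (!checkCuda(cudaGetLastError(), "kernel launch")) return false;
    // Errors during execution (e.g. an invalid pointer) show up after synchronizing.
    return checkCuda(cudaDeviceSynchronize(), "kernel execution");
}
```

For example, `launchAndCheck(nullptr, 0)` returns false without running the kernel, because a block of zero threads is an invalid launch configuration.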

JohnDoes · December 6, 2012, 3:41pm

Ok, I will try it. I use Nsight, but I am having problems with it. I will install a new version of it or use the terminal debugger, cuda-gdb. Thanks very much, let’s see what the next problems will be :D
