How to access host objects from the device? (A malloc question)

Good evening everyone.

I have been coding in CUDA for the last year, and I have learned a lot from this forum. Thanks a lot for that.

However, some points are still not totally clear to me. These points deeply impact my code design, and I would like to understand them better. Here they are:

Q1: Consider a class allocated in CPU memory with one __device__ method (accessible from the GPU). Are the members of the class that are accessed inside the device method copied to the GPU every time the method is called? Or are they copied beforehand, when the class is instantiated?

Q2: We are working with TensorFlow. In TensorFlow, when we receive a tensor already allocated on the GPU, we save its pointer into GPU memory with tensor.flat()->data(). We have a class like:

class MClass
{
public:
    MClass(float* pointerGPU) : mPointer(pointerGPU) {}

    __device__
    auto operator[](const int& idx) const { … }

protected:
    float* mPointer;
};

If the answer to the first question is "it is copied every time you access it", I suppose that in this example the pointer will be copied as well, but its contents won't be (since you can't tell the size of the data the pointer points to). So how does it work? (This code works; it computes the math it should compute.)

Q3: If the answer to the first question is "it is copied beforehand, when the class is instantiated", how does CUDA select which members it is going to copy? In the same class I have some std::vector members that can't be copied to GPU memory.

Q4: Related to the last question. Consider a class E that, in theory, can't be copied to the GPU because it has a std::vector member. I have working code like:

__global__
void computeGPU(const E e) { … }

I know that the class can't be entirely copied to the GPU. Yet the object is accessible there, and I can call its method with the __device__ decorator. Is the object "partially copied", or is it pointing to an object in CPU memory?

Q5: Last but not least, how could I mimic the same type of copy as in Q4? Before I call my __global__ function, I build a std::vector of Es in the constructor of my class. I want to make them accessible inside the __device__ method. I can't use thrust::device_vector because E has members that can't be copied to the GPU. So I am trying to use cudaMalloc instead:

cudaMalloc((void**)&e_array_ptr, sizeof(E)*e_array_size);
cudaMemcpy(e_array_ptr, e_array_.data(), sizeof(E)*e_array_size, cudaMemcpyHostToDevice);

This code does not give me any error, but e_array_ptr does not end up holding the desired data.
If I run exactly the same code on the CPU, using the CPU's malloc and memcpy instead, it works just fine.
Any idea how to perform the same type of copy that the __global__ launch does?

Thank you in advance for any word of advice you can share.
Have a nice day!