Initializing array member of object: copy device pointer or copy data across device memory?

I have a class Object, and one of its members is a pointer, arrayOfFloats, intended to store an array of floats.

In the kernel, each thread defines a variable of class Object. However, since the contents of arrayOfFloats are read from a file, I wanted to read the file on the host, allocate device memory, and then pass the device pointer into the kernel to set arrayOfFloats, like this:

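class Object{
public:
    float *arrayOfFloats;

    // Store the device pointer directly; no data is copied.
    __device__ void initArrayOfFloatsPtr(float *devPtr){
        arrayOfFloats = devPtr;   // body not shown in the post; presumed from the method name
    }

    // Copy the data element by element into this object's own array.
    __device__ void initArrayOfFloatsCpy(float *devPtr,int arraySize){
        /* arrayOfFloats is assumed to have storage for arraySize floats at this point */
        for(int i=0;i<arraySize;i++)
            arrayOfFloats[i] = devPtr[i];   // loop body not shown in the post; presumed
    }

    __device__ void printfArrayOfFloats(){
        /* calls a simple printf to some elements of arrayOfFloats */
    }

    /* the remaining public members */
    /* et cetera, et cetera */
};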


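__global__ void kernelCopy(float *devPtr,int arraySize){
    Object var;
    var.initArrayOfFloatsCpy(devPtr,arraySize);   // call not shown in the post; presumed
    var.printfArrayOfFloats();                     // call not shown in the post; presumed
}

__global__ void kernelPtr(float *devPtr){
    Object var;
    var.initArrayOfFloatsPtr(devPtr);   // call not shown in the post; presumed
    var.printfArrayOfFloats();          // call not shown in the post; presumed
}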

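int main(){
    float *arrayOfFloats_host=readFloatsFromFile("some_file.csv");
    float *arrayOfFloats_device;

    /* allocate arrayOfFloats_device with cudaMalloc, copy arrayOfFloats_host
       into it with cudaMemcpy, then launch kernelCopy and kernelPtr with
       arrayOfFloats_device as the argument */
}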

[Sorry for the verbose code; I tried to condense it.] arrayOfFloats is read-only, so no write-after-write, read-after-write, or similar memory hazards can occur. kernelCopy prints the contents of the file; kernelPtr does not.

Is this because arrayOfFloats_device is a host-side pointer into device memory? How can I “copy” arrayOfFloats_device by address, saving the time an explicit copy would otherwise take?


I’m not sure I fully understand what you intend to do here, but the usual method for initialising a class on the device, where one of the members is a pointer to some dynamically allocated memory, is as follows:


allocate the device memory where the data should be stored with cudaMalloc;

copy the data into that device memory with cudaMemcpy;

create a host instance of your class and initialise its data pointer to the device memory you allocated;

allocate a device instance of your class with cudaMalloc;

copy your host instance of the class to the device using cudaMemcpy.

At this stage, you should have a fully functional instance of your class on the device, with its data pointer pointing to initialised memory located on the device.
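
The five steps above can be sketched roughly as follows. This is only a minimal illustration under some assumptions, not the original poster's code: the Object struct here is a cut-down stand-in, the arraySize member, the CHECK macro and kernelUseObject are hypothetical names introduced for the example, and a small hard-coded array replaces the data read from file.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the Object class in the question.
struct Object {
    float *arrayOfFloats;  // will hold a device address
    int    arraySize;      // assumption: the size is stored next to the pointer

    __device__ void printfArrayOfFloats() const {
        for (int i = 0; i < arraySize; ++i)
            printf("arrayOfFloats[%d] = %f\n", i, arrayOfFloats[i]);
    }
};

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

// Hypothetical kernel: one thread prints through the device-side instance.
__global__ void kernelUseObject(const Object *obj) {
    if (blockIdx.x == 0 && threadIdx.x == 0)
        obj->printfArrayOfFloats();
}

int main() {
    // Placeholder data; the question reads it from a file instead.
    const int n = 4;
    float hostData[n] = {1.0f, 2.0f, 3.0f, 4.0f};

    // 1. Allocate device memory for the data.
    float *devData = nullptr;
    CHECK(cudaMalloc(&devData, n * sizeof(float)));

    // 2. Copy the data to the device.
    CHECK(cudaMemcpy(devData, hostData, n * sizeof(float),
                     cudaMemcpyHostToDevice));

    // 3. Host instance whose pointer member holds the device address.
    Object hostObj;
    hostObj.arrayOfFloats = devData;
    hostObj.arraySize = n;

    // 4. Allocate a device instance of the class.
    Object *devObj = nullptr;
    CHECK(cudaMalloc(&devObj, sizeof(Object)));

    // 5. Copy the host instance (pointer value included) to the device.
    CHECK(cudaMemcpy(devObj, &hostObj, sizeof(Object),
                     cudaMemcpyHostToDevice));

    kernelUseObject<<<1, 1>>>(devObj);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(devObj));
    CHECK(cudaFree(devData));
    return 0;
}
```

Note that the host instance only carries the device address around; the host never dereferences hostObj.arrayOfFloats. Only the small Object struct is copied in step 5, so the float data itself crosses the bus once, in step 2.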

Does that make sense?