Transfering struct with pointers to device memory Used for variable argument list

Hi,

I’m trying to pass a struct containing a pointer to CUDA device memory to a global function, but I can’t get it to work.

The goal of my function is to execute a postfix-expression (stored in a struct) containing zero or more (usually between 2 and 6) arrays. The expressions and the input data will come from other software, so I want to dynamically allocate and copy the data to the GPU. This allocation will be done while analyzing the expression and translating it from infix to postfix.

My struct and the array containing these structs looks like this:

struct cuExprInput {

  char *name;

  float *ptr;

  int begin;

  int end;

  int length;

};

struct cuExprInput cu_input[50];

For each variable encountered when breaking down the expression:

CUDA_SAFE_CALL( cudaMalloc( (void**)&cu_input[var_cnt].ptr, mem_size );

cudaMemcpy(&cu_input.ptr, data, mem_size, cudaMemcpyHostToDevice);

which stores the pointer in the struct.ptr, I assume.

To transfer the array of structs containing the pointers etc to the GPU and use them, I use:

CUDA_SAFE_CALL( cudaMalloc( (void**)&d_input_structs, var_cnt * sizeof(cuExprInput)) );

cudaMemcpy(d_input_structs, cu_input, (var_cnt * sizeof(cuExprInput)), cudaMemcpyHostToDevice);

__global__ void devRPN(cuExprInput *cu_input, int var_count, float *output) {

  /* in emulation, this works: */

  int i;

#ifdef EMU

  for(i = 0; i < var_count; i++)

	printf("%d: name = %s\n", i, cu_input[i].name);

  /* Doesn't work: */

  for(i = 0; i < var_count; i++)

	printf("%d: first val = %f\n", i, cu_input[i].ptr[0];

#endif

/* doesn't work either: */

  for(i = 0; i < var_count; i++)

	output[threadIdx.x] = cu_input[i].ptr[0];

}

This is just a simplified version, just to test ofcourse.

Also, while compiling, I get the warning:

"/tmp/tmpxft_00004afa_00000000-5.i", line 342: Advisory: Cannot tell what pointer points to, assuming global memory space

My question is then, how can I dynamically store pointers to global device memory and transfer/use this afterwards on the GPU? I’m probably doing something completely wrong here, so any help is very appreciated!

Thanks in advance,

Marcel.

Your method is almost correct, but you transfer d_input_structs whose *ptr, and *name still point to host memory locations. How I solved a problem similar to this was (pseudocode):

for i, 0-50 {

  cudaalloc(&LOC, mem_size);

  cu_input[i].ptr = LOC;

}

cudacopy(cu_input, device, 50);

You can (mostly) ignore the warning, it just says that NVCC cannot determine at compile time where ptr points to. Nothing you can do for the moment :)

The pointers initially reside on the host, but they are correctly pointing to device memory because they are initialized using cudaMalloc.

cudaMalloc( (void**)&cu_input[var_cnt].ptr, mem_size );

But the way you copy the data is incorrect

cudaMemcpy(&cu_input.ptr, data, mem_size, cudaMemcpyHostToDevice);

You are passing the address of ptr, which is host memory since cu_input.ptr resides on the host. Try this instead:

cudaMemcpy(cu_input.ptr, data, mem_size, cudaMemcpyHostToDevice);

The way you allocate and copy the structs to the device looks ok to me.

Thank you both very much, it’s working like it should now!! External Image
And I’ve learned a lot while playing with this…

Thanks again!

Hello to all,

Did you get rid of the previously refered warning?

What do you mean by that? I am having a similar problem here:

// allocate device memory

	solver* d_s;

	cutilSafeCall( cudaMalloc( (void**) &d_s, sizeof(solver)));

	cutilSafeCall( cudaMalloc( (void**) &d_s->trail, sizeof(lit)*h_s->cap));

	

	// copy host memory to device

	cutilSafeCall( cudaMemcpy( d_s, h_s, sizeof(solver), cudaMemcpyHostToDevice) );

	cutilSafeCall( cudaMemcpy( d_s->trail, h_s->trail, sizeof(lit)*h_s->cap, cudaMemcpyHostToDevice) );

NVCC is complaining about the following memory access:

d_s->trail[tid+d_s->qhead]

Hi,

Could you share your solution? Im trying to do something similar but cannot get the syntax correctly.

Thanks

Hi,

Could you share your solution? Im trying to do something similar but cannot get the syntax correctly.

Thanks

Hi.

I have a very similar problem and can’t get it to work with the hints above. So i resurrected this thread External Image.

I have a pointer to a struct, that contains a pointer to a struct.

And I have to copy all this stuff onto the graphics card.

Here’s some code:

struct extendedVertex

{

	float x;

	float y;

	//! The length of the ContentPixels.

 	unsigned int uiCPLength;

	//! All ContentPixels.

 	Point *pContentPixel;

};

with:

struct Point

{

	unsigned int x;

 	unsigned int y;

 	unsigned int c;

};

The struct gets filled with data ON THE HOST SIDE like this:

m_pEV = (extendedVertex *)calloc(m_uiEVLength, sizeof(struct extendedVertex));

for (unsigned int i = 0; i < m_uiEVLength; ++i) {

	is.read(reinterpret_cast<char *>(&m_pEV[i].x), 	sizeof(m_pEV[i].x));

	is.read(reinterpret_cast<char *>(&m_pEV[i].y), 	sizeof(m_pEV[i].y));

	is.read(reinterpret_cast<char *>(&m_pEV[i].uiCPLength), sizeof(m_pEV[i].uiCPLength));

	m_pEV[i].pContentPixel = (Point *)calloc(m_pEV[i].uiCPLength, sizeof(struct Point));

	for (unsigned int k = 0; k < m_pEV[i].uiCPLength; ++k) {

 	is.read(reinterpret_cast<char *>(&m_pEV[i].pContentPixel[k].x), sizeof(m_pEV[i].pContentPixel[k].x));

 	is.read(reinterpret_cast<char *>(&m_pEV[i].pContentPixel[k].y), sizeof(m_pEV[i].pContentPixel[k].y));

 	m_pEV[i].pContentPixel[k].c = calcC(m_pEV[i].pContentPixel[k].x, m_pEV[i].pContentPixel[k].y);

	}

}

So… how to copy that whole thing?

My current approach is like this:

extern "C"

cudaError_t CUDA_MallocAndCopyEV(const extendedVertex *hostPtr, unsigned int uiLength)

{

	cudaMalloc((void**)&devPtrEV, sizeof(struct extendedVertex) * uiLength);

	cudaMemcpy(devPtrEV, hostPtr, sizeof(struct extendedVertex) * uiLength, cudaMemcpyHostToDevice);

	unsigned int i;

	for (i = 0; i < uiLength; ++i) {

 	cudaMalloc((void**)&devPtrEV[i].pContentPixel, sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength);

 	cudaMemcpy(devPtrEV[i].pContentPixel, hostPtr[i].pContentPixel,

 sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength, cudaMemcpyHostToDevice);

	}

}

But after that, I can’t read from sth. like devPtr[ix].pContentPixel[i].c from a kernel, it crashes because of reading unallocated memory.

I think, there’s still a host-pointer in devPtr[ix].pContentPixel.

But how do I get there a device pointer?

Do I have to overwrite the pointer in some way? Or is my copy-function wrong?

Any help is appreciated!

Thanks!

Hi.

I have a very similar problem and can’t get it to work with the hints above. So i resurrected this thread External Image.

I have a pointer to a struct, that contains a pointer to a struct.

And I have to copy all this stuff onto the graphics card.

Here’s some code:

struct extendedVertex

{

	float x;

	float y;

	//! The length of the ContentPixels.

 	unsigned int uiCPLength;

	//! All ContentPixels.

 	Point *pContentPixel;

};

with:

struct Point

{

	unsigned int x;

 	unsigned int y;

 	unsigned int c;

};

The struct gets filled with data ON THE HOST SIDE like this:

m_pEV = (extendedVertex *)calloc(m_uiEVLength, sizeof(struct extendedVertex));

for (unsigned int i = 0; i < m_uiEVLength; ++i) {

	is.read(reinterpret_cast<char *>(&m_pEV[i].x), 	sizeof(m_pEV[i].x));

	is.read(reinterpret_cast<char *>(&m_pEV[i].y), 	sizeof(m_pEV[i].y));

	is.read(reinterpret_cast<char *>(&m_pEV[i].uiCPLength), sizeof(m_pEV[i].uiCPLength));

	m_pEV[i].pContentPixel = (Point *)calloc(m_pEV[i].uiCPLength, sizeof(struct Point));

	for (unsigned int k = 0; k < m_pEV[i].uiCPLength; ++k) {

 	is.read(reinterpret_cast<char *>(&m_pEV[i].pContentPixel[k].x), sizeof(m_pEV[i].pContentPixel[k].x));

 	is.read(reinterpret_cast<char *>(&m_pEV[i].pContentPixel[k].y), sizeof(m_pEV[i].pContentPixel[k].y));

 	m_pEV[i].pContentPixel[k].c = calcC(m_pEV[i].pContentPixel[k].x, m_pEV[i].pContentPixel[k].y);

	}

}

So… how to copy that whole thing?

My current approach is like this:

extern "C"

cudaError_t CUDA_MallocAndCopyEV(const extendedVertex *hostPtr, unsigned int uiLength)

{

	cudaMalloc((void**)&devPtrEV, sizeof(struct extendedVertex) * uiLength);

	cudaMemcpy(devPtrEV, hostPtr, sizeof(struct extendedVertex) * uiLength, cudaMemcpyHostToDevice);

	unsigned int i;

	for (i = 0; i < uiLength; ++i) {

 	cudaMalloc((void**)&devPtrEV[i].pContentPixel, sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength);

 	cudaMemcpy(devPtrEV[i].pContentPixel, hostPtr[i].pContentPixel,

 sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength, cudaMemcpyHostToDevice);

	}

}

But after that, I can’t read from sth. like devPtr[ix].pContentPixel[i].c from a kernel, it crashes because of reading unallocated memory.

I think, there’s still a host-pointer in devPtr[ix].pContentPixel.

But how do I get there a device pointer?

Do I have to overwrite the pointer in some way? Or is my copy-function wrong?

Any help is appreciated!

Thanks!

Well… I found a solution after reading this post, which I didn’t read before:

http://forums.nvidia.com/index.php?showtopic=80736&st=0&p=518733&#entry518733

My solution is now sth. like that, for everyone, who’s interested:

extern "C"

cudaError_t CUDA_MallocAndCopyEV(const extendedVertex *hostPtr, unsigned int uiLength)

{

 cudaMalloc((void**)&devPtrEV, sizeof(struct InterpolationInterface::extendedVertex) * uiLength);

 cudaMemcpy(devPtrEV, hostPtr, sizeof(struct InterpolationInterface::extendedVertex) * uiLength, cudaMemcpyHostToDevice);

	unsigned int i;

	for (i = 0; i < uiLength; ++i) {

 Point *tmpPoint;

cudaMalloc((void**)&tmpPoint, sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength);

 cudaMemcpy(tmpPoint, hostPtr[i].pContentPixel,

 	sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength, cudaMemcpyHostToDevice);

	CUDA_CopyCPs_Kernel<<<1, 1>>>(tmpPoint, devPtrEV, i);

	}

}

with the following kernel, that doesn’t do very much:

__global__ void CUDA_CopyCPs_Kernel(Point *devPtrPoint, extendedVertex *devPtr, unsigned int uiIndexEV)

{

	devPtr[uiIndexEV].pContentPixel = devPtrPoint;

}

This works now…

But I’m kind of confused, why in my previous solution the pointer pContentPixel doesn’t get overwritten and points to global memory, after a cudaMalloc is done with pContentPixel as the destination…

Well, I suppose that the whole struct is stored in global memotry (that’s for sure) and that pContentPixel in this struct can’t be “accessed” by normal host functions and not even by cudaMalloc…

It must be sth. like that…

Well… I found a solution after reading this post, which I didn’t read before:

http://forums.nvidia.com/index.php?showtopic=80736&st=0&p=518733&#entry518733

My solution is now sth. like that, for everyone, who’s interested:

extern "C"

cudaError_t CUDA_MallocAndCopyEV(const extendedVertex *hostPtr, unsigned int uiLength)

{

 cudaMalloc((void**)&devPtrEV, sizeof(struct InterpolationInterface::extendedVertex) * uiLength);

 cudaMemcpy(devPtrEV, hostPtr, sizeof(struct InterpolationInterface::extendedVertex) * uiLength, cudaMemcpyHostToDevice);

	unsigned int i;

	for (i = 0; i < uiLength; ++i) {

 Point *tmpPoint;

cudaMalloc((void**)&tmpPoint, sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength);

 cudaMemcpy(tmpPoint, hostPtr[i].pContentPixel,

 	sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength, cudaMemcpyHostToDevice);

	CUDA_CopyCPs_Kernel<<<1, 1>>>(tmpPoint, devPtrEV, i);

	}

}

with the following kernel, that doesn’t do very much:

__global__ void CUDA_CopyCPs_Kernel(Point *devPtrPoint, extendedVertex *devPtr, unsigned int uiIndexEV)

{

	devPtr[uiIndexEV].pContentPixel = devPtrPoint;

}

This works now…

But I’m kind of confused, why in my previous solution the pointer pContentPixel doesn’t get overwritten and points to global memory, after a cudaMalloc is done with pContentPixel as the destination…

Well, I suppose that the whole struct is stored in global memotry (that’s for sure) and that pContentPixel in this struct can’t be “accessed” by normal host functions and not even by cudaMalloc…

It must be sth. like that…

Your solution helped me a lot. Even though I know the thread is quite dated, I just wanted to show my appreciation. Thanks!!