Transferring struct with pointers to device memory (used for variable argument list)

_Marcel · April 2, 2009, 10:05am

Hi,

I’m trying to pass a struct containing a pointer to CUDA device memory to a global function, but I can’t get it to work.

The goal of my function is to execute a postfix-expression (stored in a struct) containing zero or more (usually between 2 and 6) arrays. The expressions and the input data will come from other software, so I want to dynamically allocate and copy the data to the GPU. This allocation will be done while analyzing the expression and translating it from infix to postfix.

My struct and the array containing these structs look like this:

struct cuExprInput {
  char *name;
  float *ptr;
  int begin;
  int end;
  int length;
};

struct cuExprInput cu_input[50];

For each variable encountered when breaking down the expression:

CUDA_SAFE_CALL( cudaMalloc( (void**)&cu_input[var_cnt].ptr, mem_size ) );

cudaMemcpy(&cu_input.ptr, data, mem_size, cudaMemcpyHostToDevice);

which stores the pointer in the struct.ptr, I assume.

To transfer the array of structs containing the pointers etc to the GPU and use them, I use:

CUDA_SAFE_CALL( cudaMalloc( (void**)&d_input_structs, var_cnt * sizeof(cuExprInput)) );

cudaMemcpy(d_input_structs, cu_input, (var_cnt * sizeof(cuExprInput)), cudaMemcpyHostToDevice);

__global__ void devRPN(cuExprInput *cu_input, int var_count, float *output) {
  /* in emulation, this works: */
  int i;

#ifdef EMU
  for(i = 0; i < var_count; i++)
    printf("%d: name = %s\n", i, cu_input[i].name);

  /* Doesn't work: */
  for(i = 0; i < var_count; i++)
    printf("%d: first val = %f\n", i, cu_input[i].ptr[0]);
#endif

  /* doesn't work either: */
  for(i = 0; i < var_count; i++)
    output[threadIdx.x] = cu_input[i].ptr[0];
}

This is a simplified version, just for testing of course.

Also, while compiling, I get the warning:

"/tmp/tmpxft_00004afa_00000000-5.i", line 342: Advisory: Cannot tell what pointer points to, assuming global memory space

My question, then: how can I dynamically store pointers to global device memory and then transfer and use them on the GPU? I’m probably doing something completely wrong here, so any help is much appreciated!

Thanks in advance,

Marcel.

Ojiisan · April 2, 2009, 10:14am

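Your method is almost correct, but you transfer d_input_structs whose ptr and name members still point to host memory locations. How I solved a similar problem was roughly this:

for (int i = 0; i < 50; ++i) {
  float *loc;                          /* host variable that receives a device address */
  cudaMalloc((void**)&loc, mem_size);
  cu_input[i].ptr = loc;               /* the host-side struct now holds the device pointer */
}

/* copy the array of structs, device pointers included, to the GPU */
cudaMemcpy(d_input_structs, cu_input, 50 * sizeof(cuExprInput), cudaMemcpyHostToDevice);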

You can (mostly) ignore the warning, it just says that NVCC cannot determine at compile time where ptr points to. Nothing you can do for the moment :)

Jamie_K · April 2, 2009, 2:00pm

The pointers initially reside on the host, but they are correctly pointing to device memory because they are initialized using cudaMalloc.

cudaMalloc( (void**)&cu_input[var_cnt].ptr, mem_size );

But the way you copy the data is incorrect:

cudaMemcpy(&cu_input.ptr, data, mem_size, cudaMemcpyHostToDevice);

You are passing the address of ptr, which is host memory, since cu_input itself resides on the host. Pass the pointer's value instead (indexing the element, as in the cudaMalloc call):

cudaMemcpy(cu_input[var_cnt].ptr, data, mem_size, cudaMemcpyHostToDevice);

The way you allocate and copy the structs to the device looks ok to me.
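
Putting the two answers together, a minimal sketch of the whole pattern might look like the following. It is only an illustration under stated assumptions: the struct is reduced to ptr and length (name would need its own device copy), the CHECK macro is a hypothetical stand-in for CUDA_SAFE_CALL, and firstValues and firstValueOnDevice are made-up names rather than anything from the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Reduced form of the struct in the question; "name" is left out because a
// host string would need its own device copy before a kernel could read it.
struct cuExprInput {
    float *ptr;     // holds a DEVICE address after cudaMalloc
    int    length;
};

// Hypothetical stand-in for CUDA_SAFE_CALL: abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Thread i reads the first element of array i through the struct's pointer.
__global__ void firstValues(const cuExprInput *in, int var_count, float *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < var_count) out[i] = in[i].ptr[0];
}

// Uploads var_count arrays of n floats (array v holds 10*v + k at index k),
// runs the kernel, and returns the first value the device saw for array `which`.
float firstValueOnDevice(int var_count, int n, int which) {
    cuExprInput *h_structs = (cuExprInput *)malloc(var_count * sizeof(cuExprInput));
    float *h_data = (float *)malloc(n * sizeof(float));
    float *h_out  = (float *)malloc(var_count * sizeof(float));

    for (int v = 0; v < var_count; ++v) {
        for (int k = 0; k < n; ++k) h_data[k] = 10.0f * v + k;
        // cudaMalloc stores a device address in the host-side struct member.
        CHECK(cudaMalloc((void **)&h_structs[v].ptr, n * sizeof(float)));
        // The destination is the pointer's VALUE, not the address of the member.
        CHECK(cudaMemcpy(h_structs[v].ptr, h_data, n * sizeof(float),
                         cudaMemcpyHostToDevice));
        h_structs[v].length = n;
    }

    // Copy the structs themselves; the device addresses travel along as plain data.
    cuExprInput *d_structs = NULL;
    float *d_out = NULL;
    CHECK(cudaMalloc((void **)&d_structs, var_count * sizeof(cuExprInput)));
    CHECK(cudaMalloc((void **)&d_out, var_count * sizeof(float)));
    CHECK(cudaMemcpy(d_structs, h_structs, var_count * sizeof(cuExprInput),
                     cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (var_count + threads - 1) / threads;
    firstValues<<<blocks, threads>>>(d_structs, var_count, d_out);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(h_out, d_out, var_count * sizeof(float), cudaMemcpyDeviceToHost));
    float result = h_out[which];

    for (int v = 0; v < var_count; ++v) CHECK(cudaFree(h_structs[v].ptr));
    CHECK(cudaFree(d_structs));
    CHECK(cudaFree(d_out));
    free(h_structs); free(h_data); free(h_out);
    return result;
}

int main() {
    printf("first value of array 2: %f\n", firstValueOnDevice(3, 4, 2));
    return 0;
}
```

The point to notice is that the array of structs lives on the host while it is being filled, so cudaMalloc can write each device address into it with an ordinary host store. Only afterwards is the finished array copied to the GPU in one cudaMemcpy.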

_Marcel · April 2, 2009, 4:37pm

Thank you both very much, it’s working like it should now!!
And I’ve learned a lot while playing with this…

Thanks again!

FilipeM · June 29, 2009, 6:03pm

Hello to all,

Did you get rid of the warning mentioned earlier?

What do you mean by that? I am having a similar problem here:

// allocate device memory
solver* d_s;
cutilSafeCall( cudaMalloc( (void**) &d_s, sizeof(solver)));
cutilSafeCall( cudaMalloc( (void**) &d_s->trail, sizeof(lit)*h_s->cap));

// copy host memory to device
cutilSafeCall( cudaMemcpy( d_s, h_s, sizeof(solver), cudaMemcpyHostToDevice) );
cutilSafeCall( cudaMemcpy( d_s->trail, h_s->trail, sizeof(lit)*h_s->cap, cudaMemcpyHostToDevice) );

NVCC is complaining about the following memory access:

d_s->trail[tid+d_s->qhead]
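
One likely cause, going by the earlier replies: d_s is a device pointer, so host code cannot evaluate d_s->trail, and the second cudaMemcpy then overwrites nothing useful because the struct copied in the first one still carries the host trail pointer. A minimal sketch of one way around this is to stage the struct on the host. The solver layout and the lit type below are assumptions (reduced to the fields the snippet uses), uploadSolver, readTrail and trailAt are hypothetical names, and error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

typedef int lit;    // assumption: lit is a plain integer type

struct solver {     // hypothetical reduced solver: only the fields used here
    lit *trail;
    int  cap;
    int  qhead;
};

// Same access pattern as in the post: d_s->trail[tid + d_s->qhead].
__global__ void readTrail(const solver *d_s, lit *out) {
    int tid = threadIdx.x;
    if (tid + d_s->qhead < d_s->cap) out[tid] = d_s->trail[tid + d_s->qhead];
}

// Copies *h_s and its trail array to the device. The device trail pointer is
// also returned through d_trail_out so the caller can cudaFree it later.
solver *uploadSolver(const solver *h_s, lit **d_trail_out) {
    lit *d_trail = NULL;    // host variable that receives a device address
    cudaMalloc((void **)&d_trail, sizeof(lit) * h_s->cap);
    cudaMemcpy(d_trail, h_s->trail, sizeof(lit) * h_s->cap, cudaMemcpyHostToDevice);

    solver staged = *h_s;   // host-side copy of the struct
    staged.trail = d_trail; // swap the host pointer for the device pointer

    solver *d_s = NULL;
    cudaMalloc((void **)&d_s, sizeof(solver));
    cudaMemcpy(d_s, &staged, sizeof(solver), cudaMemcpyHostToDevice);
    *d_trail_out = d_trail;
    return d_s;
}

// Builds a host solver with trail[k] = k * k, uploads it, and returns what
// thread `tid` read on the device (0 if that thread was out of range).
lit trailAt(int cap, int qhead, int tid) {
    solver h_s;
    h_s.cap = cap;
    h_s.qhead = qhead;
    h_s.trail = (lit *)malloc(sizeof(lit) * cap);
    for (int k = 0; k < cap; ++k) h_s.trail[k] = k * k;

    lit *d_trail = NULL;
    solver *d_s = uploadSolver(&h_s, &d_trail);

    lit *d_out = NULL;
    cudaMalloc((void **)&d_out, sizeof(lit) * cap);
    cudaMemset(d_out, 0, sizeof(lit) * cap);
    readTrail<<<1, cap>>>(d_s, d_out);   // single block: assumes cap <= 1024

    lit *h_out = (lit *)malloc(sizeof(lit) * cap);
    cudaMemcpy(h_out, d_out, sizeof(lit) * cap, cudaMemcpyDeviceToHost);
    lit result = h_out[tid];

    cudaFree(d_out); cudaFree(d_s); cudaFree(d_trail);
    free(h_out); free(h_s.trail);
    return result;
}

int main() {
    printf("%d\n", trailAt(8, 2, 3));
    return 0;
}
```

The compiler advisory about not knowing what the pointer points to may still appear with this layout; as noted earlier in the thread, it concerns compile-time analysis and is separate from the runtime failure.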

berkinoz · October 11, 2010, 10:11pm

Hi,

Could you share your solution? I'm trying to do something similar but cannot get the syntax right.

Thanks

sic_6_SaNdMaN · November 22, 2010, 3:47pm

Hi.

I have a very similar problem and can’t get it to work with the hints above, so I resurrected this thread.

I have a pointer to a struct, that contains a pointer to a struct.

And I have to copy all this stuff onto the graphics card.

Here’s some code:

struct extendedVertex
{
	float x;
	float y;
	//! The length of the ContentPixels.
	unsigned int uiCPLength;
	//! All ContentPixels.
	Point *pContentPixel;
};

with:

struct Point
{
	unsigned int x;
	unsigned int y;
	unsigned int c;
};

The struct gets filled with data ON THE HOST SIDE like this:

m_pEV = (extendedVertex *)calloc(m_uiEVLength, sizeof(struct extendedVertex));

for (unsigned int i = 0; i < m_uiEVLength; ++i) {
	is.read(reinterpret_cast<char *>(&m_pEV[i].x), sizeof(m_pEV[i].x));
	is.read(reinterpret_cast<char *>(&m_pEV[i].y), sizeof(m_pEV[i].y));
	is.read(reinterpret_cast<char *>(&m_pEV[i].uiCPLength), sizeof(m_pEV[i].uiCPLength));

	m_pEV[i].pContentPixel = (Point *)calloc(m_pEV[i].uiCPLength, sizeof(struct Point));

	for (unsigned int k = 0; k < m_pEV[i].uiCPLength; ++k) {
		is.read(reinterpret_cast<char *>(&m_pEV[i].pContentPixel[k].x), sizeof(m_pEV[i].pContentPixel[k].x));
		is.read(reinterpret_cast<char *>(&m_pEV[i].pContentPixel[k].y), sizeof(m_pEV[i].pContentPixel[k].y));
		m_pEV[i].pContentPixel[k].c = calcC(m_pEV[i].pContentPixel[k].x, m_pEV[i].pContentPixel[k].y);
	}
}

So… how to copy that whole thing?

My current approach is like this:

extern "C"
cudaError_t CUDA_MallocAndCopyEV(const extendedVertex *hostPtr, unsigned int uiLength)
{
	cudaMalloc((void**)&devPtrEV, sizeof(struct extendedVertex) * uiLength);
	cudaMemcpy(devPtrEV, hostPtr, sizeof(struct extendedVertex) * uiLength, cudaMemcpyHostToDevice);

	unsigned int i;
	for (i = 0; i < uiLength; ++i) {
		cudaMalloc((void**)&devPtrEV[i].pContentPixel, sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength);
		cudaMemcpy(devPtrEV[i].pContentPixel, hostPtr[i].pContentPixel,
			sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength, cudaMemcpyHostToDevice);
	}
}

But after that, I can’t read something like devPtr[ix].pContentPixel[i].c from a kernel; it crashes because it reads unallocated memory.

I think there’s still a host pointer in devPtr[ix].pContentPixel.

But how do I get a device pointer in there?

Do I have to overwrite the pointer in some way? Or is my copy function wrong?

Any help is appreciated!

Thanks!

sic_6_SaNdMaN · November 23, 2010, 8:14am

Well… I found a solution after reading this post, which I didn’t read before:

http://forums.nvidia.com/index.php?showtopic=80736&st=0&p=518733&#entry518733

My solution now looks something like this, for everyone who’s interested:

extern "C"
cudaError_t CUDA_MallocAndCopyEV(const extendedVertex *hostPtr, unsigned int uiLength)
{
	cudaMalloc((void**)&devPtrEV, sizeof(struct InterpolationInterface::extendedVertex) * uiLength);
	cudaMemcpy(devPtrEV, hostPtr, sizeof(struct InterpolationInterface::extendedVertex) * uiLength, cudaMemcpyHostToDevice);

	unsigned int i;
	for (i = 0; i < uiLength; ++i) {
		// allocate and fill the device copy of this vertex's ContentPixels
		Point *tmpPoint;
		cudaMalloc((void**)&tmpPoint, sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength);
		cudaMemcpy(tmpPoint, hostPtr[i].pContentPixel,
			sizeof(struct InterpolationInterface::Point) * hostPtr[i].uiCPLength, cudaMemcpyHostToDevice);

		// store the device pointer in the device-side struct
		CUDA_CopyCPs_Kernel<<<1, 1>>>(tmpPoint, devPtrEV, i);
	}

	// report the most recent CUDA runtime error, if any
	return cudaGetLastError();
}

with the following kernel, that doesn’t do very much:

__global__ void CUDA_CopyCPs_Kernel(Point *devPtrPoint, extendedVertex *devPtr, unsigned int uiIndexEV)
{
	devPtr[uiIndexEV].pContentPixel = devPtrPoint;
}

This works now…

But I’m kind of confused why, in my previous solution, the pointer pContentPixel doesn’t get overwritten to point to global memory after a cudaMalloc is done with pContentPixel as the destination…

Well, I suppose that the whole struct is stored in global memory (that’s for sure) and that pContentPixel in this struct can’t be “accessed” by normal host functions, not even by cudaMalloc…

It must be something like that…
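
That guess is close. cudaMalloc returns its result by writing through the pointer it is given, and that write happens on the host, so the destination has to be a host variable; &devPtrEV[i].pContentPixel is an address in device memory. A minimal sketch of a variant that needs no helper kernel follows: it lets cudaMemcpy write the new pointer value into the device-side struct. The struct definitions are copied from the post above; uploadEV, readC and contentC are hypothetical names, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct Point { unsigned int x, y, c; };

struct extendedVertex {
    float x, y;
    unsigned int uiCPLength;
    Point *pContentPixel;
};

// Reads one c value through the nested pointer, to check it is a device pointer.
__global__ void readC(const extendedVertex *ev, unsigned int i, unsigned int k,
                      unsigned int *out) {
    *out = ev[i].pContentPixel[k].c;
}

// Deep-copies the array to the device without a helper kernel.
extendedVertex *uploadEV(const extendedVertex *hostPtr, unsigned int uiLength) {
    extendedVertex *devPtrEV = NULL;
    cudaMalloc((void **)&devPtrEV, sizeof(extendedVertex) * uiLength);
    cudaMemcpy(devPtrEV, hostPtr, sizeof(extendedVertex) * uiLength, cudaMemcpyHostToDevice);

    for (unsigned int i = 0; i < uiLength; ++i) {
        size_t bytes = sizeof(Point) * hostPtr[i].uiCPLength;
        Point *tmpPoint = NULL;   // host variable, so cudaMalloc can write to it
        cudaMalloc((void **)&tmpPoint, bytes);
        cudaMemcpy(tmpPoint, hostPtr[i].pContentPixel, bytes, cudaMemcpyHostToDevice);
        // Overwrite the stale host pointer inside the device struct. Taking the
        // member's address is pure address arithmetic; nothing is dereferenced on the host.
        cudaMemcpy(&devPtrEV[i].pContentPixel, &tmpPoint, sizeof(Point *),
                   cudaMemcpyHostToDevice);
    }
    return devPtrEV;
}

// Builds uiLength vertices with cpLen pixels each (c = 100 * i + k), uploads
// them, and returns the c value the device reads at [i].pContentPixel[k].
unsigned int contentC(unsigned int uiLength, unsigned int cpLen,
                      unsigned int i, unsigned int k) {
    extendedVertex *h = (extendedVertex *)calloc(uiLength, sizeof(extendedVertex));
    for (unsigned int a = 0; a < uiLength; ++a) {
        h[a].uiCPLength = cpLen;
        h[a].pContentPixel = (Point *)calloc(cpLen, sizeof(Point));
        for (unsigned int b = 0; b < cpLen; ++b) h[a].pContentPixel[b].c = 100 * a + b;
    }
    extendedVertex *d = uploadEV(h, uiLength);

    unsigned int *d_out = NULL, h_out = 0;
    cudaMalloc((void **)&d_out, sizeof(unsigned int));
    readC<<<1, 1>>>(d, i, k, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    for (unsigned int a = 0; a < uiLength; ++a) free(h[a].pContentPixel);
    free(h);
    return h_out;   // device allocations made by uploadEV are not freed in this sketch
}

int main() {
    printf("%u\n", contentC(2, 3, 1, 2));
    return 0;
}
```

A real program would also keep the tmpPoint values somewhere on the host so the nested arrays can be released with cudaFree later. Patching the pointers into a host-side copy of the array and uploading it once is another option, and it saves one small cudaMemcpy per element.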

SimonKR · January 19, 2011, 7:06am

Your solution helped me a lot. Even though I know the thread is quite dated, I just wanted to show my appreciation. Thanks!!
