Complex structs in CUDA

Hello,
I am a beginner in CUDA. I am trying to copy a complex struct containing pointers and arrays to the device, but I get an illegal memory access when the kernel touches the arrays. I am using cudaMalloc() and cudaMemcpy().
How can I correctly copy the struct to the device without errors?

This is the struct:

struct solver { 
	int* DB, nVars, nClauses, mem_used, mem_fixed, mem_max, maxLemmas, nLemmas,
		* buffer, nConflicts, * model, * reason, * falseStack, * _false, * first,
		* forced, * processed, * assigned, * next, * prev, head, res, fast, slow;
};

This is the code where I get the errors (lines 3, 6 and 7):

__device__
int propagate(struct solver* S, int* _reason, int* _processed) {
	int forced = S->reason[abs(*S->processed)];    
	while (S->processed < S->assigned) {  
		printf("processed loop\n");
		int lit = *(S->processed++);        
		int* watch = &S->first[lit];
                ...

Copying a struct of arrays to device memory is no different from copying independent arrays: you need to call cudaMalloc() on each pointer member of the struct. Could you post the code you're trying to implement?

Here's an example:

typedef struct fCUB_t {
    float *       d_minScorePerBlock {};
    float *       d_minScoreSorted {};
    unsigned int *d_minComboPerBlock {};
    unsigned int *d_minComboSorted {};
    void *        d_temp_storage     = NULL;
    size_t        temp_storage_bytes = 0;
} cubData;
cubData cubStruct {};

checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minScorePerBlock ),
                                     numBlocksRequired * sizeof( float ) ) );
checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minComboPerBlock ),
                                     numBlocksRequired * sizeof( unsigned int ) ) );
checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minScoreSorted ),
                                     numBlocksRequired * sizeof( float ) ) );
checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minComboSorted ),
                                     numBlocksRequired * sizeof( unsigned int ) ) );

And check your code with cuda-memcheck, as it will help you narrow down your issues.
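cuda-memcheck is run as a wrapper around the application binary; for example (the executable name `./solver` is illustrative, not from the thread):

```shell
# Reports out-of-bounds and misaligned accesses with the offending kernel line
cuda-memcheck ./solver
```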

Thank you a lot!

Actually I am doing this:

...

	gpuErrchk(cudaMalloc((void**)&_reason, nbytes));
	gpuErrchk(cudaMalloc((void**)&_processed, nbytes));

	gpuErrchk(cudaMemcpy(_reason, S.reason, nbytes, cudaMemcpyHostToDevice));
	gpuErrchk(cudaMemcpy(_processed, S.processed, nbytes, cudaMemcpyHostToDevice));

        ...

The problem is that the arrays _reason and _processed do not have the correct values.

That looks correct to me. I suggest starting with a smaller example that just performs cudaMemcpy to confirm you’re doing everything correctly. You might find one at the NVIDIA Developer blog https://devblogs.nvidia.com/.

Also, you might want to look at managed memory https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/.

Hi, I know this was a while ago, but do you still remember what the problem was and how you solved it?

When you copy pointer-containing types to the GPU, you have to do a deep copy: each pointer member must be replaced by a new value pointing to newly allocated device memory. If a pointer to host memory is left in the struct, dereferencing it will crash the kernel.
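The deep-copy pattern can be sketched like this for a simplified two-array struct (the names `Pair`, `h_pair`, `tmp`, `d_pair` and the array size are illustrative, not from the OP's solver):

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

struct Pair {
    int *reason;
    int *processed;
};

int main() {
    const int n = 8;
    const size_t nbytes = n * sizeof(int);

    // Host-side struct whose pointers reference host arrays.
    Pair h_pair;
    h_pair.reason    = (int *)malloc(nbytes);
    h_pair.processed = (int *)malloc(nbytes);
    for (int i = 0; i < n; ++i) { h_pair.reason[i] = i; h_pair.processed[i] = -i; }

    // 1. Allocate device arrays and copy the array contents over.
    int *d_reason, *d_processed;
    cudaMalloc((void **)&d_reason, nbytes);
    cudaMalloc((void **)&d_processed, nbytes);
    cudaMemcpy(d_reason,    h_pair.reason,    nbytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_processed, h_pair.processed, nbytes, cudaMemcpyHostToDevice);

    // 2. Build a host-side copy of the struct whose members are DEVICE pointers.
    Pair tmp;
    tmp.reason    = d_reason;
    tmp.processed = d_processed;

    // 3. Copy that struct to the device. A kernel receiving d_pair now
    //    dereferences only device pointers; no host pointer leaks onto the GPU.
    Pair *d_pair;
    cudaMalloc((void **)&d_pair, sizeof(Pair));
    cudaMemcpy(d_pair, &tmp, sizeof(Pair), cudaMemcpyHostToDevice);

    cudaFree(d_reason);
    cudaFree(d_processed);
    cudaFree(d_pair);
    free(h_pair.reason);
    free(h_pair.processed);
    return 0;
}
```

The crucial step is (2): copying `h_pair` itself to the device would transport the host addresses, which is exactly the crash described above.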

Alternatives to manual deep copies:

  • use index-oriented or use value-oriented data structures instead of pointers
  • use managed memory (see mnicely’s link) or zero-copy host memory. Both are slower than global device memory.
  • use C++ data structures with defined ownership of member variables. Then the member types can either manage their memory on their own, or you provide a copy constructor and copy assignment operator that handle the deep copies.
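For the second alternative, managed (unified) memory removes the manual deep copy entirely, because both host and device can dereference the same pointers; a minimal sketch, again with illustrative names and sizes:

```cuda
#include <cuda_runtime.h>

struct Pair {
    int *reason;
    int *processed;
};

int main() {
    const int n = 8;

    // All three allocations are managed: the struct itself and both arrays.
    Pair *p;
    cudaMallocManaged((void **)&p, sizeof(Pair));
    cudaMallocManaged((void **)&p->reason,    n * sizeof(int));
    cudaMallocManaged((void **)&p->processed, n * sizeof(int));

    // Fill on the host, then pass p directly to a kernel; the driver
    // migrates the pages on demand (typically slower than explicit
    // global device memory, as noted above).
    for (int i = 0; i < n; ++i) { p->reason[i] = i; p->processed[i] = -i; }

    cudaFree(p->reason);
    cudaFree(p->processed);
    cudaFree(p);
    return 0;
}
```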

All in all, the GPU is not well suited to data structures built on pointers. A single indirection (as in the OP's example) typically works well, but pointers to pointers are often a sign that the algorithm should be reformulated.