Complex structs in CUDA

Hello,
I am a beginner in CUDA. I am trying to copy a complex struct containing pointers and arrays to the device, but I get an illegal memory access when the kernel touches the arrays. I am using cudaMalloc() and cudaMemcpy().
How can I correctly copy the struct to the device without errors?

This is the struct:

struct solver { 
	int* DB, nVars, nClauses, mem_used, mem_fixed, mem_max, maxLemmas, nLemmas,
		* buffer, nConflicts, * model, * reason, * falseStack, * _false, * first,
		* forced, * processed, * assigned, * next, * prev, head, res, fast, slow;
};

This is the code where I get the errors (lines 3, 6 and 7):

__device__
int propagate(struct solver* S, int* _reason, int* _processed) {
	int forced = S->reason[abs(*S->processed)];    
	while (S->processed < S->assigned) {  
		printf("processed loop\n");
		int lit = *(S->processed++);        
		int* watch = &S->first[lit];
                ...

Copying a struct of arrays to device memory is no different from copying independent arrays: you need to call cudaMalloc() on each pointer member of the struct. Could you post the code you're trying to implement?

Here's an example:

typedef struct fCUB_t {
    float *       d_minScorePerBlock {};
    float *       d_minScoreSorted {};
    unsigned int *d_minComboPerBlock {};
    unsigned int *d_minComboSorted {};
    void *        d_temp_storage     = NULL;
    size_t        temp_storage_bytes = 0;
} cubData;
cubData cubStruct {};

checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minScorePerBlock ),
                                     numBlocksRequired * sizeof( float ) ) );
checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minComboPerBlock ),
                                     numBlocksRequired * sizeof( unsigned int ) ) );
checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minScoreSorted ),
                                     numBlocksRequired * sizeof( float ) ) );
checkCudaErrors( cudaMalloc( reinterpret_cast<void **>( &cubStruct.d_minComboSorted ),
                                     numBlocksRequired * sizeof( unsigned int ) ) );

And check your code with cuda-memcheck, as it will help you narrow down your issues.
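cuda-memcheck is run as a wrapper around the application binary; for example (the executable name `./solver` is illustrative, not from the thread):

```shell
# Reports out-of-bounds and misaligned accesses with the offending kernel line
cuda-memcheck ./solver
```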

Thank you a lot!

Actually I am doing this:

...

	gpuErrchk(cudaMalloc((void**)&_reason, nbytes));
	gpuErrchk(cudaMalloc((void**)&_processed, nbytes));

	gpuErrchk(cudaMemcpy(_reason, S.reason, nbytes, cudaMemcpyHostToDevice));
	gpuErrchk(cudaMemcpy(_processed, S.processed, nbytes, cudaMemcpyHostToDevice));

        ...

The problem is that the arrays _reason and _processed do not have the correct values.

That looks correct to me. I suggest starting with a smaller example that just performs cudaMemcpy to confirm you’re doing everything correctly. You might find one at the NVIDIA Developer blog https://devblogs.nvidia.com/.

Also, you might want to look at managed memory https://devblogs.nvidia.com/maximizing-unified-memory-performance-cuda/.

Hi, I know this was a while ago, but do you still remember what the problem was and how you solved it?

When you copy pointer-containing types to the GPU, you have to do a deep copy: each pointer member must be replaced by a new value pointing to newly allocated device memory. If a pointer to host memory is left in the struct, dereferencing it will crash the kernel.
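The deep-copy pattern can be sketched like this for a simplified two-array struct (the names `Pair`, `h_pair`, `tmp`, `d_pair` and the array size are illustrative, not from the OP's solver):

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

struct Pair {
    int *reason;
    int *processed;
};

int main() {
    const int n = 8;
    const size_t nbytes = n * sizeof(int);

    // Host-side struct whose pointers reference host arrays.
    Pair h_pair;
    h_pair.reason    = (int *)malloc(nbytes);
    h_pair.processed = (int *)malloc(nbytes);
    for (int i = 0; i < n; ++i) { h_pair.reason[i] = i; h_pair.processed[i] = -i; }

    // 1. Allocate device arrays and copy the array contents over.
    int *d_reason, *d_processed;
    cudaMalloc((void **)&d_reason, nbytes);
    cudaMalloc((void **)&d_processed, nbytes);
    cudaMemcpy(d_reason,    h_pair.reason,    nbytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_processed, h_pair.processed, nbytes, cudaMemcpyHostToDevice);

    // 2. Build a host-side copy of the struct whose members are DEVICE pointers.
    Pair tmp;
    tmp.reason    = d_reason;
    tmp.processed = d_processed;

    // 3. Copy that struct to the device. A kernel receiving d_pair now
    //    dereferences only device pointers; no host pointer leaks onto the GPU.
    Pair *d_pair;
    cudaMalloc((void **)&d_pair, sizeof(Pair));
    cudaMemcpy(d_pair, &tmp, sizeof(Pair), cudaMemcpyHostToDevice);

    cudaFree(d_reason);
    cudaFree(d_processed);
    cudaFree(d_pair);
    free(h_pair.reason);
    free(h_pair.processed);
    return 0;
}
```

The crucial step is (2): copying `h_pair` itself to the device would transport the host addresses, which is exactly the crash described above.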

Alternatives to manual deep copies:

  • use index-oriented or use value-oriented data structures instead of pointers
  • use managed memory (see mnicely’s link) or zero-copy host memory. Both are slower than global device memory.
  • use C++ data structures with defined ownership of member variables. Then the member types can either manage their memory on their own, or you provide a copy constructor and copy assignment operator that handle the deep copies.
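For the second alternative, managed (unified) memory removes the manual deep copy entirely, because both host and device can dereference the same pointers; a minimal sketch, again with illustrative names and sizes:

```cuda
#include <cuda_runtime.h>

struct Pair {
    int *reason;
    int *processed;
};

int main() {
    const int n = 8;

    // All three allocations are managed: the struct itself and both arrays.
    Pair *p;
    cudaMallocManaged((void **)&p, sizeof(Pair));
    cudaMallocManaged((void **)&p->reason,    n * sizeof(int));
    cudaMallocManaged((void **)&p->processed, n * sizeof(int));

    // Fill on the host, then pass p directly to a kernel; the driver
    // migrates the pages on demand (typically slower than explicit
    // global device memory, as noted above).
    for (int i = 0; i < n; ++i) { p->reason[i] = i; p->processed[i] = -i; }

    cudaFree(p->reason);
    cudaFree(p->processed);
    cudaFree(p);
    return 0;
}
```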

All in all, the GPU is not well suited to data structures built on pointers. A single indirection (as in the OP's example) typically works well, but pointers to pointers are often a sign that the algorithm should be reformulated.