Why do nested 3D data structures cause a crash in CUDA?

Robert_Crovella · July 3, 2023, 8:36pm

You don’t do this in CUDA:

What you have there is a struct definition, not an instantiation. Struct definitions don’t get tagged with __device__.

If you would like to instantiate a __device__ variable of that type, you would do:

struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

__device__ struct Voxel v;  // the actual global device variable

Beyond that, if you are still having trouble, I suggest:

- Test against the latest version of CUDA.
- Provide a short, complete example with a description of the compile failure, the CUDA version you are using, and the compile command line that caused the failure.

This isn’t really a forum for assistance with Qt. Also, please don’t post pictures of text on this forum.

I also note that eventually your variable will occupy over 2GB of space. Is that your intent? I would suggest managing that with a dynamic allocation (e.g. cudaMalloc()) rather than a static allocation.
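
To make the size concrete, here is a rough sketch that computes the footprint at compile time. It assumes the same three structs as in the question and that the compiler adds no padding (reasonable here, since every member is a uint8_t); the names kTopLevelCount and kTotalBytes are just illustrative.

```cpp
#include <cstddef>
#include <cstdint>

struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel[10][10][10];          // 1 000 voxels
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1[10][10][10]; // 1 000 level-1 chunks
};

// Illustrative names, not part of the original code.
constexpr std::size_t kTopLevelCount = 10 * 10 * 10;  // elements in Chk_Lv2
constexpr std::size_t kTotalBytes = sizeof(Chunk_level_2) * kTopLevelCount;

// With no padding:
//   sizeof(Voxel)         = 2
//   sizeof(Chunk_level_1) = 1 + 1000 * 2    = 2001
//   sizeof(Chunk_level_2) = 1 + 1000 * 2001 = 2001001
//   kTotalBytes           = 2001001 * 1000  = 2001001000
```

So the full 10x10x10 array of Chunk_level_2 comes to about 2.0 billion bytes, which is the number you would pass to cudaMalloc() for the dynamic version.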

On CUDA 12.0.1, I see that compiling this:

#include <cstdio>
#include <cstdint>
struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};

__device__ Chunk_level_2 Chk_Lv2 [10][10][10] = {0};  //1 000 000 000

__global__ void k(){
        printf("%d\n", Chk_Lv2[1][2][3].counter);  // print the counter as a number
}

int main(){

        k<<<1,1>>>();
        cudaDeviceSynchronize();
}

takes a “very long” time. On the other hand, I don’t have trouble with this:

#include <cstdio>
#include <cstdint>
struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};


__global__ void k(Chunk_level_2 *c){
        printf("%d\n", c[(((1*10)+2)*10)+3].counter);  // print the counter as a number
}

int main(){

        Chunk_level_2 *dc;
        cudaMalloc(&dc,  sizeof(Chunk_level_2)*10*10*10);
        cudaMemset(dc,0, sizeof(Chunk_level_2)*10*10*10);
        k<<<1,1>>>(dc);
        cudaDeviceSynchronize();
}
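
The hand-written index in that kernel is just a row-major flattening of [1][2][3] for a 10x10x10 array. If you go the cudaMalloc() route, a small helper along these lines might make the indexing less error-prone. This is only a sketch: flat_index, kDim, and the HOST_DEVICE macro are names I made up, and the macro is there so the same function compiles both under nvcc and as plain host C++.

```cpp
#include <cstddef>

// Hypothetical convenience macro: expands to the CUDA qualifiers only
// when compiled by nvcc, so the helper also builds as ordinary C++.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

constexpr int kDim = 10;  // extent of each of the three dimensions

// Map a 3D coordinate (x, y, z) to its position in a flat, row-major
// kDim x kDim x kDim array. Assumes 0 <= x, y, z < kDim.
HOST_DEVICE constexpr std::size_t flat_index(int x, int y, int z)
{
  return (static_cast<std::size_t>(x) * kDim + y) * kDim + z;
}
```

With this, c[flat_index(1, 2, 3)] addresses the same element as c[(((1*10)+2)*10)+3] in the kernel above, i.e. element 123.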

~~So if you’re not happy with that, and you want the __device__ variable instead, I would suggest:~~

~~retest on the latest version of CUDA~~

~~if it still manifests, file a bug.~~

According to my testing, it doesn’t seem to be possible to create a __device__ variable larger than 2GB. So I wouldn’t bother with that approach.
