Why do nested 3D data structures cause a crash in CUDA?

You don’t put __device__ on a struct definition in CUDA.

What you have there is a struct definition, not an instantiation. Struct definitions don’t get tagged with __device__; the specifier goes on variables and functions.

If you would like to instantiate a __device__ variable of that type, you would do:

#include <cstdint>  // for uint8_t

struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

__device__ struct Voxel v;  // the actual global device variable

Beyond that, if you are still having trouble, I suggest:

  1. test against the latest version of CUDA
  2. provide a short, complete example, with a description of the compile failure, along with the CUDA version you are using and the compile command line that caused the failure.

This isn’t really a forum for assistance with Qt. Also, please don’t post pictures of text on this forum.

I also note that your variable will eventually occupy over 2GB of space. Is that your intent? I would suggest managing that with a dynamic allocation (e.g. cudaMalloc()) rather than a static allocation.

On CUDA 12.0.1, I see that compiling this:

#include <cstdio>
#include <cstdint>
struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};

__device__ Chunk_level_2 Chk_Lv2 [10][10][10] = {0};  //1 000 000 000

__global__ void k(){
        // counter is a uint8_t; print it as a number (%c would emit a raw NUL byte)
        printf("%d\n", Chk_Lv2[1][2][3].counter);
}

int main(){

        k<<<1,1>>>();
        cudaDeviceSynchronize();
}

takes a “very long” time. On the other hand, I don’t have trouble with this:

#include <cstdio>
#include <cstdint>
struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};


__global__ void k(Chunk_level_2 *c){
        // flattened equivalent of [1][2][3]; counter is a uint8_t, so print it as a number
        printf("%d\n", c[(((1*10)+2)*10)+3].counter);
}

int main(){

        Chunk_level_2 *dc;
        cudaMalloc(&dc,  sizeof(Chunk_level_2)*10*10*10);
        cudaMemset(dc,0, sizeof(Chunk_level_2)*10*10*10);
        k<<<1,1>>>(dc);
        cudaDeviceSynchronize();
}
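
With the cudaMalloc() version the array is a flat pointer, so the [1][2][3] access from the first example has to be turned into a single offset, which is what (((1*10)+2)*10)+3 does. A minimal sketch of that mapping follows. The names DIM and flat_index are hypothetical and not part of the code above. To call the helper from a kernel you would presumably also mark it __host__ __device__.

```cpp
#include <cstddef>

// Hypothetical helper, not from the original post: converts a 3D index
// (x, y, z) into the row-major offset of a [DIM][DIM][DIM] array.
constexpr std::size_t DIM = 10;  // edge length used in the examples above

constexpr std::size_t flat_index(std::size_t x, std::size_t y, std::size_t z)
{
    // Same arithmetic as c[(((1*10)+2)*10)+3] in the kernel above.
    return (x * DIM + y) * DIM + z;
}

// Element [1][2][3] of the 3D array is element 123 of the flat allocation.
static_assert(flat_index(1, 2, 3) == 123, "row-major offset of [1][2][3]");
```

In the kernel, c[flat_index(1, 2, 3)].counter would then read the same element as Chk_Lv2[1][2][3].counter does in the __device__ version.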

So if you’re not happy with that, and you want the __device__ variable instead, I would suggest:

  1. retest on the latest version of CUDA
  2. if it still manifests, file a bug.

According to my testing, it doesn’t seem to be possible to create a __device__ variable larger than 2GB. So I wouldn’t bother with that approach.
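
To see how close the example sits to that size, you can let the compiler work out the footprint on the host. This is only a rough sketch: the struct layouts are copied from the examples above without the = {0} initializers, and kTopLevelCount and kTotalBytes are hypothetical names introduced here. Every member is a uint8_t, so on typical ABIs there should be no padding.

```cpp
#include <cstddef>
#include <cstdint>

// Same member layout as the structs in the examples above.
struct Voxel
{
    uint8_t ID = 0;
    uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
    uint8_t counter = 0;
    Voxel voxel[10][10][10];            // 1 000 voxels
};

struct Chunk_level_2
{
    uint8_t counter = 0;
    Chunk_level_1 Chk_Lv1[10][10][10];  // 1 000 level-1 chunks
};

// Hypothetical names: element count and byte size of the top-level
// [10][10][10] array of Chunk_level_2.
constexpr std::size_t kTopLevelCount = 10 * 10 * 10;
constexpr std::size_t kTotalBytes = sizeof(Chunk_level_2) * kTopLevelCount;

// The decimal 2 GB mark; the whole array lands just above it.
static_assert(kTotalBytes > 2000000000u, "array exceeds 2 * 10^9 bytes");
```

With one-byte members this should come to 2 001 bytes per Chunk_level_1, 2 001 001 bytes per Chunk_level_2, and 2 001 001 000 bytes for the whole array, i.e. just over 2 × 10^9 bytes (about 1.86 GiB). The same sizeof(Chunk_level_2)*10*10*10 expression is what the cudaMalloc() call above passes.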