Why do nested 3D data structures cause a CUDA build crash?

I use a simple organization of nested 3D data structures to build a grid on the CPU and to search occupied cells quickly.

struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};

Chunk_level_2 Chk_Lv2 [10][10][10] = {0};  //1 000 000 000
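
The counters at each level are meant to let a search skip empty regions without visiting individual voxels. Roughly (a sketch only, not my exact search code; it assumes the counters are kept in sync when voxels are hit):

bool cell_occupied(int x, int y, int z)   // x, y, z in [0, 1000)
{
  const Chunk_level_2 &c2 = Chk_Lv2[x / 100][y / 100][z / 100];
  if (c2.counter == 0) return false;      // whole 100^3 region is empty
  const Chunk_level_1 &c1 = c2.Chk_Lv1[(x / 10) % 10][(y / 10) % 10][(z / 10) % 10];
  if (c1.counter == 0) return false;      // whole 10^3 region is empty
  return c1.voxel[x % 10][y % 10][z % 10].hit_counter > 0;
}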

I want to declare it as a global variable so that the entire grid is stored in GPU memory, update it on the GPU from RGB-D data, and periodically synchronize the filled cells with a similar structure on the CPU. But when I try to compile this, the build runs in what looks like an infinite loop until my SSD runs out of space.

__device__ struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

__device__ struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

__device__ struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};

__device__ Chunk_level_2 Chk_Lv2 [10][10][10] = {0};  //1 000 000 000

I am using CUDA Toolkit 12.0.1 on a mobile RTX 3050 Ti; here is my qmake config:

# CUDA
# nvcc flags (ptxas option verbose is always useful)
NVCCFLAGS = --compiler-options -fno-strict-aliasing -use_fast_math --ptxas-options=-v
# Path to cuda toolkit install
CUDA_DIR = /usr/local/cuda-12.0
# GPU architecture (ADJUST FOR YOUR GPU)
CUDA_GENCODE  = arch=compute_86,code=sm_86
# manually add CUDA sources (ADJUST MANUALLY)
CUDA_SOURCES += cudamap.cu
# Path to header and libs files
INCLUDEPATH  += $$CUDA_DIR/include
# libs used in your code
LIBS += -L $$CUDA_DIR/lib64 -lcudart -lcuda

cuda.commands        = $$CUDA_DIR/bin/nvcc -c -gencode $$CUDA_GENCODE $$NVCCFLAGS -o ${QMAKE_FILE_OUT} ${QMAKE_FILE_NAME}
cuda.dependency_type = TYPE_C
cuda.depend_command  = $$CUDA_DIR/bin/nvcc -M ${QMAKE_FILE_NAME} | sed \"s/^.*: //\" #For Qt 5.12.2
cuda.input           = CUDA_SOURCES
cuda.output          = ${OBJECTS_DIR}${QMAKE_FILE_BASE}_cuda.o
# Tell Qt that we want add more stuff to the Makefile
QMAKE_EXTRA_COMPILERS += cuda

You don’t do this in CUDA: what you have there is a struct definition, not an instantiation, and struct definitions don’t get tagged with __device__.

If you would like to instantiate a __device__ variable of that type, you would do:

struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

__device__ struct Voxel v;  // the actual global device variable
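
For completeness, host code then reads or writes such a __device__ variable through the symbol-copy APIs, not through an ordinary pointer. A minimal sketch (the values are just examples):

Voxel h = {42, 1};
cudaMemcpyToSymbol(v, &h, sizeof(Voxel));    // host -> device
cudaMemcpyFromSymbol(&h, v, sizeof(Voxel));  // device -> host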

Beyond that, if you are still having trouble, I suggest:

  1. test against the latest version of CUDA
  2. provide a short, complete example, a description of the compile failure, the CUDA version you are using, and the compile command line that caused the failure.

This isn’t really a forum for assistance with Qt. Also, please don’t post pictures of text on this forum.

I also note that your variable will eventually occupy over 2GB of space. Is that your intent? I would suggest managing that with a dynamic allocation (e.g. cudaMalloc()) rather than a static allocation.
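
For reference, a rough size check (assuming the compiler adds no padding, which is expected here since every member is a single byte):

sizeof(Voxel)         = 2 bytes
sizeof(Chunk_level_1) = 1 + 1000 * 2         =         2 001 bytes
sizeof(Chunk_level_2) = 1 + 1000 * 2 001     =     2 001 001 bytes
sizeof(Chk_Lv2)       = 1000 * 2 001 001     = 2 001 001 000 bytes, i.e. just over 2GB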

On CUDA 12.0.1, I see that compiling this:

#include <cstdio>
#include <cstdint>
struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};

__device__ Chunk_level_2 Chk_Lv2 [10][10][10] = {0};  //1 000 000 000

__global__ void k(){
        printf("%c\n", Chk_Lv2[1][2][3].counter);
}

int main(){

        k<<<1,1>>>();
        cudaDeviceSynchronize();
}

takes a “very long” time. On the other hand, I don’t have trouble with this:

#include <cstdio>
#include <cstdint>
struct Voxel
{
  uint8_t ID = 0;
  uint8_t hit_counter = 0;
};

struct Chunk_level_1
{
  uint8_t counter = 0;
  Voxel voxel [10][10][10] = {0}; // 1 000
};

struct Chunk_level_2
{
  uint8_t counter = 0;
  Chunk_level_1 Chk_Lv1 [10][10][10]= {0};  // 1 000 000
};


__global__ void k(Chunk_level_2 *c){
        printf("%c\n", c[(((1*10)+2)*10)+3].counter);
}

int main(){

        Chunk_level_2 *dc;
        cudaMalloc(&dc,  sizeof(Chunk_level_2)*10*10*10);
        cudaMemset(dc,0, sizeof(Chunk_level_2)*10*10*10);
        k<<<1,1>>>(dc);
        cudaDeviceSynchronize();
}
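
Since you also want to periodically synchronize filled cells with a similar structure on the CPU, that fits naturally with the dynamic allocation. A minimal sketch (host_grid and the chunk index are illustrative; host_grid plays the role of your CPU-side Chk_Lv2):

// Copy one Chunk_level_2 from the device grid back to a host-side mirror.
Chunk_level_2 *host_grid = new Chunk_level_2[10*10*10];  // heap, not stack: ~2MB per chunk
int idx = (((1*10)+2)*10)+3;                             // flattened [1][2][3], same indexing as the kernel
cudaMemcpy(&host_grid[idx], dc + idx, sizeof(Chunk_level_2), cudaMemcpyDeviceToHost);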

So if you’re not happy with that, and you want the __device__ variable instead, I would suggest:

  1. retest on the latest version of CUDA
  2. if it still manifests, file a bug.

According to my testing, it doesn’t seem to be possible to create a __device__ variable larger than 2GB. So I wouldn’t bother with that approach.

Compilation time and compiler memory requirements will grow with the size of the variable. I have not checked to confirm, but as I recall a statically initialized variable requires the entire variable to be written out to the object file, so with a 2GB statically initialized variable the object file will be larger than 2GB. Given enough system memory, disk space, and compilation time (20 minutes or so :-), the build should complete eventually.

Don’t do that. Dynamic allocation plus cudaMemset() is the approach I would advise.

That seems like a compelling reason :-) Presumably it is a quite reasonable limitation: a signed 32-bit integer is used somewhere (in the object file format?) to represent the data length.
