You don’t do this in CUDA:
What you have there is a struct definition, not an instantiation. Struct definitions don’t get tagged with __device__.
If you would like to instantiate a __device__ variable of that type, you would do:
struct Voxel
{
    uint8_t ID = 0;
    uint8_t hit_counter = 0;
};
__device__ struct Voxel v; // the actual global device variable
Beyond that, if you are still having trouble, I suggest:
- test against the latest version of CUDA
- provide a short, complete example, with a description of the compile failure, along with the CUDA version you are using and the compile command line that caused the failure.
This isn’t really a forum for assistance with Qt. Also, please don’t post pictures of text on this forum.
I also note that your variable will eventually occupy over 2GB of space. Is that your intent? I would suggest managing that with a dynamic allocation (e.g. cudaMalloc()) rather than a static allocation.
On CUDA 12.0.1, I see that compiling this:
#include <cstdio>
#include <cstdint>
struct Voxel
{
    uint8_t ID = 0;
    uint8_t hit_counter = 0;
};
struct Chunk_level_1
{
    uint8_t counter = 0;
    Voxel voxel [10][10][10] = {0}; // 1 000
};
struct Chunk_level_2
{
    uint8_t counter = 0;
    Chunk_level_1 Chk_Lv1 [10][10][10]= {0}; // 1 000 000
};
__device__ Chunk_level_2 Chk_Lv2 [10][10][10] = {0}; //1 000 000 000
__global__ void k(){
    // %d prints the counter as a number; %c would emit an unprintable NUL for 0
    printf("%d\n", Chk_Lv2[1][2][3].counter);
}
int main(){
    k<<<1,1>>>();
    cudaDeviceSynchronize();
}
Takes a “very long” time. On the other hand, I don’t have trouble with this:
#include <cstdio>
#include <cstdint>
struct Voxel
{
    uint8_t ID = 0;
    uint8_t hit_counter = 0;
};
struct Chunk_level_1
{
    uint8_t counter = 0;
    Voxel voxel [10][10][10] = {0}; // 1 000
};
struct Chunk_level_2
{
    uint8_t counter = 0;
    Chunk_level_1 Chk_Lv1 [10][10][10]= {0}; // 1 000 000
};
__global__ void k(Chunk_level_2 *c){
    // flattened equivalent of Chk_Lv2[1][2][3]
    // %d prints the counter as a number; %c would emit an unprintable NUL for 0
    printf("%d\n", c[(((1*10)+2)*10)+3].counter);
}
int main(){
    Chunk_level_2 *dc;
    cudaMalloc(&dc, sizeof(Chunk_level_2)*10*10*10);
    cudaMemset(dc,0, sizeof(Chunk_level_2)*10*10*10);
    k<<<1,1>>>(dc);
    cudaDeviceSynchronize();
}
So if you’re not happy with that, and you want the __device__ variable instead, I would suggest:
- retest on the latest version of CUDA
- if it still manifests, file a bug.
According to my testing, it doesn’t seem to be possible to create a __device__ variable larger than 2GB. So I wouldn’t bother with that approach.