In C or C++, the things you are showing here are referred to as static allocations. They don’t depend on a dynamic allocator such as malloc or new to provide storage space.
Managed memory has both static and dynamic allocation methods. cudaMallocManaged is the dynamic allocator, whereas __managed__ is the decorator used to provide a static allocation (but still managed).
To stay as close as possible, syntactically, to what you have shown, the natural choice is a static (managed) allocation:
$ cat t1884.cu
#include <cstdio>
#include <cstdint>
struct MyData { // declare
  uint32_t x[512];
  uint64_t y;
};
__managed__ struct MyData myData[] = { // initialize
  { { 0 }, 0xffffffffffffffff }
};
__global__ void k(struct MyData *d){
  printf("%u, %lu\n", d[0].x[0], d[0].y);
}
int main(){
  printf("from host code:\n");
  printf("%u, %lu\n", myData[0].x[0], myData[0].y);
  printf("from device code:\n");
  k<<<1,1>>>(myData);
  cudaDeviceSynchronize();
}
$ nvcc -o t1884 t1884.cu
$ compute-sanitizer ./t1884
========= COMPUTE-SANITIZER
from host code:
0, 18446744073709551615
from device code:
0, 18446744073709551615
========= ERROR SUMMARY: 0 errors
$
Of course, we could realize something similar using cudaMallocManaged, but it will look a bit different, just as your host code would look different if you used a dynamic allocator to provide storage for myData.
One of several limitations of the __managed__ decorator is that variables declared with it must live at global scope. If that or any other limitation is problematic, then the usual advice is to switch to a dynamic allocation scheme using cudaMallocManaged. There are many possible realizations here, but it could look like this:
$ cat t1884.cu
#include <cstdio>
#include <cstdint>
struct MyData { // declare
  uint32_t x[512];
  uint64_t y;
};
__global__ void k(struct MyData *d){
  printf("%u, %lu\n", d[0].x[0], d[0].y);
}
int main(){
  int array_size = 1;
  struct MyData *myData;
  cudaMallocManaged(&myData, array_size*sizeof(MyData));
  myData[0] = { { 0 }, 0xffffffffffffffff };
  printf("from host code:\n");
  printf("%u, %lu\n", myData[0].x[0], myData[0].y);
  printf("from device code:\n");
  k<<<1,1>>>(myData);
  cudaDeviceSynchronize();
}
$ nvcc -o t1884 t1884.cu
$ compute-sanitizer ./t1884
========= COMPUTE-SANITIZER
from host code:
0, 18446744073709551615
from device code:
0, 18446744073709551615
========= ERROR SUMMARY: 0 errors
$
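If you go the dynamic route in real code, you will probably also want error checking, an array size chosen at run time, and a matching cudaFree. The following is only a sketch along those lines, not the one right way to do it. CHECK_CUDA, fill, and roundtrip are hypothetical names introduced here for illustration; they are not part of the CUDA API or of your code. The kernel writes into the managed array and the host reads the results back after synchronizing, which is the property managed memory is meant to give you.

```cuda
#include <cstdio>
#include <cstdint>
#include <cstdlib>

struct MyData {
  uint32_t x[512];
  uint64_t y;
};

// Hypothetical helper macro (not part of CUDA): print the error and exit on failure.
#define CHECK_CUDA(call)                                                   \
  do {                                                                     \
    cudaError_t err_ = (call);                                             \
    if (err_ != cudaSuccess) {                                             \
      fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(err_)); \
      exit(EXIT_FAILURE);                                                  \
    }                                                                      \
  } while (0)

// Each thread stamps its own element so the host can verify the device write.
__global__ void fill(MyData *d, int n){
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    d[i].x[0] = i;
    d[i].y = 0xffffffffffffffffULL - i;
  }
}

// Hypothetical helper: allocate n managed elements, fill them on the device,
// check them on the host, then release the allocation.
bool roundtrip(int n){
  MyData *myData = nullptr;
  CHECK_CUDA(cudaMallocManaged(&myData, n * sizeof(MyData)));
  for (int i = 0; i < n; i++) myData[i] = MyData{};  // zero-initialize from host code
  fill<<<(n + 255) / 256, 256>>>(myData, n);
  CHECK_CUDA(cudaGetLastError());       // catches launch errors
  CHECK_CUDA(cudaDeviceSynchronize());  // catches execution errors; sync before host access
  bool ok = true;
  for (int i = 0; i < n; i++)
    ok = ok && myData[i].x[0] == (uint32_t)i
            && myData[i].y == 0xffffffffffffffffULL - i;
  CHECK_CUDA(cudaFree(myData));         // managed allocations are freed with cudaFree
  return ok;
}

int main(){
  bool ok = roundtrip(4);               // 4 is an arbitrary illustrative size
  printf("%s\n", ok ? "match" : "mismatch");
  return ok ? 0 : 1;
}
```

It builds the same way as the examples above (nvcc -o t1884 t1884.cu). The host only touches myData before the kernel launch and after cudaDeviceSynchronize, which keeps the access pattern safe on platforms that do not support concurrent host and device access to managed memory.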