Using cudaMallocManaged with C++ Structure

In the course Fundamentals of Accelerated Computing C/C++, in Asynchronous Streaming, and Visual Profiling for Accelerated Applications with CUDA C/C++, the final programming task (the n-body problem) has me stumped.
I think I am not using cudaMallocManaged correctly to get the structure data p into the kernel.
I tried the following, to no avail:

__global__ void bodyForce(Body *p, float dt, int n) {

int bytes = nBodies * sizeof(Body);
float *buf;

buf = (float *)malloc(bytes);

Body *p = (Body *)buf;
cudaMallocManaged(&p, sizeof( Body));


bodyForce<<< 1,1 >>>(p, dt, nBodies); // compute interbody forces

The p[i].x etc. values all come out as zeros.

Can I get access to the solution program so I can see what the problem is? Give me an F on the course, I don’t care.

If you still have this question, I’d suggest asking it in one of the CUDA programming forums, for example this one:

https://devtalk.nvidia.com/default/board/57/cuda-programming-and-performance/
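
For anyone who lands here with the same symptom, here is a minimal sketch of how cudaMallocManaged is usually used with an array of structs. It is not the course solution. In the snippet in the question, cudaMallocManaged(&p, sizeof(Body)) overwrites the pointer returned by malloc and reserves room for only one Body, so whatever was written into the malloc'd buffer never reaches the kernel. The usual pattern is to allocate nBodies * sizeof(Body) bytes of managed memory first, fill that buffer, launch, and call cudaDeviceSynchronize before reading results on the host. The Body layout, the integratePositions kernel, and the runManagedDemo helper below are assumptions made up for illustration; the post never shows the real struct or the body of bodyForce.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layout: the post never shows Body, so this is a guess.
struct Body { float x, y, z, vx, vy, vz; };

// Hypothetical stand-in for bodyForce: moves each body by velocity * dt.
// A grid-stride loop lets any launch configuration cover all n bodies.
__global__ void integratePositions(Body *p, float dt, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

// Hypothetical helper: returns p[nBodies - 1].x after one step,
// or -1.0f if any CUDA call fails.
float runManagedDemo(int nBodies, float dt) {
    Body *p = nullptr;
    size_t bytes = nBodies * sizeof(Body);  // the whole array, not one Body

    // Allocate managed memory directly; no malloc, no second pointer.
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) return -1.0f;

    // Initialize AFTER allocating, writing into the managed buffer itself.
    for (int i = 0; i < nBodies; ++i) {
        p[i] = Body{ (float)i, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f };
    }

    int threads = 256;
    int blocks = (nBodies + threads - 1) / threads;
    integratePositions<<<blocks, threads>>>(p, dt, nBodies);

    // Check the launch, then wait for the kernel before touching p on the host.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    float result = (err == cudaSuccess) ? p[nBodies - 1].x : -1.0f;
    cudaFree(p);
    return result;
}

int main() {
    printf("last x = %f\n", runManagedDemo(4096, 0.5f));
    return 0;
}
```

Checking the return values of cudaMallocManaged, cudaGetLastError, and cudaDeviceSynchronize is worth the extra lines here, since a failed allocation or launch otherwise just looks like unchanged data.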