In the course Fundamentals of Accelerated Computing C/C++, the final programming task of "Asynchronous Streaming, and Visual Profiling for Accelerated Applications with CUDA C/C++", the n-body problem, has me stumped.
I think I am not using cudaMallocManaged correctly to get the struct data in p into the kernel.
I tried the following, to no avail:
__global__ void bodyForce(Body *p, float dt, int n) {
…
int bytes = nBodies * sizeof(Body);
float *buf;
buf = (float *)malloc(bytes);
Body *p = (Body *)buf;
cudaMallocManaged(&p, sizeof( Body));
…
bodyForce<<< 1,1 >>>(p, dt, nBodies); // compute interbody forces
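Kernel launches are asynchronous and return no status, so a bad launch or an out-of-bounds access inside bodyForce can go unreported. Here is a minimal sketch of checking both kinds of error: cudaGetLastError() right after the launch catches launch-configuration problems, and the return value of cudaDeviceSynchronize() catches faults that happen while the kernel runs. The names check, touch, and launchAndCheck are hypothetical, chosen for illustration only:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA call and pass the error through.
static cudaError_t check(cudaError_t err, const char *what) {
  if (err != cudaSuccess)
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
  return err;
}

// Hypothetical kernel: each thread writes one element.
__global__ void touch(float *data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] = 2.0f * i;
}

// Returns cudaSuccess only if allocation, launch, and execution all succeed.
cudaError_t launchAndCheck(int n, int threads) {
  float *data = nullptr;
  cudaError_t err = check(cudaMallocManaged(&data, n * sizeof(float)), "cudaMallocManaged");
  if (err != cudaSuccess) return err;

  touch<<<(n + threads - 1) / threads, threads>>>(data, n);
  err = check(cudaGetLastError(), "kernel launch");  // invalid launch configuration
  if (err == cudaSuccess)
    err = check(cudaDeviceSynchronize(), "kernel execution");  // faults while running

  cudaFree(data);
  return err;
}

int main() {
  printf("%s\n", cudaGetErrorString(launchAndCheck(1 << 20, 256)));   // valid launch
  printf("%s\n", cudaGetErrorString(launchAndCheck(1 << 20, 2048)));  // over 1024 threads per block
  return 0;
}
```

The second call requests 2048 threads per block, above the 1024-thread limit, so cudaGetLastError() reports an invalid configuration. Wrapping the real bodyForce launch the same way should show whether it is failing silently.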
The values of p[i].x and the other fields are all zeros.
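Judging only from the snippet, here is one likely cause. `Body *p = (Body *)buf;` makes p point at the malloc'd buffer, but the next line, `cudaMallocManaged(&p, sizeof( Body));`, overwrites p with a new managed allocation. That allocation holds a single Body and was never filled in, while the initialized data stays behind in buf. The kernel therefore reads uninitialized memory and, for i > 0, memory past the end of the allocation. Below is a minimal sketch of one fix, assuming the program fills buf as floats: allocate buf itself with cudaMallocManaged for the full `bytes` and keep p as a view of it. The Body layout, the kernel sumX, and the fill loop are hypothetical stand-ins for the course code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layout: six floats, no padding (the course's Body may differ).
typedef struct { float x, y, z, vx, vy, vz; } Body;

// Hypothetical stand-in for bodyForce: sums every body's x into *out.
__global__ void sumX(const Body *p, int n, float *out) {
  float s = 0.0f;
  for (int i = 0; i < n; ++i) s += p[i].x;
  *out = s;
}

// Returns the sum computed on the GPU, or -1.0f if an allocation fails.
float runFixedAllocation(int nBodies) {
  int bytes = nBodies * sizeof(Body);
  float *buf = nullptr;
  // Allocate buf itself as managed memory sized for all bodies (instead of malloc) ...
  if (cudaMallocManaged(&buf, bytes) != cudaSuccess) return -1.0f;
  // ... and make p a typed view of that same memory, with no second allocation.
  Body *p = (Body *)buf;

  // Hypothetical initialization: the host writes through buf as plain floats.
  for (int i = 0; i < 6 * nBodies; ++i) buf[i] = 1.0f;

  float *out = nullptr;
  if (cudaMallocManaged(&out, sizeof(float)) != cudaSuccess) { cudaFree(buf); return -1.0f; }
  sumX<<<1, 1>>>(p, nBodies, out);
  cudaDeviceSynchronize();  // the host must wait before reading *out or p
  float result = *out;

  cudaFree(out);
  cudaFree(buf);
  return result;
}

int main() {
  printf("%.1f\n", runFixedAllocation(4096));
  return 0;
}
```

If the data has to stay in a separate malloc'd buffer, copying it into a managed allocation of the full size with `memcpy(p, buf, bytes)` before the launch should also work, since managed memory is directly writable from the host.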
Can I get access to the solution program so I can see what the problem is? Give me an F on the course; I don't care.