Here are some parts of the CUDA code from a CUDA course task. I think the pointer p should point to the address of the first element in Body, but Body has six floats, so I am not quite sure why p and buf are the same. I tried writing something like cudaMallocManaged(&p, 6*bytes), but why does that not help to improve performance?

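typedef struct { float x, y, z, vx, vy, vz; } Body;

int main(const int argc, const char** argv) {
  int nBodies = 2<<11;                 // 2<<11 == 4096 bodies
  int bytes = nBodies * sizeof(Body);  // sizeof(Body) already counts all six floats
  float *buf;
  cudaMallocManaged(&buf, bytes);      // one managed allocation of `bytes` bytes
  Body *p = (Body*)buf;                // same address as buf, viewed as an array of Body
  // ... (rest of the program is not shown in this excerpt)
}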
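
To make the question concrete, here is a minimal sketch of what the cast does, assuming Body has no padding (which holds for six floats on common compilers). The helpers allocBodies, releaseBodies and viewsShareStorage are hypothetical names that are not part of the course code. When the file is built with nvcc the allocation goes through cudaMallocManaged as in the excerpt; with a plain C++ compiler, malloc is used as a stand-in so the layout check still runs. It suggests that p and buf hold the same address, that p[i] is simply the i-th group of six floats in buf, and that bytes already covers every float because it is computed from sizeof(Body).

```cpp
#include <cstddef>
#include <cstdlib>

typedef struct { float x, y, z, vx, vy, vz; } Body;

// Assumption: Body is six tightly packed floats, so sizeof(Body) counts all of them.
static_assert(sizeof(Body) == 6 * sizeof(float), "Body is expected to be 6 floats");

// Hypothetical helper: managed memory when built with nvcc, malloc otherwise.
float* allocBodies(std::size_t nBodies) {
    std::size_t bytes = nBodies * sizeof(Body);  // same formula as in the excerpt
#ifdef __CUDACC__
    float* buf = nullptr;
    return cudaMallocManaged(&buf, bytes) == cudaSuccess ? buf : nullptr;
#else
    return static_cast<float*>(std::malloc(bytes));
#endif
}

// Hypothetical helper: releases memory obtained from allocBodies.
void releaseBodies(float* buf) {
#ifdef __CUDACC__
    cudaFree(buf);
#else
    std::free(buf);
#endif
}

// Hypothetical check: fills the buffer through the float view, then reads it
// back through the Body view. Returns true when both views share one storage.
bool viewsShareStorage(std::size_t nBodies) {
    if (nBodies < 2) return false;               // the check below reads p[1]
    float* buf = allocBodies(nBodies);
    if (buf == nullptr) return false;            // allocation failed
    Body* p = (Body*)buf;                        // the cast from the excerpt
    for (std::size_t i = 0; i < 6 * nBodies; ++i) buf[i] = (float)i;
    bool same = (void*)p == (void*)buf           // identical address
             && p[1].x  == buf[6]                // body 1 starts at float 6
             && p[1].vz == buf[11];              // and ends at float 11
    releaseBodies(buf);
    return same;
}
```

Calling viewsShareStorage(2 << 11) from main should return true whenever the allocation succeeds. Under that reading, requesting 6*bytes would only reserve six times more memory than the 4096 bodies need, which is why it would not be expected to make anything faster.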