Hi all,

So I have added the following code to a project of mine. This is the first cuda function of many I expect to add. Well, I added it, everything seems great, except when comparing the answer to my standard C code, I find out that the answer has changed! (Significantly - this is not due to a difference in floating point standard…)

I ran the code through cuda-gdb (and I have an 8800GTS, so I couldn’t see what was actually going on the kernel side, just before and after), and it showed me that the answers (modified members of a structure) were NOT changed at all. This finding seems to make sense with what my results show. This lead me to believe that my cudaMemcpy needs work, but that looks fine to me.

Does anyone have any ideas/suggestions? Because this is a large piece of code I will post 1) the kernel function and the data loading and unloading functions, and 3) the piece of my code where it is called.

```
void loadneutlist(unsigned int listlength, neutron* host, neutron* device)
{
//Assumes cudaMalloc already run for the lists
cudaMemcpy(device, host, listlength*sizeof(neutron), cudaMemcpyHostToDevice);
}
void getneutlist(unsigned int listlength, neutron* host, neutron* device)
{
//Assumes cudaMalloc already run for the lists
cudaMemcpy(host, device, listlength*sizeof(neutron), cudaMemcpyDeviceToHost);
}
__global__ void gpuscatter_iso(neutron* elist_d)
//This calcs a new u,v,w, and energy after an isotropic collision
{
int A=mat_list_d[elist_d[threadIdx.x].cell][elist_d[threadIdx.x].target_nuclide].zaid/1000;
A=mat_list_d[elist_d[threadIdx.x].cell][elist_d[threadIdx.x].target_nuclide].zaid-1000*A;
float mu_cm=gpurng(&elist_d[threadIdx.x].seed,2.0)-1;
float new_energy=elist_d[threadIdx.x].energy*(A*A+2*A*mu_cm+1)/((float)(A*A+2*A+1));
float temp=sqrt(elist_d[threadIdx.x].energy/new_energy);
float cos_phi=cos(atan(sin(acos(mu_cm))/(1.0/A+mu_cm)));
float sin_phi=sin(acos(cos_phi));
float cos_w=gpurng(&elist_d[threadIdx.x].seed,2.0)-1;
float sin_w=sin(acos(cos_w));
temp=sin_phi/(sqrt(1-(elist_d[threadIdx.x].oz)*(elist_d[threadIdx.x].oz)));//reused to save space
float new_u=temp*((elist_d[threadIdx.x].oy)*sin_w-(elist_d[threadIdx.x].oy)*(elist_d[threadIdx.x].ox)*cos_w)+(elist_d[threadIdx.x].ox)*cos_phi;
float new_v=temp*(-(elist_d[threadIdx.x].ox)*sin_w-(elist_d[threadIdx.x].oz)*(elist_d[threadIdx.x].oy)*cos_w)+(elist_d[threadIdx.x].oy)*cos_phi;
float new_w=sin_phi*sqrt(1-(elist_d[threadIdx.x].oz)*(elist_d[threadIdx.x].oz))*cos_w+(elist_d[threadIdx.x].oz)*cos_phi;
temp =new_u*new_u+new_v*new_v+new_w*new_w;
if (temp>1.0)
{
temp=1/temp;
new_u=new_u*temp;
new_v=new_v*temp;
new_w=new_w*temp;
}
elist_d[threadIdx.x].ox=new_u;
elist_d[threadIdx.x].oy=new_v;
elist_d[threadIdx.x].oz=new_w;
elist_d[threadIdx.x].energy=new_energy;
}
```

```
loadneutlist(elength, elist, elist_d);
gpuscatter_iso<<<1,elength>>>(elist_d);
getneutlist(elength, elist, elist_d);
```

After this point elist is treated as the result and elist_d is unnecessary.

Any help is GREATLY appreciated,

Thanks!