Hello. I’m currently engaged in a third year project in computer science at the University of Manchester, UK. I’m attempting to simulate the image artifacts that occur when imaging objects with nanoscale structure (using techniques such as atomic force microscopy).

I’m attempting to work out the distance from a point on the tip (represented as a 3D volume) to another point in space (also represented as a 3D volume), and then working out the force at that point (based on this distance value). The code I have working in java currently is as follows:

[codebox] //tip position is just an int {l, m, n}

```
for (int i = 0; i < returnMatrix.slices(); i++) {
for (int j = 0; j < returnMatrix.rows(); j++) {
for (int k = 0; k < returnMatrix.columns(); k++) {
int[] gridPosition = {i, j, k};
double force = calculatePotential(gridPosition, tipPosition);
double currentValue = returnMatrix.getQuick(i, j, k);
returnMatrix.setQuick(i, j, k, currentValue + force);
}//for k
}//for j
}//for i[/codebox]
```

I made an attempt at writing this in CUDA, and I think it ought to work, but it doesn’t. Most of the values come out wildly wrong, or NAN.

[codebox]**global** void calculateForceMatrix(int N, float *result, float tipGridSpacing, float objectGridSpacing, float surfaceCentre, float tipCentre, int surfaceMatrixHeight, int l, int m, int n){

```
int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;
int k = blockIdx.z * blockDim.z + threadIdx.z;
if(i < N && j < N && k < N){
//result[i][j][k] = dist(i,j,k);
float xDimension = ((tipCentre-i) * tipGridSpacing) + ((surfaceCentre-l) * objectGridSpacing);
float yDimension = ((tipCentre-j) * tipGridSpacing) + ((surfaceCentre-m) * objectGridSpacing);
float zDimension = ((surfaceMatrixHeight - k) * objectGridSpacing) + (n * tipGridSpacing);
float distance = fabs(xDimension) + fabs(yDimension) + fabs(zDimension);
float sixthPow = pow(1/distance, 6);
float closeTerm = sixthPow * sixthPow;
float farTerm = sixthPow;
float lennardJonesForce = -24*(2*closeTerm - farTerm);
int test = i*N*N + j*N + k;
result[test] = lennardJonesForce;
//int index = ((i*N*N)+{j*N)+k);
//result[index] = lennardJonesForce;
}
```

}//kernel[/codebox]

[codebox] float *a_h, *a_d; // Pointer to host & device arrays

```
size_t size = N * N * N * sizeof(float);
a_h = (float *)malloc(size); // Allocate array on host
cudaMalloc((void **) &a_d, size); // Allocate array on device
```

//…

```
// Do calculation on device:
dim3 dimBlock(N, N, N);
calculateForceMatrix<<<1, dimBlock>>>(N, a_d, tipGridSpacing, objectGridSpacing, surfaceCentre, tipCentre, surfaceMatrixHeight, l, m, n);
// Retrieve result from device and store it in host array
cudaMemcpy(a_h, a_d, sizeof(float)*N*N*N, cudaMemcpyDeviceToHost);
for (int i=0; i<N*N*N; i++) printf("%d %f\n", i, a_h[i]);[/codebox]
```

I’m sure i’m doing something blindingly obviously wrong, but I just can’t see it. Any help would be much appreciated.

Thanks,

FlamingBee