Help With Some Simple Code

FlamingBee · March 11, 2009, 2:24pm

Hello. I’m currently engaged in a third year project in computer science at the University of Manchester, UK. I’m attempting to simulate the image artifacts that occur when imaging objects with nanoscale structure (using techniques such as atomic force microscopy).

I’m attempting to work out the distance from a point on the tip (represented as a 3D volume) to another point in space (also represented as a 3D volume), and then working out the force at that point (based on this distance value). The code I have working in java currently is as follows:

[codebox] //tip position is just an int {l, m, n}

		for (int i = 0; i < returnMatrix.slices(); i++) {

			for (int j = 0; j < returnMatrix.rows(); j++) {

				for (int k = 0; k < returnMatrix.columns(); k++) {

					int[] gridPosition = {i, j, k};

					double force = calculatePotential(gridPosition, tipPosition);

					double currentValue = returnMatrix.getQuick(i, j, k);

					returnMatrix.setQuick(i, j, k, currentValue + force);

				}//for k

			}//for j

		}//for i[/codebox]

I made an attempt at writing this in CUDA, and I think it ought to work, but it doesn’t. Most of the values come out wildly wrong, or NAN.

[codebox]global void calculateForceMatrix(int N, float *result, float tipGridSpacing, float objectGridSpacing, float surfaceCentre, float tipCentre, int surfaceMatrixHeight, int l, int m, int n){

int i = blockIdx.x * blockDim.x + threadIdx.x;

int j = blockIdx.y * blockDim.y + threadIdx.y;

int k = blockIdx.z * blockDim.z + threadIdx.z;

if(i < N && j < N && k < N){

	//result[i][j][k] = dist(i,j,k);

	float xDimension = ((tipCentre-i) * tipGridSpacing) + ((surfaceCentre-l) * objectGridSpacing);

	float yDimension = ((tipCentre-j) * tipGridSpacing) + ((surfaceCentre-m) * objectGridSpacing);

	float zDimension = ((surfaceMatrixHeight - k) * objectGridSpacing) + (n * tipGridSpacing);

	float distance = fabs(xDimension) + fabs(yDimension) + fabs(zDimension);

	float sixthPow = pow(1/distance, 6);

	float closeTerm = sixthPow * sixthPow;

	float farTerm = sixthPow;

	float lennardJonesForce = -24*(2*closeTerm - farTerm);

	

	int test = i*N*N + j*N + k;

	result[test] = lennardJonesForce;

	//int index = ((i*N*N)+{j*N)+k);

	//result[index] = lennardJonesForce;

}

}//kernel[/codebox]

[codebox] float *a_h, *a_d; // Pointer to host & device arrays

size_t size = N * N * N * sizeof(float);

a_h = (float *)malloc(size);        // Allocate array on host

cudaMalloc((void **) &a_d, size);   // Allocate array on device

//…

// Do calculation on device:

dim3 dimBlock(N, N, N);

calculateForceMatrix<<<1, dimBlock>>>(N, a_d, tipGridSpacing, objectGridSpacing, surfaceCentre, tipCentre, surfaceMatrixHeight, l, m, n);

// Retrieve result from device and store it in host array

cudaMemcpy(a_h, a_d, sizeof(float)*N*N*N, cudaMemcpyDeviceToHost);

for (int i=0; i<N*N*N; i++) printf("%d %f\n", i, a_h[i]);[/codebox]

I’m sure i’m doing something blindingly obviously wrong, but I just can’t see it. Any help would be much appreciated.

Thanks,

FlamingBee

Jamie_K · March 11, 2009, 5:41pm

I scanned the code and nothing looks “wrong” although a few things I did notice:

Distance is the sum of absolute values of x, y, and z, which is a bit odd. I would have expected it to be the square, but I don’t know the mathematics of your problem. I’ll assume absolute value is what you want.
For a constant integer power, it’s generally better to use multiplication instead of pow(). Also, pow() is a double-precision function and I’m not sure how it behaves without compute capability 1.3. Prefer powf() if you’re using float data. EDIT: Nevermind, on 1.2 and lower, double-precision functions are mapped to their single-precision equivalents, so it won’t give bad values. But you should still prefer multiplication for constant integer powers.
A thread block that’s N x N x N will only work if N is 8 or less. I assume this is the case or it wouldn’t run. And I also assume you know that running a single block will underutilize your GPU.

None of these things is really wrong, but that’s all I saw.

FlamingBee · March 11, 2009, 9:01pm

I scanned the code and nothing looks “wrong” although a few things I did notice:

Distance is the sum of absolute values of x, y, and z, which is a bit odd. I would have expected it to be the square, but I don’t know the mathematics of your problem. I’ll assume absolute value is what you want.

For a constant integer power, it’s generally better to use multiplication instead of pow(). Also, pow() is a double-precision function and I’m not sure how it behaves without compute capability 1.3. Prefer powf() if you’re using float data. EDIT: Nevermind, on 1.2 and lower, double-precision functions are mapped to their single-precision equivalents, so it won’t give bad values. But you should still prefer multiplication for constant integer powers.

A thread block that’s N x N x N will only work if N is 8 or less. I assume this is the case or it wouldn’t run. And I also assume you know that running a single block will underutilize your GPU.

None of these things is really wrong, but that’s all I saw.

Yep, my mistake :) I was tired…
Yeah, of course. I typically switch it to that once I know it’s working (it’s easier to read with pow).
I actually didn’t know that. I thought you could have any number of threads per block. Realistically N is going to be about 60 :S I also didn’t know that running a single block would under utilize the GPU. Clearly I’ve still got a lot to read up on.

Topic		Replies	Views
Optimising CUDA code for Lennard Jones Potential Calculation CUDA Programming and Performance	6	960	August 3, 2017
Newbie Question: Moving nested for loop CPU code to GPU. CUDA Programming and Performance	4	4586	December 2, 2015
Is this a task in which CUDA could speed up things? CUDA Programming and Performance	11	11969	September 29, 2009
Simple Kernel, but having problems...ideas CUDA Programming and Performance	16	8127	April 24, 2011
I am calculating distance between two points without using threads but i need to calculate distance CUDA Programming and Performance	4	1666	January 21, 2015
having problem with simpe CUDA code Code debug CUDA Programming and Performance	4	1709	November 7, 2009
how to implement double for loops in CUDA CUDA Programming and Performance	23	16080	January 30, 2012
help to go in the good direction with CUDA a simple but slow code in C to optimise CUDA Programming and Performance	7	3892	September 24, 2008
Density map Getting mad with cuda CUDA Programming and Performance	12	7924	July 1, 2008
modifying simulator to work with CUDA.?. CUDA Programming and Performance	36	18856	April 15, 2009

Help With Some Simple Code

Related topics