Calling device functions

Hi all,

I have a kernel which calls various functions.

I have a few vector maths functions (yes, I know there’s a library for this), defined inline as follows:

inline __device__ float dotProduct(float3 A, float3 B)
{
  return (A.x * B.x) + (A.y * B.y) + (A.z * B.z);
}

I then use these functions in my kernel, and I don’t have any problems.

I then have a non-inlined device function defined as follows:

__device__ float calculate(struct x)

Here x is a fairly small structure, 68 bytes in size. I am confident that this code works, as I have tested the function on its own.

However, when called from my kernel it does not return the expected answer. If I plug in the expected answer manually, the kernel returns the correct result.

My array of structures has 128 elements (128 x 68 = 8704 bytes). The kernel loads them all into shared memory, for which I have allocated 8704 bytes. I am running 1 block of 128 threads.
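
To make the setup easier to reproduce, here is a minimal sketch of the arrangement described above. It is not the real code: the struct layout (17 floats, chosen only because that gives 68 bytes), the names Item, demoKernel and the body of calculate are all placeholders. It assumes each thread loads one element and then reads an element loaded by another thread, which is the case where a __syncthreads() between the load and the call is required.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 128  // one block of 128 threads, one structure per thread

// Hypothetical 68-byte structure (17 floats); the real layout is not shown in the post.
struct Item {
    float v[17];
};
static_assert(sizeof(Item) == 68, "Item is assumed to be 68 bytes");

// Placeholder for the non-inlined device function; the real body is not shown.
__device__ __noinline__ float calculate(Item x)
{
    float sum = 0.0f;
    for (int i = 0; i < 17; ++i) {
        sum += x.v[i];
    }
    return sum;
}

__global__ void demoKernel(const Item* in, float* out)
{
    __shared__ Item items[N];  // 128 * 68 = 8704 bytes of shared memory

    int tid = threadIdx.x;
    items[tid] = in[tid];      // each thread loads one element

    // Barrier: every element must be written before any thread reads
    // an element that a different thread loaded.
    __syncthreads();

    // Read a neighbour's element, so the result depends on the barrier.
    out[tid] = calculate(items[(tid + 1) % N]);
}

int main()
{
    Item hIn[N];
    float hOut[N];
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < 17; ++j) {
            hIn[i].v[j] = (float)i;
        }
    }

    Item* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, sizeof(hIn));
    cudaMalloc(&dOut, sizeof(hOut));
    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);

    demoKernel<<<1, N>>>(dIn, dOut);

    // Check both the launch and the execution for errors.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);

    // Compare against the value computed on the host.
    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        float expected = 17.0f * (float)((i + 1) % N);
        if (hOut[i] != expected) {
            ++mismatches;
        }
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(dIn);
    cudaFree(dOut);
    return mismatches == 0 ? 0 : 1;
}
```

The error check after the launch is there because a failed launch (for example, one that asks for too much shared memory) otherwise leaves the output buffer untouched, which can look like a wrong return value from the device function.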

Can anyone suggest why my function call might not be working?