Hi all,
I have a kernel which calls various functions.
I have a few functions for vector maths (yes, I know there's a library for this), defined as inline functions like this:
inline __device__ float dotProduct(float3 A, float3 B)
{
return (A.x * B.x) + (A.y * B.y) + (A.z * B.z);
}
I use these functions in my kernel without any problems.
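A minimal sketch of calling an inline helper like this from a kernel. The kernel name `dotKernel` and its parameters are hypothetical. The helper is marked `__host__ __device__` here (an assumption; the post uses `__device__` only) so the same code can also be checked on the CPU.

```cuda
#include <cuda_runtime.h>

// __host__ added (assumption) so the same helper can also run on the CPU.
inline __host__ __device__ float dotProduct(float3 A, float3 B)
{
    return (A.x * B.x) + (A.y * B.y) + (A.z * B.z);
}

// Hypothetical kernel: thread i writes dot(a[i], b[i]) to out[i].
__global__ void dotKernel(const float3* a, const float3* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard threads past the end of the arrays
        out[i] = dotProduct(a[i], b[i]);
}
```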
I also have a non-inlined device function, declared as follows:
__device__ float calculate(struct x)
x is a fairly small structure of 68 bytes. I am confident this function works, as I have tested it on its own.
However, when it is called from my kernel, it does not return the expected answer. If I substitute the expected answer manually, the kernel returns the correct result.
My array of structures has 128 elements. The kernel loads them all into shared memory, for which I have allocated 8704 bytes (128 × 68 bytes). I am running 1 block of 128 threads.
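A sketch of the setup described above, under stated assumptions: the post does not show the layout of `x` or the body of `calculate`, so the 68-byte layout (five `float3` plus two `float`) and the placeholder arithmetic in `calculate` are hypothetical, as is the kernel name `calcKernel`. `calculate` is marked `__host__ __device__` (an assumption) so it can be checked on the CPU. The `static_assert`s confirm the 128 × 68 = 8704 byte figure at compile time.

```cuda
#include <cuda_runtime.h>

// Hypothetical 68-byte layout: 5 float3 (60 bytes) + 2 float (8 bytes).
// float3 has 4-byte alignment, so there is no padding.
struct x
{
    float3 a, b, c, d, e;
    float  s, t;
};
static_assert(sizeof(x) == 68, "struct x should be 68 bytes");

constexpr int kThreads = 128;  // one block of 128 threads, one element per thread
static_assert(kThreads * sizeof(x) == 8704, "128 * 68 = 8704 bytes of shared memory");

// Placeholder body (hypothetical): the real calculate() is not shown in the post.
// Passed by value, as in the original signature.
__host__ __device__ float calculate(x in)
{
    return in.a.x * in.s + in.b.y * in.t;
}

__global__ void calcKernel(const x* input, float* output)
{
    __shared__ x sData[kThreads];  // 8704 bytes, statically allocated

    int i = threadIdx.x;
    sData[i] = input[i];  // each thread copies one structure into shared memory
    __syncthreads();      // required before any thread reads an element loaded by another thread

    output[i] = calculate(sData[i]);
}
```

It would be launched as `calcKernel<<<1, kThreads>>>(d_input, d_output);` (where `d_input` and `d_output` are hypothetical device pointers), followed by `cudaGetLastError()` and `cudaDeviceSynchronize()` so that launch or execution errors are reported rather than silently producing a wrong result.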
Can anyone suggest why my function call might not be working?