Same algorithm, different results

Linny · August 13, 2008, 9:16am

I have a problem that I can’t wrap my mind around. I’m using cutil_math.h in my project.

I have a function that looks like this and works:

inline __device__ __host__
float3 matrix4x4Mulf3( const float* m, const float3 &a )
{
  float x = a.x * m[0] + a.y * m[1] + a.z * m[2] + m[3];
  float y = a.x * m[4] + a.y * m[5] + a.z * m[6] + m[7];
  float z = a.x * m[8] + a.y * m[9] + a.z * m[10] + m[11];
  return make_float3(x,y,z);
}

and another implementation that looks like this and doesn’t work:

inline __device__ __host__
float3 matrix4x4Mulf3( const float* m, const float3 &a )
{
  float4 s = make_float4(a,1);
  float x = dot( s, make_float4( m[0], m[1], m[2], m[3] ) );
  float y = dot( s, make_float4( m[4], m[5], m[6], m[7] ) );
  float z = dot( s, make_float4( m[8], m[9], m[10], m[11] ) );
  return make_float3( s );
}

What’s also interesting is that the non-working version seems to require far fewer registers. The kernel’s register count goes from 31 down to 10 when using the second implementation, but the results are wrong.

ilghiz · August 13, 2008, 10:44am

Hi,

Is the problem in the last statement of the second example? I mean it should be:

  return make_float3(x,y,z);

instead of

  return make_float3( s );

Regards

Ilghiz

Linny · August 13, 2008, 11:51am

EDIT:
I really should sleep more… you’re right, I’m returning the wrong value.
With the return value fixed, there is almost no difference in speed or register usage between the two implementations.
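For reference, here is a minimal sketch of both versions side by side, with the dot-product one returning (x, y, z). To keep it compilable without the CUDA toolkit it uses stand-ins: makeF3, makeF4, dot4, mulExplicit, mulDot and checkAgreement are hypothetical names for illustration, not part of cutil_math.h. In the actual project the cutil_math.h helpers would be used instead.

```cpp
#include <cassert>

// Plain C++ builds get empty qualifiers and minimal vector types;
// under nvcc the qualifiers, float3 and float4 are already provided.
#ifndef __CUDACC__
#define __host__
#define __device__
struct float3 { float x, y, z; };
struct float4 { float x, y, z, w; };
#endif

// Hypothetical stand-ins for make_float3 / make_float4 / dot.
inline __host__ __device__
float3 makeF3(float x, float y, float z)
{
    float3 r; r.x = x; r.y = y; r.z = z;
    return r;
}

inline __host__ __device__
float4 makeF4(float x, float y, float z, float w)
{
    float4 r; r.x = x; r.y = y; r.z = z; r.w = w;
    return r;
}

inline __host__ __device__
float dot4(const float4 &a, const float4 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Explicit version, as in the first listing (m holds rows of 4 floats).
inline __host__ __device__
float3 mulExplicit(const float* m, const float3 &a)
{
    float x = a.x * m[0] + a.y * m[1] + a.z * m[2]  + m[3];
    float y = a.x * m[4] + a.y * m[5] + a.z * m[6]  + m[7];
    float z = a.x * m[8] + a.y * m[9] + a.z * m[10] + m[11];
    return makeF3(x, y, z);
}

// Dot-product version, returning the computed x, y, z instead of s.
inline __host__ __device__
float3 mulDot(const float* m, const float3 &a)
{
    float4 s = makeF4(a.x, a.y, a.z, 1.0f);
    float x = dot4(s, makeF4(m[0], m[1], m[2],  m[3]));
    float y = dot4(s, makeF4(m[4], m[5], m[6],  m[7]));
    float z = dot4(s, makeF4(m[8], m[9], m[10], m[11]));
    return makeF3(x, y, z);
}

// Host-side sanity check: both versions agree on a sample input
// (small integer values, so the float arithmetic is exact).
inline void checkAgreement()
{
    const float m[12] = { 1, 0, 0, 10,
                          0, 2, 0, 20,
                          0, 0, 3, 30 };
    const float3 a = makeF3(1.0f, 2.0f, 3.0f);
    const float3 e = mulExplicit(m, a);
    const float3 d = mulDot(m, a);
    assert(e.x == d.x && e.y == d.y && e.z == d.z);
}
```

Calling checkAgreement() on the host is a cheap way to catch this kind of mistake before looking at register counts or timings.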

elegant-math · August 13, 2008, 12:01pm

Since the x, y, and z variables are never used, the compiler optimizes away all the computation that feeds them. That is the reason for the register reduction.