x86 and CUDA precision worse than shader with 16-bit floats

I tested ray tracing on an AMD X2 x86 CPU and on an 8800 GTX using CUDA, and in both cases the precision was worse than in a ray tracing shader that uses 16-bit floats.

16-bit float shader test:

The CPU and CUDA versions produce the same “missed” rays with these 2 intersecting planes, each containing 32 triangles with 25 indices:

Möller’s ray-triangle intersection is used, and the code is exactly the same on host/device and in the shader. The missed rays come from the (u+v) > 1 check: for the missed rays on the plane, (u+v) is 1.000000. An error tolerance of 0.000001f was added to that check to fill in the gaps, but there is still a precision error in determining which poly is closer. The shader, however, doesn’t use the tolerance and still returns crisp edges, unlike those shown in the above image. The only difference is that the shader uses halfs, while the other implementations use floats.
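To make the tolerance concrete, here is a minimal sketch of how the check might look. The helper name insideTriangle and the choice to apply the tolerance to all three bounds are assumptions for illustration, not the code that produced the image. The second function shows that a sum which prints as 1.000000 with six decimals can still compare greater than 1.0f.

```cpp
#include <cassert>
#include <cfloat>
#include <cstdio>

// Hypothetical helper: barycentric bounds test with a tolerance eps.
// Applying eps to every bound is an assumption made for this sketch.
bool insideTriangle(float u, float v, float eps)
{
    if (u < -eps || u > 1.0f + eps) return false;
    if (v < -eps || (u + v) > 1.0f + eps) return false;
    return true;
}

// Builds a boundary case where u + v is exactly one float step above 1.
void showBoundaryCase()
{
    const float u = 0.5f;
    const float v = 0.5f + FLT_EPSILON;   // 0.5 plus 2^-23, exactly representable
    const float sum = u + v;              // 1 + 2^-23

    std::printf("%f\n", sum);             // six decimals: 1.000000
    std::printf("%.9g\n", sum);           // nine significant digits: 1.00000012

    assert(sum > 1.0f);                          // rejected by a strict (u+v) > 1 test
    assert(insideTriangle(u, v, 0.000001f));     // accepted with the tolerance
}
```

Printing u, v and (u+v) with %.9g instead of %f may be a quick way to see whether the missed rays sit exactly on the edge or just past it.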

Any ideas?

Thanks,

Mike.

EDIT:

The resulting image is still the same on a Core 2 Quad.

The intersection code:

DEVICE_HOST_FUNCTION void Ray3ToTriangle(Collision* collision, Ray3& ray, float3& vert0, float3& vert1, float3& vert2)
{
    // Two edges sharing vert0.
    float3 edge1, edge2, pvec;
    float3_subtract(edge1, vert1, vert0);
    float3_subtract(edge2, vert2, vert0);

    // The determinant is zero when the ray is parallel to the triangle.
    float3_cross(pvec, ray.direction, edge2);
    const float determinant = float3_dot(pvec, edge1);
    const float inverseDeterminant = 1.0f / determinant;

    collision->isCollision = false;

    if(determinant > 0.0f)
    {
        // Front-facing triangle.
        float3 tvec, qvec;
        float3_subtract(tvec, ray.origin, vert0);

        const float u = float3_dot(tvec, pvec) * inverseDeterminant;
        if(u < 0.0f || u > 1.0f)
        {
            return;
        }

        float3_cross(qvec, tvec, edge1);

        const float v = float3_dot(ray.direction, qvec) * inverseDeterminant;
        if(v < 0.0f || (u + v) > 1.0f)
        {
            return;
        }

        // Distance along the ray; only hits in front of the origin count.
        collision->distance = float3_dot(edge2, qvec) * inverseDeterminant;
        if(collision->distance > 0.0f)
        {
            collision->isCollision = true;
            float3_multiplyConstant(collision->point, ray.direction, collision->distance);
            float3_add(collision->point, collision->point, ray.origin);
            collision->u = u;
            collision->v = v;
            float3_cross(collision->normal, edge1, edge2);
            float3_normalize(collision->normal);
        }
    }
    else if(determinant < 0.0f)
    {
        // Back-facing triangle: same tests as the branch above.
        float3 tvec, qvec;
        float3_subtract(tvec, ray.origin, vert0);

        // Shadows the outer inverseDeterminant with the same value.
        const float inverseDeterminant = 1.0f / determinant;

        const float u = float3_dot(tvec, pvec) * inverseDeterminant;
        if(u < 0.0f || u > 1.0f)
        {
            return;
        }

        float3_cross(qvec, tvec, edge1);

        const float v = float3_dot(ray.direction, qvec) * inverseDeterminant;
        if(v < 0.0f || (u + v) > 1.0f)
        {
            return;
        }

        collision->distance = float3_dot(edge2, qvec) * inverseDeterminant;
        if(collision->distance > 0.0f)
        {
            collision->isCollision = true;
            float3_multiplyConstant(collision->point, ray.direction, collision->distance);
            float3_add(collision->point, collision->point, ray.origin);
            collision->u = u;
            collision->v = v;
            float3_cross(collision->normal, edge1, edge2);
            float3_normalize(collision->normal);
        }
    }
}
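
For anyone who wants to reproduce this without the float3 helpers, here is a stripped-down, host-only sketch of the same test. Vec3, Hit and intersect are names made up for this sketch, the two determinant branches are merged because they run the same tests, and the hit point and normal are left out.

```cpp
#include <cassert>
#include <cstdio>

// Minimal stand-ins for the float3 helpers (hypothetical names).
struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

struct Hit { bool hit; float t, u, v; };

// Same sequence of tests as Ray3ToTriangle, with no tolerance applied.
Hit intersect(Vec3 origin, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2)
{
    const Hit miss = { false, 0.0f, 0.0f, 0.0f };

    const Vec3 edge1 = sub(v1, v0);
    const Vec3 edge2 = sub(v2, v0);
    const Vec3 pvec = cross(dir, edge2);
    const float det = dot(pvec, edge1);
    if (det == 0.0f) return miss;            // ray parallel to the triangle

    const float invDet = 1.0f / det;
    const Vec3 tvec = sub(origin, v0);
    const float u = dot(tvec, pvec) * invDet;
    if (u < 0.0f || u > 1.0f) return miss;

    const Vec3 qvec = cross(tvec, edge1);
    const float v = dot(dir, qvec) * invDet;
    if (v < 0.0f || (u + v) > 1.0f) return miss;

    const float t = dot(edge2, qvec) * invDet;
    if (t <= 0.0f) return miss;              // triangle is behind the origin

    return { true, t, u, v };
}

// Fires one ray straight down at a unit right triangle in the z = 0 plane.
void demo()
{
    const Hit h = intersect({ 0.25f, 0.25f, 1.0f }, { 0.0f, 0.0f, -1.0f },
                            { 0.0f, 0.0f, 0.0f }, { 1.0f, 0.0f, 0.0f }, { 0.0f, 1.0f, 0.0f });
    assert(h.hit);
    std::printf("t=%.9g u=%.9g v=%.9g\n", h.t, h.u, h.v);
}
```

Feeding this the rays that go missing and printing u, v and t with %.9g should show whether the CPU values land just outside the (u+v) <= 1 bound.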