cutil_math.h ships with the NVIDIA_GPU_Computing_SDK and provides convenience functions for the built-in variable types (int, int2, int4, …). I have found it quite useful in practice, especially when developing templated functions. It has been included since CUDA 1.1 and I can see that it is updated regularly. Surprisingly, though, the current cutil_math.h still has many problems, including an incorrectly implemented function that can inadvertently lead to wrong results. The issues I want to address here are:
Wrong implementation of divide operator
Lack of uint type functions
Lack of many essential functions
Inconsistent function coding interfaces
First of all, the scalar / vector divide operator is implemented incorrectly:
[codebox]inline __host__ __device__ float2 operator/(float s, float2 a){
    float inv = 1.0f / s;
    return a * inv;
}[/codebox]
This is completely wrong: it computes a / s instead of s / a. It should be:
[codebox]inline __host__ __device__ float2 operator/(float s, float2 a){
    return make_float2(s / a.x, s / a.y);
}[/codebox]
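To make the difference concrete, here is a minimal standalone sketch. The names div_sdk, div_fixed and demo_divide are hypothetical labels for illustration only, and the float2 struct and make_float2 are host-only stand-ins (an assumption) so the snippet also builds without nvcc. It is not meant to be combined with cutil_math.h, which already defines operator* for float2.

```cpp
#include <cassert>
#include <cstdio>

#ifndef __CUDACC__
// Host-only stand-ins (assumption) so the sketch builds without nvcc.
#define __host__
#define __device__
struct float2 { float x, y; };
inline float2 make_float2(float x, float y) { float2 r = {x, y}; return r; }
#endif

// vector * scalar, needed by the SDK-style version below
inline __host__ __device__ float2 operator*(float2 a, float s)
{
    return make_float2(a.x * s, a.y * s);
}

// Hypothetical name: mirrors the SDK body, which really computes a / s
inline __host__ __device__ float2 div_sdk(float s, float2 a)
{
    float inv = 1.0f / s;
    return a * inv;
}

// Hypothetical name: divides the scalar by each component
inline __host__ __device__ float2 div_fixed(float s, float2 a)
{
    return make_float2(s / a.x, s / a.y);
}

// Prints both results for s = 8 and a = (2, 4)
inline void demo_divide()
{
    float2 a = make_float2(2.0f, 4.0f);
    float2 wrong = div_sdk(8.0f, a);
    float2 right = div_fixed(8.0f, a);
    assert(wrong.x != right.x);  // the two versions disagree
    std::printf("SDK-style: (%g, %g)\n", wrong.x, wrong.y);
    std::printf("fixed:     (%g, %g)\n", right.x, right.y);
}
```

For s = 8 and a = (2, 4), the SDK-style body gives (0.25, 0.5), while the corrected one gives (4, 2).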
It is unbelievable since it is written by someone from NVIDIA.
Second, it lacks essential functions for the uint types. I see some functions for uint3, but what about uint2 and uint4? Although uint and int are similar, I don't want to put type casts in my program for operations that are trivial and equivalent.
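As a rough illustration of what is missing, uint2 and uint4 overloads could look like the sketch below. This is only an assumption about how such functions might be written in the style of the existing header, not the contents of the attached files. The struct definitions and make_ functions are host-only stand-ins so the snippet builds without nvcc, and demo_uint_add is a hypothetical helper.

```cpp
#include <cassert>

#ifndef __CUDACC__
// Host-only stand-ins (assumption) so the sketch builds without nvcc.
#define __host__
#define __device__
struct uint2 { unsigned int x, y; };
struct uint4 { unsigned int x, y, z, w; };
inline uint2 make_uint2(unsigned int x, unsigned int y)
{
    uint2 r = {x, y};
    return r;
}
inline uint4 make_uint4(unsigned int x, unsigned int y, unsigned int z, unsigned int w)
{
    uint4 r = {x, y, z, w};
    return r;
}
#endif

// Component-wise addition for uint2, no casts through int2 needed
inline __host__ __device__ uint2 operator+(uint2 a, uint2 b)
{
    return make_uint2(a.x + b.x, a.y + b.y);
}

// Component-wise addition for uint4
inline __host__ __device__ uint4 operator+(uint4 a, uint4 b)
{
    return make_uint4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}

// Hypothetical helper: adds two uint4 values and checks one component
inline void demo_uint_add()
{
    uint4 s = make_uint4(1u, 2u, 3u, 4u) + make_uint4(10u, 20u, 30u, 40u);
    assert(s.w == 44u);
}
```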
Third, functions are not defined consistently across the built-in types. If you define a function for one type, you should define it for the other types as well. For example, negation is defined only for float2, float3, float4, int2 and int3, but not for int4. Many other functions are in the same situation.
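For instance, the missing int4 negation could be filled in along the lines of the sketch below, following the pattern of the existing int2 and int3 versions. The int4 struct and make_int4 here are host-only stand-ins (an assumption) so the snippet builds without nvcc, and demo_negate is a hypothetical helper.

```cpp
#include <cassert>

#ifndef __CUDACC__
// Host-only stand-ins (assumption) so the sketch builds without nvcc.
#define __host__
#define __device__
struct int4 { int x, y, z, w; };
inline int4 make_int4(int x, int y, int z, int w)
{
    int4 r = {x, y, z, w};
    return r;
}
#endif

// Unary negation for int4, component by component
inline __host__ __device__ int4 operator-(int4 a)
{
    return make_int4(-a.x, -a.y, -a.z, -a.w);
}

// Hypothetical helper: negating twice returns the original value
inline void demo_negate()
{
    int4 v = make_int4(1, -2, 3, -4);
    int4 back = -(-v);
    assert(back.x == v.x && back.y == v.y && back.z == v.z && back.w == v.w);
}
```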
Fourth, the implementations mix different coding standards. For example, fmaxf is declared static inline while operator+ is declared only inline:
[codebox]// max
static inline __host__ __device__ float4 fmaxf(float4 a, float4 b)
{
    return make_float4(fmaxf(a.x, b.x), fmaxf(a.y, b.y), fmaxf(a.z, b.z), fmaxf(a.w, b.w));
}
// addition
inline __host__ __device__ float4 operator+(float4 a, float4 b)
{
    return make_float4(a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w);
}[/codebox]
It looks as though people just added their own implementations without caring about what was already there.
So I have tried to reorganize the header and add more essential functions to it. Hopefully you will find it more useful.
After a discussion with alexish, I agree with him about the safety of cutil_math.h. The new cutil_math_safe.h removes all vector +/ scalar, scalar / vector, vector * vector, and vector / vector functions. I have also added double2 functions.
Use it at your own risk, and if you spot any problem, please let me know.
cutil_math_safe.h.zip (3.28 KB)
cutil_math.h.zip (3.97 KB)