Revisit cutil_math.h Bugs in cutil_math.h , redesign and add more functions

The cutil_math.h is provided with NVIDIA_GPU_Computing_SDK as convenient functions on the built in variable (int, int2, int4, …) . I found it quite useful in practice especially when developing functions with template. The cutil_math is included from the CUDA 1.1 until now and I can see it regularly updated however surprisingly there are still many problems with current cutil_math.h even incorrect implementation of the functions, that can incidentally leads to the wrong results that I want to address here

    Wrong implementation of divide operator

    Lack of uint type functions

    Lack of many essential functions

    Inconsistent function coding interfaces

First of all, wrong implementation of divide operator.

[codebox]inline host device float2 operator/(float s, float2 a){

float inv = 1.0f / s;

return a * inv;


this is completely wrong for what it should be

[codebox]inline host device float2 operator/(float s, float2 a){

return make_float2(s /a.x, s /a.y);


It is unbelievable since it is written by someone from NVIDIA.

Second it lacks of essential functions for uint. I see some functions for uint3 but what about uint2, uint4. Though we can say uint and int is similar, i don’t want to use the type cast in my program to do the jobs while it is trivial and equivalent.

Third it lacks of function across the built-in types. If you define the function on one type then you should define it for other types. For example, the negate functions is only define for float2, float3, float4, int2 , int3 but not int4. Many other functions also are in the same situation.

The implementation of the function has mixed between different coding standard

[codebox]// max

static inline host device float4 fmaxf(float4 a, float4 b )


return make_float4(fmaxf(a.x,b.x), fmaxf(a.y,b.y), fmaxf(a.z,b.z), fmaxf(a.w,b.w));


// addition

inline host device float4 operator+(float4 a, float4 b )


return make_float4(a.x + b.x, a.y + b.y, a.z + b.z,  a.w + b.w);


It looks like that people just add their own implementation without caring about what have been there.

So I try to reorganize and add more essential functions to it. Hopefully you will find it more useful.

After the discussion with alexish, I agree with him on the safety of the cutil_math.h, the new fcutil_math_safe.h remove all vector +/ scalar, scalar / vector, and vector * vector, vector / vector function. I also add double2 functions.

Use it at your own risk, but if you spot any problem, please inform me (3.28 KB) (3.97 KB)

What’s wrong with that ?

[codebox]inline host device float2 operator/(float s, float2 a){

float inv = 1.0f / s;

return a * inv;


this code is perfectly consistent with the mathematical definition of the multiplication of a vector (in this case a float2 vector)

by a scalar (in this case 1/s).

Looking at your beautiful vector/vector operator i am waiting for the famous matrix/matrix one !


inline host device float2 operator/(float2 a, float2 B){

return make_float2(a.x / b.x, a.y / b.y);



Your linear algebra skills are unbelievable …

Edit: I am to rude, but mathematically speaking this thing is not an operator, so if you need this king of feature don’t code it as an operator !

Note the parameter ordering in the function definition. The float comes first, so this is the operation (scalar / vector), which no one expects would give the same result as (vector / scalar). If you take the interpretation that math operations on a vector should distribute element-wise, which many multidimensional array packages do, then Linh Ha’s version makes more sense.

I’m guessing that it won’t be long until tmurray joins this thread to express his disappointment in cutil :)


Ok, for the position of the arguments, but scalar/vector is certainly not a mathematical operation. Btw if you need something that distribute the operator element wise you can program it but it’s not an algebraic operator.
I think that coding operators that are not related to any mathematical function can lead to terrible mistakes.

True, yet they are commonly used in shading languages such as GLSL, HLSL, Cg,…


if you have b + ) then you have B), if you think i can insert such a code onto C++ and it can be compiled.

I don’t know if you return result as a matrix, it make sense but is there any built-in type as matrix

The operator have some meaning, people use it since it does make some sense. What I want to say it is a terrible mistake it is already there in cutil_math.h, i do not create it, if any one happens to use it, the result is completely wrong . I try to fix it, at least to make it more sense. That is all, I want to raise the problem, and thank you for the feed back highly appreciated. Could you suggest a better solution

Sure but a lot of cuda users are doing scientific computations and having in cutil_math.h a default vector/vector

operator seems too dangerous to me.

You are right, indeed i have (i supposed like many others) my own version of cutil_math.

My suggestion in very simple due to the name cutil_math and to the fact that cutil_math must be used safely

in scientific coding, restrict the operators defined in cutil_math to the mathematically correct operators (work you just done).

Everything else should be defined elsewhere by the user.

I’m with you on that one, unless they provide full documentation on the default operators


They are dangerous when you don’t know exactly what is doing. When I write float2 r = 1.f / (2.0, 3.0), i will never expect it return (2.0, 3.0) in any case (even matrix operation). Lots of people will be upset to know that their function produce incorrect results because they use something they definitely believe to be correct.

It’s clear that you don’t write float2 r = 1.f / (2.0, 3.0) … but you can write something like :

f = 4.0fmmu_amu_bdot(r,W)/(urho_arho_b) (with u, r, w and f vectors and everything else scalars)

instead of :

f = 4.0fmmu_amu_bdot(r,W)u/(rho_arho_b)

and be so unlucky to have this typo in a conditional statement that is rarely true in the middle of a 10.000 line code …

and you will have some fun in debugging.

So i maintain that a default vector/vector operator is a bad idea.

I see your points, I try to improve it or at least to make it useful so beside removing the vector/vector functions. What else should be removed or added.

BTW: if we agree that the multiplication is point-wise, then what should be the division since it should be inverse of the other.

In a vector space you have two operations:

  • addition between vector

  • scalar multiplication

So in cutil_math we must found vector + vector, vector - vector, scalarvector (if we want vectorscalar), vector/scalar.

vector/vector, vector ± scalar should disappear.

Obviously beside that we can define a set of useful functions:

dot product, length (norm), normalize, cross product for 3D case

and not forget that we have also doubles.

I am not sure to understand the question. There is no vector*vector operation so no inverse operation. If you are talking about

the scalar*vector operation, it’s not an internal law of the vector space and in case of a real vector space this law inherits the properties of the multiplication of reals, so no problem : vector/scalar = (1/scalar)*vector.

For me your cutil_math will be perfect if you remove the non algebraic operators and add the double cases.

BTW in all this discussion my point was that we can define what we want in a function, but operators must be valid mathematical


Thanks for the bug report, I’ll update cutil_math.h.

As we’ve mentioned many times before, cutil only exists for the convenience of the SDK samples, we don’t recommend using it in production code.

It would be nice if basic mathematical operations on float2 and float3 could be provided by NVIDIA - I suspect that lots of us have our own little headers to do this (plus scalar and vector products). Also, it would be good if float2 and double2 ‘knew’ about each other in the way that floats and doubles ‘know’ about each other - last time I checked (which may be a while ago), you couldn’t assign a float2 variable to a double2 variable.

This said, I imagine things could get sticky with complex numbers, since I understand that cuFloatComplex is just a typedef of float2.