Vector maths on float2, where are the SIMD functions?

Hello,

I’m just trying to do some float2 vector maths (e.g. float2/float2) in a CUDA device program (compiled with NVRTC), and I’m getting:

no operator “/” matches these operands
operand types are: float2 / float2

Is there a function intrinsic instead of an operator for vector maths?

I know I could do:
make_float2(val1.x/val2.x,val1.y/val2.y)
But as a GPU programmer, that seems totally wrong, as it’s going to waste the opportunity to use SIMD vector instructions.

There aren’t any SIMD intrinsics that operate on a quantity larger than 32 bits.

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html#group__CUDA__MATH__INTRINSIC__SIMD

https://stackoverflow.com/questions/48345049/do-cuda-cores-have-vector-instructions/48345799#48345799

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#simd-video-instructions

That opportunity does not exist, as there are no SIMD vector instructions provided by the hardware that operate on pairs of ‘float’ operands. So nothing is being wasted with your current approach.
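If the goal is only to make `val1 / val2` compile, one option is to define the operators yourself. The following is a minimal sketch under stated assumptions, not something the CUDA toolkit headers provide: the `HD` macro and the host-only stand-in for `float2` are illustrative names, added so the same file also builds as plain C++. Each operator expands to the same per-component scalar arithmetic as the `make_float2(...)` expression in the question, so it is a convenience only and does not change the generated instructions.

```cpp
// Sketch: component-wise float2 operators.
// Builds with nvcc as CUDA code, or with a host C++ compiler for testing.
#ifdef __CUDACC__
  // CUDA build: float2 and make_float2 are already available.
  #define HD __host__ __device__   // hypothetical helper macro
#else
  // Host-only build: minimal stand-ins for the CUDA built-ins (assumption).
  #define HD
  struct float2 { float x, y; };
  inline float2 make_float2(float x, float y) { return float2{x, y}; }
#endif

// Each operator is two independent scalar operations, one per component.
HD inline float2 operator+(float2 a, float2 b) { return make_float2(a.x + b.x, a.y + b.y); }
HD inline float2 operator-(float2 a, float2 b) { return make_float2(a.x - b.x, a.y - b.y); }
HD inline float2 operator*(float2 a, float2 b) { return make_float2(a.x * b.x, a.y * b.y); }
HD inline float2 operator/(float2 a, float2 b) { return make_float2(a.x / b.x, a.y / b.y); }
```

With these in scope, `float2 r = val1 / val2;` compiles in device code. The `helper_math.h` header shipped with the CUDA samples takes a similar approach, if you would rather not maintain your own.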

Ok, I see. Coming from years of writing vec3 operations in GLSL and OpenCL, it seemed surprising. I suppose given CUDA is aimed at more general purpose compute it makes sense.

Is that all NVIDIA hardware then? Because the code I’m writing is running on an NVIDIA Jetson TX1.

No GPU in the entire NVIDIA range has hardware support for float2 operations. The limited set of SIMD video instructions (sub-word operations within a 32-bit register) introduced with the Kepler architecture was largely replaced with software emulation in subsequent architectures.

Some recent architectures have added a few instructions for operating on half2 data (which fits into a 32-bit register).

Classical wide, explicit SIMD processing as it exists in CPUs is not a good match for GPUs. Using scalar instructions almost exclusively allows flexible use of the execution units and simplifies both the hardware and the tool chain.