Vector maths on float2, where are the SIMD functions?

PeartreeStudios · July 8, 2018, 10:17am

Hello,

I’m just trying to do some float2 vector maths (e.g. float2/float2) in a CUDA device program (compiling with NVRTC), and getting:

no operator “/” matches these operands
operand types are: float2 / float2

Is there a function intrinsic instead of an operator for vector maths?

I know I could do:
make_float2(val1.x/val2.x,val1.y/val2.y)
But as a GPU programmer, that seems totally wrong, as it’s going to waste the opportunity to use SIMD vector instructions.

Robert_Crovella · July 8, 2018, 2:14pm

There aren’t any SIMD intrinsics that operate on a quantity larger than 32 bits.

https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SIMD.html#group__CUDA__MATH__INTRINSIC__SIMD

opencl - Do CUDA cores have vector instructions? - Stack Overflow

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#simd-video-instructions

njuffa · July 8, 2018, 2:48pm

That opportunity does not exist, as there are no SIMD vector instructions provided by the hardware that operate on pairs of ‘float’ operands. So nothing is being wasted with your current approach.
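
For anyone who still wants the val1 / val2 syntax, here is a minimal sketch of the component-wise approach wrapped in a user-defined operator. It still performs two scalar divisions, which according to the reply above is all the hardware offers for float operands. The stand-ins under #ifndef __CUDACC__ are assumptions added only so the snippet also builds with a plain C++ compiler; in CUDA code, float2 and make_float2 are the built-in ones.

```cpp
#ifndef __CUDACC__
// Host-only stand-ins (assumptions for illustration, not part of CUDA):
// they let this sketch compile outside nvcc/NVRTC.
struct float2 { float x, y; };
inline float2 make_float2(float x, float y) { return float2{x, y}; }
#define __device__
#endif

// Component-wise division: two independent scalar divides, no SIMD involved.
__device__ inline float2 operator/(float2 a, float2 b)
{
    return make_float2(a.x / b.x, a.y / b.y);
}
```

With this definition in scope, an expression such as float2 r = val1 / val2; should compile in device code and produce the same result as the explicit make_float2 form.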

PeartreeStudios · July 8, 2018, 8:48pm

OK, I see. Coming from years of writing vec3 operations in GLSL and OpenCL, it seemed surprising. I suppose that, given CUDA is aimed at more general-purpose compute, it makes sense.

Is that all NVIDIA hardware then? Because the code I’m writing is running on an NVIDIA Jetson TX1.

njuffa · July 9, 2018, 1:36am

There is no support for float2 operations across the entire GPU range. The limited set of SIMD video instructions (sub-word size operation within a 32-bit register) introduced with the Kepler architecture was largely replaced with software emulations in subsequent architectures.

Some recent architectures have added a few instructions for operating on half2 data (which fits into a 32-bit register).

Classical wide explicit SIMD processing as it exists in CPUs is not a good match for GPUs. Using scalar instructions almost exclusively provides for flexible use of execution units and simplifies the hardware and the tool chain.
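
To make "sub-word size operation within a 32-bit register" concrete, here is a rough sketch of a per-byte add with wraparound, which is the operation the CUDA intrinsic __vadd4 is documented to perform. The function name vadd4_emulated is hypothetical, and the bit trick is just one well-known way to emulate such an operation with ordinary 32-bit integer instructions; it is not necessarily what the compiler generates. It assumes unsigned int is 32 bits wide.

```cpp
#ifndef __CUDACC__
// Host-only stand-in (assumption) so the sketch also builds as plain C++.
#define __device__
#endif

// Hypothetical helper: treats a and b as four packed 8-bit lanes and adds
// them lane by lane, with each lane wrapping around independently.
__device__ inline unsigned int vadd4_emulated(unsigned int a, unsigned int b)
{
    const unsigned int low7 = 0x7f7f7f7fu;  // lower 7 bits of every lane
    const unsigned int msb  = 0x80808080u;  // top bit of every lane

    // Adding only the low 7 bits keeps carries from crossing lane boundaries.
    unsigned int sum = (a & low7) + (b & low7);

    // Fold the top bits back in with XOR, discarding the carry out of each lane.
    return sum ^ ((a ^ b) & msb);
}
```

For example, vadd4_emulated(0x01FF7F80u, 0x01010180u) yields 0x02008000u: the lanes 0xFF + 0x01 and 0x80 + 0x80 both wrap to 0x00 without disturbing their neighbours.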
