I’m looking for the fastest available 2D convolution for 32-bit floating point to benchmark some of my own code against. Does anyone have any suggestions or benchmark results?
I noticed NPP has routines for this, and I’m sure there are MANY other implementations available. Can anyone give me some pointers?