For my applications I need to call linear algebra, signal/image processing, and neural network functions from within a GPU kernel, avoiding any interaction with the CPU. As I understand it, support for the cuBLAS device API has been discontinued, and NPP never supported device-side calls.
What are my options for (third-party?) device-callable HPC library functions in these areas?
* linear algebra
* signal and image processing
* neural networks
CUTLASS seems like an excellent base layer for further development. Does anyone know of optimized signal and image processing functions built on top of CUTLASS? In particular:
• Filtering
• Thresholding
• Morphology (dilation, erosion, region growing, D-transform, skeletonization)
• Blob analysis
• 1D and 2D FFT
• 1D and 2D correlation
• Neural Networks
• SIFT