Any linear algebra, signal/image processing and NN device functions available?

For my applications I need to call linear algebra, signal/image processing and neural network functions from within a GPU kernel, without any interaction with the CPU. As I understand it, support for the cuBLAS device API has been discontinued, and NPP never offered device-callable functions.
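To make the requirement concrete: a device-callable routine is just a function compiled for the GPU that a kernel calls directly, so no launch or data transfer goes through the CPU. The sketch below is a hand-written stand-in, not any library's API; `HOST_DEVICE`, `gemv_rowmajor` and `batched_gemv` are hypothetical names. The macro expands to `__host__ __device__` under nvcc and to nothing in a plain C++ build, so the same helper can also be tested on the host.

```cpp
// Hypothetical macro: device-callable under nvcc, ordinary function otherwise.
#ifndef HOST_DEVICE
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

// y = A * x for a row-major m x n matrix A (hypothetical helper, naive loop).
HOST_DEVICE inline void gemv_rowmajor(const float* A, const float* x, float* y,
                                      int m, int n) {
    for (int i = 0; i < m; ++i) {
        float acc = 0.0f;
        for (int j = 0; j < n; ++j) {
            acc += A[i * n + j] * x[j];
        }
        y[i] = acc;
    }
}

#ifdef __CUDACC__
// Each thread solves one small, independent problem (batched use case),
// calling the helper directly from device code.
__global__ void batched_gemv(const float* A, const float* x, float* y,
                             int batch, int m, int n) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < batch) {
        gemv_rowmajor(A + b * m * n, x + b * n, y + b * m, m, n);
    }
}
#endif
```

One thread per small matrix suits batched workloads; a single large GEMV would instead need many threads cooperating on one product.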

What are my options for (possibly third-party) device-callable HPC library functions in these areas?
* linear algebra
* signal and image processing
* neural networks
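For the neural-network item, a fully connected layer is the simplest building block to express as a device function. This is a naive sketch assuming row-major weights and a ReLU activation; `dense_relu` and `dense_relu_batched` are hypothetical names, and a tuned version would use shared memory or tensor cores rather than a per-thread loop.

```cpp
// Hypothetical macro: device-callable under nvcc, ordinary function otherwise.
#ifndef HOST_DEVICE
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

HOST_DEVICE inline float relu(float v) { return v > 0.0f ? v : 0.0f; }

// out[i] = relu(b[i] + sum_j W[i * in_dim + j] * x[j]); W is out_dim x in_dim.
HOST_DEVICE inline void dense_relu(const float* W, const float* b, const float* x,
                                   float* out, int out_dim, int in_dim) {
    for (int i = 0; i < out_dim; ++i) {
        float acc = b[i];
        for (int j = 0; j < in_dim; ++j) {
            acc += W[i * in_dim + j] * x[j];
        }
        out[i] = relu(acc);
    }
}

#ifdef __CUDACC__
// One thread per input sample; all samples share the same layer parameters.
__global__ void dense_relu_batched(const float* W, const float* b, const float* X,
                                   float* Y, int batch, int out_dim, int in_dim) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < batch) {
        dense_relu(W, b, X + s * in_dim, Y + s * out_dim, out_dim, in_dim);
    }
}
#endif
```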

I look forward to your reactions.

One possibility for linear algebra: CUTLASS
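CUTLASS builds GEMM out of a hierarchy of tiles (thread block, warp, thread) composed from C++ templates. The sketch below only illustrates that tiling idea in plain C++; it does not use or mirror CUTLASS's actual API, and `gemm_tiled`, `min_int` and the `TILE` parameter are hypothetical.

```cpp
// Hypothetical macro: device-callable under nvcc, ordinary function otherwise.
#ifndef HOST_DEVICE
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

HOST_DEVICE inline int min_int(int a, int b) { return a < b ? a : b; }

// C (M x N) += A (M x K) * B (K x N), all row-major, processed tile by tile.
// Partial tiles at the matrix edges are handled by clamping with min_int.
template <int TILE>
HOST_DEVICE inline void gemm_tiled(const float* A, const float* B, float* C,
                                   int M, int N, int K) {
    for (int i0 = 0; i0 < M; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < K; k0 += TILE)
                // Work on one TILE x TILE block of C using one K-slice.
                for (int i = i0; i < min_int(i0 + TILE, M); ++i)
                    for (int j = j0; j < min_int(j0 + TILE, N); ++j) {
                        float acc = C[i * N + j];
                        for (int k = k0; k < min_int(k0 + TILE, K); ++k) {
                            acc += A[i * K + k] * B[k * N + j];
                        }
                        C[i * N + j] = acc;
                    }
}
```

In a GPU implementation each tile level would live in a different level of the memory hierarchy (shared memory, registers), which is where the performance comes from; the loop nest above only shows the decomposition.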

Thank you, Robert.

CUTLASS seems an excellent base layer for further development. Does anyone know of optimized signal- and image-processing functions built on top of CUTLASS?
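One known way to get correlation (and therefore linear filtering) on top of a GEMM library is im2col: unroll every KH x KW image patch into a column, then multiply by the flattened kernel. The sketch assumes a single-channel image, valid-mode output and a caller-provided scratch buffer `cols` holding KH*KW*OH*OW floats; `im2col` and `correlate2d_via_gemm` are hypothetical names. With one kernel the last step is a matrix-vector product; with many kernels stacked as rows it becomes a true GEMM, which is the step a library such as CUTLASS would take over.

```cpp
// Hypothetical macro: device-callable under nvcc, ordinary function otherwise.
#ifndef HOST_DEVICE
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

// Row r = ky * KW + kx of cols holds pixel (oy + ky, ox + kx) for every
// output position p = oy * OW + ox (valid mode: OH = H - KH + 1, OW = W - KW + 1).
HOST_DEVICE inline void im2col(const float* img, int H, int W, int KH, int KW,
                               float* cols) {
    int OH = H - KH + 1, OW = W - KW + 1;
    for (int ky = 0; ky < KH; ++ky)
        for (int kx = 0; kx < KW; ++kx) {
            int r = ky * KW + kx;
            for (int oy = 0; oy < OH; ++oy)
                for (int ox = 0; ox < OW; ++ox)
                    cols[r * (OH * OW) + oy * OW + ox] = img[(oy + ky) * W + (ox + kx)];
        }
}

// out = ker (1 x KH*KW, row-major kernel) times cols (KH*KW x P).
HOST_DEVICE inline void correlate2d_via_gemm(const float* img, int H, int W,
                                             const float* ker, int KH, int KW,
                                             float* cols, float* out) {
    int P = (H - KH + 1) * (W - KW + 1);
    im2col(img, H, W, KH, KW, cols);
    for (int p = 0; p < P; ++p) {
        float acc = 0.0f;
        for (int r = 0; r < KH * KW; ++r) {
            acc += ker[r] * cols[r * P + p];
        }
        out[p] = acc;
    }
}
```

The trade-off is memory: `cols` is roughly KH*KW times larger than the image, which is why some GPU libraries form the patches on the fly ("implicit GEMM") instead of materializing them.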

• Filtering
• Thresholding
• Morphology: dilation, erosion, region growing, D-transform, skeletonization
• Blob analysis
• 1D and 2D FFT
• 1D and 2D correlation
• Neural networks
• SIFT
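Several items on this list (thresholding, dilation, erosion) are per-pixel operations that map naturally onto one thread per pixel. A minimal sketch, assuming a single-channel float image and a 3x3 square structuring element with out-of-image neighbours skipped; `threshold_px`, `morph3x3_px` and `dilate3x3` are hypothetical names. Region growing, D-transforms, skeletons, FFT and SIFT need coordination across threads and do not fit this simple pattern.

```cpp
// Hypothetical macro: device-callable under nvcc, ordinary function otherwise.
#ifndef HOST_DEVICE
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif
#endif

// Binary threshold: 1 if the pixel value exceeds t, else 0.
HOST_DEVICE inline unsigned char threshold_px(float v, float t) {
    return v > t ? 1 : 0;
}

// Grayscale dilation (max) or erosion (min) at (x, y) over a 3x3 neighbourhood
// of a row-major W x H image; neighbours outside the image are ignored.
HOST_DEVICE inline float morph3x3_px(const float* img, int W, int H,
                                     int x, int y, bool dilate) {
    float r = img[y * W + x];
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
            float v = img[ny * W + nx];
            r = dilate ? (v > r ? v : r) : (v < r ? v : r);
        }
    return r;
}

#ifdef __CUDACC__
// One thread per output pixel, launched on a 2D grid.
__global__ void dilate3x3(const float* in, float* out, int W, int H) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < W && y < H) out[y * W + x] = morph3x3_px(in, W, H, x, y, true);
}
#endif
```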