Nvidia Flex - CUDA performance

Hi! I’ve been integrating Flex into our engine and had to move away from the DirectX implementation because it doesn’t seem stable on GTX 1080 cards (the device gets removed). So I switched to the CUDA implementation instead.

The problem is that CUDA tanks the frame rate compared to DirectX: it drops from 90 fps to 70 fps. Is there some way to improve CUDA performance in the Flex API?

Best regards,

You might want to show the relevant code; otherwise we can at best offer wild guesses. It is not clear what kind of software engine you are talking about. As a guess, a CUDA-based solution may suffer from CUDA-DirectX interop overhead, whereas a solution using compute shaders inside DirectX needs no interop.

As far as I am aware, one difference between graphics shaders and CUDA kernels is that the shader compiler plays fast and loose with mathematical expressions, sacrificing accuracy and standard-compliant handling of special cases for performance. The closest one can get with CUDA is to compile with -use_fast_math, but that is still not as aggressive as what the shader compiler does.
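To make the accuracy trade-off concrete, here is a minimal sketch for kernels you compile yourself. It says nothing about what Flex does internally. It evaluates the standard `sinf` next to the `__sinf` intrinsic, which is what -use_fast_math substitutes for `sinf`, and reports the largest difference. The names `sin_both` and `max_fast_math_error` are made up for this illustration.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Evaluate sin(x) twice per element: once with the standard single-precision
// library function and once with the hardware-approximated intrinsic.
// Note: if this file itself is built with -use_fast_math, sinf is also
// mapped to __sinf and the two results become identical.
__global__ void sin_both(const float* x, float* precise, float* fast, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        precise[i] = sinf(x[i]);    // standard library version
        fast[i]    = __sinf(x[i]);  // fast intrinsic, reduced accuracy
    }
}

// Hypothetical helper: returns the largest absolute difference between the
// two results over n samples in [0, pi), or -1.0f if any CUDA call fails
// (for example, when no CUDA device is available). Assumes n > 0.
float max_fast_math_error(int n)
{
    std::vector<float> h_x(n), h_precise(n), h_fast(n);
    const float pi = 3.14159265f;
    for (int i = 0; i < n; ++i)
        h_x[i] = pi * static_cast<float>(i) / static_cast<float>(n);

    float *d_x = nullptr, *d_precise = nullptr, *d_fast = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    bool ok = cudaMalloc(&d_x, bytes) == cudaSuccess
           && cudaMalloc(&d_precise, bytes) == cudaSuccess
           && cudaMalloc(&d_fast, bytes) == cudaSuccess
           && cudaMemcpy(d_x, h_x.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        sin_both<<<blocks, threads>>>(d_x, d_precise, d_fast, n);
        // The blocking copies below also wait for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_precise.data(), d_precise, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(h_fast.data(), d_fast, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so this is safe on every path.
    cudaFree(d_x);
    cudaFree(d_precise);
    cudaFree(d_fast);
    if (!ok) return -1.0f;

    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = std::fmax(max_diff, std::fabs(h_precise[i] - h_fast[i]));
    return max_diff;
}
```

Call `max_fast_math_error(1 << 16)` from your own `main` and print the result, building once with plain `nvcc` and once with `nvcc -use_fast_math`. In the second build the difference should collapse to zero because both calls end up as the intrinsic. Timing your own math-heavy kernels under both builds is the more useful experiment, since how much the flag helps depends heavily on the kernel.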