I’m currently working on a project where I have to optimize software that uses DirectX for GPGPU on the .NET platform.

I wonder if I can improve performance by getting closer to the hardware, as the current solution is quite high level (.NET + DirectX). In theory, CUDA should be faster, as it doesn’t rely on any framework and can communicate directly with the GPU.

Does anyone have feedback on DirectX GPGPU performance vs. CUDA performance?


I believe you’ll find it much more difficult to code using the DirectX API than the CUDA API, especially if your algorithm doesn’t fit well within the DirectX framework.

The thing is that the shader extensions of DirectX were designed with games in mind, which presupposes a certain mode of operation. I believe (I’m no expert) that scatter operations are not easily implemented with DirectX. And with DirectX you certainly don’t have access to the multiprocessor’s shared memory, which will impact performance, depending on the algorithm.
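To make the scatter point concrete, here is a minimal sketch of what a scatter looks like in CUDA: each thread writes to an address it chooses itself. The kernel name `scatter`, the helper `run_scatter_demo`, and the array-reversal test data are made up for illustration, and the sketch assumes every destination index is unique.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Scatter: each thread looks up its own destination index and writes there.
// Assumption: destination indices are unique, so no two threads collide.
__global__ void scatter(const float* src, const int* dst_index, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[dst_index[i]] = src[i];
}

// Hypothetical host helper: reverses an array via scatter and verifies it.
bool run_scatter_demo()
{
    const int n = 1024;
    static float h_src[n], h_dst[n];
    static int h_idx[n];
    for (int i = 0; i < n; ++i) {
        h_src[i] = (float)i;
        h_idx[i] = n - 1 - i;   // element i lands at position n-1-i
    }

    float *d_src = 0, *d_dst = 0;
    int *d_idx = 0;
    cudaMalloc(&d_src, n * sizeof(float));
    cudaMalloc(&d_dst, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemcpy(d_src, h_src, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx, n * sizeof(int), cudaMemcpyHostToDevice);

    scatter<<<(n + 255) / 256, 256>>>(d_src, d_idx, d_dst, n);
    cudaError_t err = cudaMemcpy(h_dst, d_dst, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    cudaFree(d_src);
    cudaFree(d_dst);
    cudaFree(d_idx);
    if (err != cudaSuccess) return false;

    // After reversal, position i must hold the value n-1-i.
    for (int i = 0; i < n; ++i)
        if (h_dst[i] != (float)(n - 1 - i)) return false;
    return true;
}

int main()
{
    printf("scatter %s\n", run_scatter_demo() ? "ok" : "FAILED");
    return 0;
}
```

In a pixel-shader style of GPGPU, the output location is fixed by the fragment being rendered, so a write like `dst[dst_index[i]]` has no direct counterpart there.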

Also, the DirectX API and its GPU-side shading language (HLSL) have a steeper learning curve than CUDA’s minimal API. CUDA is not perfect (yet) and has some bugs, but you stand a better chance of getting support from NVIDIA and CUDA users than from Microsoft. For the time being, I believe that CUDA is the best solution.

Thanks for the answer. I totally agree with you about the advantages of CUDA.

However, after spending half a day looking into CUDA performance, I have come to the conclusion that DirectX is far better than CUDA when it comes to execution times.

I have a hard time believing this is a true statement in general. Perhaps for some specific algorithms this would be true (though I’m not sure why), but anything which makes use of shared memory in CUDA should be massively faster than the equivalent DirectX code.

Can you clarify what you mean here?
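For reference, this is the kind of shared-memory code I have in mind: a minimal sketch of a per-block sum where the threads of a block cooperate through `__shared__` storage. The names `block_sum`, `run_block_sum_demo`, and `BLOCK` are made up for the example, and it assumes the kernel is launched with exactly `BLOCK` threads per block (a power of two).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; assumed to be a power of two

// Each block loads BLOCK elements into shared memory, then sums them with a
// tree reduction. One partial sum per block is written to block_out.
__global__ void block_sum(const float* in, float* block_out, int n)
{
    __shared__ float tile[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    tile[tid] = (i < n) ? in[i] : 0.0f;   // pad the last block with zeros
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            tile[tid] += tile[tid + stride];
        __syncthreads();                  // all threads reach this barrier
    }
    if (tid == 0)
        block_out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: sums n ones and returns the total.
float run_block_sum_demo(int n)
{
    int blocks = (n + BLOCK - 1) / BLOCK;
    float* h_in = new float[n];
    float* h_part = new float[blocks];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;
    for (int b = 0; b < blocks; ++b) h_part[b] = 0.0f;

    float *d_in = 0, *d_part = 0;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_part, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_part, n);
    cudaMemcpy(h_part, d_part, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    float total = 0.0f;                   // finish the few partials on the CPU
    for (int b = 0; b < blocks; ++b) total += h_part[b];

    cudaFree(d_in);
    cudaFree(d_part);
    delete[] h_in;
    delete[] h_part;
    return total;
}

int main()
{
    printf("sum = %.1f\n", run_block_sum_demo(1000));
    return 0;
}
```

The intermediate sums never leave the multiprocessor. A shader-based version would, as far as I know, need several render passes with the intermediate results written out to textures in between.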

(I should point out that CUDA does have one “performance bug”: It looks enough like normal C that new developers tend to inappropriately apply CPU programming concepts to the GPU. It is very easy to write poorly performing code in CUDA until you fully internalize the rules.)
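A typical example of that trap, as a minimal sketch: splitting an array into one contiguous chunk per thread, the way you would for CPU threads, versus letting neighboring threads touch neighboring elements. Both kernels below compute the same result. The names `scale_chunked`, `scale_strided`, and `run_scale_demo` are made up for illustration, and I have not benchmarked this exact code, so treat the performance remarks as expectations rather than measurements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CPU-style partitioning: thread t owns elements [t*chunk, (t+1)*chunk).
// Neighboring threads read addresses that are `chunk` elements apart.
__global__ void scale_chunked(float* data, int n, float k, int chunk)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int begin = t * chunk;
    int end = min(begin + chunk, n);
    for (int i = begin; i < end; ++i)
        data[i] *= k;
}

// GPU-style partitioning: on every loop iteration, neighboring threads
// read neighboring elements, which the hardware can combine (coalesce).
__global__ void scale_strided(float* data, int n, float k)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= k;
}

// Hypothetical host helper: doubles 0..n-1 with either kernel and verifies.
bool run_scale_demo(bool chunked)
{
    const int n = 1 << 16;
    const int threads = 256, blocks = 4;
    static float h[n];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float* d = 0;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    if (chunked) {
        int total = blocks * threads;
        int chunk = (n + total - 1) / total;   // elements per thread
        scale_chunked<<<blocks, threads>>>(d, n, 2.0f, chunk);
    } else {
        scale_strided<<<blocks, threads>>>(d, n, 2.0f);
    }
    cudaError_t err = cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * (float)i) return false;
    return true;
}

int main()
{
    printf("chunked %s\n", run_scale_demo(true) ? "ok" : "FAILED");
    printf("strided %s\n", run_scale_demo(false) ? "ok" : "FAILED");
    return 0;
}
```

The chunked version is what a CPU programmer would naturally write, and it is correct, but its memory access pattern is the kind that tends to perform poorly on the GPU.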