Fastest CUDA Implementation?

Hey all,

I am trying to accelerate a program that is written for CPU using CUDA and an NVIDIA GPU. From what I understand, CUDA comes in different programming languages, C, C++, Python, Fortran? Are there significant speed differences between these platforms?

Thank you.

How about you focus on making your program parallelized first? Don’t ask which language, ask yourself, “What’s the best way of using thousands of threads to accomplish my goal?”

We don’t need more language flame wars 'round these here parts.

You should be able to write GPU code (i.e. kernels) that runs in comparable speed on either CUDA C, CUDA C++, or CUDA Fortran. CUDA Fortran is a translation system that ultimately produces a CUDA C/C++ implementation (of the GPU code), that can be called from Fortran, but compiles using the CUDA C/C++ compiler. Python will depend on the specific CUDA/Python implementation. PyCUDA is essentially a wrapper environment that allows C/C++ kernels or CUDA libraries to be called from Python. In that case, an equivalent kernel should have the same performance whether it is called from PyCUDA or from CUDA C/C++ (or CUDA Fortran, if the CUDA Fortran-generated “translated” kernel is equivalent to how you would write the same function in CUDA C/C++).

The net of it is, all of the systems I describe ultimately use the CUDA C/C++ compiler and framework, at least for the CUDA kernel portions of the code, so you should not expect any inherent differences based on that.

For the portions of your code that run on the host CPU even in the CPU/GPU implementation, there will be language differences, and I’m not addressing those.

Does Python store data in a contiguous layout like C? If not then there will be overhead to get that data in the right form before it is transferred over to the device.

Keep in mind in real-life projects there usually is a mix of host and device code, and if you use anything other than C++/Fortran on the host, it will add significantly to the running time. I have found MATLAB to be much faster than Python.

On the CPU side Python is one of the slowest languages in existence;

Thank you txbob and CudaaduC. I think I’ll go with CUDA C++.