CUDA portability


I’m wondering what your experience has been with CUDA portability, in terms of both performance and source code.

  • Are there any issues porting code from compiler to compiler?
  • Is performance consistent (using a comparable metric) across all hardware?

Thanks in advance

It is not an issue, because there is no portability for CUDA code: there is only one compiler for it, NVIDIA’s. Other compilers (gcc, clang, VS, etc.) are used for the host-side code, but all device code is compiled by NVIDIA’s compiler.

All device code is indeed ultimately compiled for NVIDIA GPUs by the ptxas compiler (or the equivalent functionality in the GPU driver). There are a few user-created assemblers out there (e.g. maxas), but I don’t think these are that relevant to this discussion.

However, the CUDA device code compilation process doesn’t necessarily begin with ptxas. The conversion of source code (in whatever form it may be) to PTX may follow a number of available paths, some of which are not wholly created by NVIDIA or part of the NVIDIA-provided toolchain(s). I’ll mention 2 examples:

clang has the ability to compile CUDA C++ device code.

GNU tools have the ability to compile OpenACC device source code.

As far as I know, both of these examples build fatbinaries with embedded PTX, so they are runnable directly as a “CUDA executable”. The conversion from PTX to GPU machine code would be handled by the GPU driver, equivalently to a CUDA executable built with e.g. -gencode arch=compute_30,code=compute_30 using the NVIDIA nvcc toolchain.
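To make the two build paths a bit more concrete, here is a minimal sketch of one CUDA C++ source file that should build with either toolchain. The file name saxpy.cu, the helper run_saxpy_check, the CUDA install path and the sm_35 target are all assumptions picked for illustration, and the clang command line follows the pattern in the LLVM "Compiling CUDA with clang" documentation; adjust everything to your own setup.

```cuda
// saxpy.cu (hypothetical file name, used only for this illustration)
//
// NVIDIA toolchain, embedding PTX only so the driver JIT-compiles at load time:
//   nvcc -gencode arch=compute_30,code=compute_30 saxpy.cu -o saxpy
// compute_30 is the target named in the post above; recent toolkits no longer
// support it, so substitute an architecture your toolkit accepts.
//
// clang (install path and GPU arch below are assumptions):
//   clang++ saxpy.cu -o saxpy --cuda-gpu-arch=sm_35 \
//       -L/usr/local/cuda/lib64 -lcudart_static -ldl -lrt -pthread

#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Device code: y[i] = a * x[i] + y[i]
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Runs the kernel on n elements and compares against the value expected on
// the host. Returns the number of mismatches, or -1 on a CUDA error.
int run_saxpy_check(int n) {
    const float a = 2.0f;
    std::vector<float> hx(n, 1.0f), hy(n, 3.0f);
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* dx = nullptr;
    float* dy = nullptr;
    if (cudaMalloc((void**)&dx, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&dy, bytes) != cudaSuccess) { cudaFree(dx); return -1; }

    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, a, dx, dy);

    // A launch failure (e.g. no usable code for this GPU) is reported here.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (std::fabs(hy[i] - 5.0f) > 1e-6f) ++mismatches;  // 2*1 + 3 = 5
    return mismatches;
}

int main() {
    const int result = run_saxpy_check(1 << 20);
    if (result < 0) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

If you want to see what actually got embedded, running cuobjdump -ptx on the resulting executable should list any PTX in the fatbinary, whichever compiler produced it.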

I’m not trying to provide any value judgments here, or any statements of suitability.

I appreciate the elaboration, Robert. Thank you.