Compiling CUDA code from other platforms

Hi, I’m new to the PGI compilers. I have to compile some C + CUDA code on a computer without an NVIDIA card, to be executed on other computers. With the free trial I can use the Windows or Linux (32/64-bit) version for compiling (I don’t know if the trial allows both), and the programs will run on a Windows/Linux (32/64-bit) machine with an appropriate NVIDIA card. I have a few questions:

  • Should I use the Windows or the Linux version of PGI? I’d prefer whichever version makes it easier to compile here and just run on the other machine (Windows?).

  • Should I compile for Windows or for Linux? Which gives me better compatibility?

  • How do I compile for a specific graphics card from another platform?

I’m also interested in the “safety” of the process: I want to compile and be reasonably sure the binary will work on the other machine, in as few attempts as possible, because the compile/test cycle takes me a long time since I can’t compile on the destination machine.

Thank you very much.

Hi valenbg,

Our CUDA C++ compiler only targets x86-based systems, while our CUDA Fortran compiler and OpenACC directives/pragmas can target NVIDIA GPUs.

If you really want maximum portability, then you’ll want to move to a higher-level approach like OpenACC. With OpenACC, you insert pragmas in your code indicating which portions of the code to off-load to an accelerator. An accelerator is a generic device that could be an NVIDIA GPU, but could also be an AMD Radeon, an Intel Xeon Phi, or some future device. We’re also working towards allowing a single binary to target multiple devices. You can find more information on OpenACC at: PGI Compilers with OpenACC | PGI.
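To make this concrete, here is a minimal sketch of what an OpenACC-annotated C loop looks like. The pragma and its data clauses are standard OpenACC; the function and variable names are just placeholders for illustration:

    /* saxpy.c -- offload a simple loop to the accelerator with OpenACC */
    void saxpy(int n, float a, float *restrict x, float *restrict y)
    {
        /* Ask the compiler to parallelize this loop on the accelerator,
           copying x in and copying y both in and out. */
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

Built with the PGI compilers (something along the lines of pgcc -acc -Minfo=accel saxpy.c; the exact target flags vary by PGI version and device), the same source can be retargeted to different accelerators without changing the code.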

As for the OS choice, problems can occur when building on a newer OS version and running on an older one, but otherwise it’s just a matter of having the correct shared libraries installed, or linking statically. These issues occur on either OS, so it’s more a matter of personal preference.

The choice of target host CPU is also important. Compiling for the most generic CPU will give you more portability, but will usually result in slower code. The PGI Unified Binary technology can optimize the compute-intensive portions of your code for different CPU architectures, wrap them up in the same binary, and then select the appropriate code path at run time. PGI Unified Binary™ description | PGI

  • Mat

I plan to use OpenACC directives to auto-parallelize my C code and then run it on another computer with an NVIDIA accelerator. So with OpenACC I can target NVIDIA GPUs, and the PGI Unified Binary is really interesting; I’ll probably try it. Thank you very much for your response.