TF32, OpenACC

Hello,

Maybe this is a silly question: I am running my GPU code (written with MPI+OpenACC) on an A100 with FP32. Is there a way to use TF32, or does it use TF32 by default when I use FP32?

Thanks for your help in advance,
Feng

Hi Feng,

We’re not able to implicitly use the Tensor Cores within OpenACC, but you can still take advantage of them either by using CUDA or CUDA Fortran (see: Using Tensor Cores in CUDA Fortran | NVIDIA Technical Blog), or via Fortran intrinsics with the “cutensorex” module (see: Bringing Tensor Cores to Standard Fortran | NVIDIA Technical Blog).
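As a rough illustration of the CUDA C route (this example is mine, not from the linked posts, and the helper name `tf32_gemm` is hypothetical): cuBLAS lets an ordinary FP32 GEMM opt in to TF32 tensor-core math on Ampere via `cublasSetMathMode`.

```cuda
#include <cublas_v2.h>

// Sketch: run an FP32 SGEMM using TF32 Tensor Cores (A100 and later).
// dA, dB, dC are n-by-n column-major device arrays; error checking omitted.
void tf32_gemm(const float *dA, const float *dB, float *dC, int n)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    // Allow cuBLAS to use TF32 tensor-core math for FP32 GEMMs.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cublasDestroy(handle);
}
```

Note that TF32 applies to tensor-core matrix operations like this; it does not change ordinary FP32 arithmetic in your own kernels.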

Hope this helps,
Mat

Hi Mat,

Thanks for your reply!

Are there future plans or possibility to allow OpenACC to use TF32?

Besides, just out of curiosity: I could potentially convert some of my OpenACC kernels to CUDA C in order to use TF32, while in other OpenACC kernels I would still use FP32. Is mixing TF32 (CUDA C) and FP32 (OpenACC) like this feasible?

Thanks,
Feng

Are there future plans or possibility to allow OpenACC to use TF32?

I’m not aware of any plans, but I don’t think it’s straightforward to directly transform generic code to utilize the Tensor Cores.

I will still use FP32, so mixing TF32 (CUDA C) and FP32 (OpenACC), is this feasible?

Yes, our OpenACC implementation is interoperable with CUDA objects. If you’re mixing them in the same file, then you’ll want to compile with ‘nvc++’, since ‘nvcc’ doesn’t support OpenACC, or at least use nvc++ as nvcc’s host compiler (i.e. “-ccbin nvc++”). While not fully supported, nvc++ does have some support for compiling CUDA code.

Also, if using nvcc, add “-rdc=true” so relocatable device code (RDC) is enabled and the device code can be linked properly. Alternately, disable RDC in OpenACC via “-gpu=nordc”; however, you then won’t be able to call device routines or access global data defined in separate files.
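A minimal build sketch under those assumptions (file names `main.cpp` and `kernels.cu` are hypothetical, and `-gpu=cc80` targets the A100):

```shell
# CUDA kernels, compiled with relocatable device code enabled
nvcc -rdc=true -c kernels.cu -o kernels.o

# OpenACC host code, compiled with the NVHPC compiler
nvc++ -acc -gpu=cc80 -c main.cpp -o main.o

# Link with nvc++ so both the OpenACC and CUDA runtimes are pulled in
nvc++ -acc -cuda -gpu=cc80 main.o kernels.o -o app
```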

You may also need to become familiar with the OpenACC “host_data” construct if passing OpenACC-managed data to a CUDA kernel.
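For example, a sketch of the pattern (names like `launch_scale` are illustrative, not from your code): “host_data use_device” hands the CUDA side the device pointer for an array that OpenACC is managing.

```cuda
// kernels.cu — a trivial CUDA kernel plus a C-linkage launcher the
// OpenACC translation unit can call.
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

extern "C" void launch_scale(float *d_x, float a, int n)
{
    scale<<<(n + 255) / 256, 256>>>(d_x, a, n);
}

/* In the OpenACC C/C++ source:

   extern "C" void launch_scale(float *d_x, float a, int n);

   #pragma acc data copy(x[0:n])
   {
       // ... OpenACC kernels operating on x ...

       // host_data exposes the device address of x to the CUDA launcher,
       // so the data stays resident on the GPU across both sides.
       #pragma acc host_data use_device(x)
       launch_scale(x, 2.0f, n);

       // ... more OpenACC kernels using the updated x ...
   }
*/
```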

Here are a few helpful articles and a video. They may not directly apply to what you’re doing, but hopefully they’re still useful.