Hi,
I’m working on a CUDA Fortran code developed and optimized for GPU clusters. The code uses CPUs for input/output operations and GPUs for the compute-intensive kernels.
I recently implemented custom subroutines to handle matrix transposition and matrix-matrix multiplication, because I read that intrinsic functions such as transpose, matmul, and norm2 are inefficient, or possibly not usable at all, when applied to device arrays.
Could you clarify whether it is safe and efficient to call these intrinsics directly on device arrays, or whether they should always be replaced with custom device-aware implementations (or library calls such as cuBLAS)?
Thank you very much for your help!
Federico