Nvfortran error

Hi Mat,
I encountered an nvfortran error:nvfortran-f-0155-compiler failed to translate accelerator region (see-minfo message):device compiler exited with error status code,can you help me?

Mat’s on vacation this week. Can you provide your source or more information?

Hello, I have solved this problem because I wrote a piece of code repeatedly, but now I have some problems. I wonder if you can provide some resources? Can nvfortran use the trim function? Does nvfortran support len_ The function of trim and len to calculate string length?

In host code or device code? Yes for host code. I’ll have to check for device code. In general, our device code support for Fortran character data is not complete.

My current code includes the parallel code on the CPU and GPU. In my code, the device code does not recognize trim. It is always recognized as trima, but my host code does not recognize trim. Normally, trim, len, and len_ Trim is an internal function of fortran, shouldn’t it support nvfortran?

My window runs like this:

This fragment of my code is like this:

Under normal circumstances, the result of running my window should be that there is no blank line between 3, 4 and 5. I don’t know what is going on. Can you help me?

I’d really need a test I can compile and run myself. I do not see how wep_set_fold is set. And, line 343 is overwriting what is set in line 342.

Thank you bleblack, I have solved it. When I use nvfortran to compile and run my program, in the kernel function configuration, when the grid value is set to 2, 4, 6, 8, and 12, the running time is almost the same. When the number of threads is changed to 1024, the running time does not change much. What is the matter? The code defined by my kernel function and the running results are as follows:

Hi 1799336883,

Without a reproducing example, we can’t give a definite answer. Though some possible reasons are:

  • The calls to fuzhi_GPU_day and fuzhi_CPU_day dominate the time so the time spent in Cycle_Runoff doesn’t matter
  • There not enough work in Cycle_Runoff so increasing the grid or block dimensions doesn’t matter.
  • There’s some issue with Cycle_Runoff

Have you profile your code with Nsight-Systems and Nsight-Compute to get a more accurate view of the performance?


Hello,Mat, I have solved this problem, and now I have a new problem,I want to ask you some questions. Can nvfortran compile openmp statements? I want to use cuda fortran and openmp to rewrite my fortran serial code under linux.

Can nvfortran compile openmp statements?

I want to use cuda fortran and openmp to rewrite my fortran serial code under linux.

That’s fine, though if you’re wanting to use OpenMP to support multi-GPU programming, I prefer using MPI instead. With OpenMP you end up having to do domain decomposition and it’s more difficult to manage the memory movement. With MPI, domain decomposition is inherent, allows the program to run across multiple nodes, and GPU Aware MPI allows for direct memory transfers between GPUs.


hi,Mat.Do you have a case about cuda fortran and openmp combined programming? I want to learn, but I can’t find any relevant cases.