NVFORTRAN-F-0000-Internal compiler error. gen_llvm_expr(): no incoming ili

Hi,
I am new to this forum (apart from multiple visits while debugging various issues in the past), and this is my first question. I have been studying CUDA Fortran for a couple of months, have written a number of tests, and have already worked through numerous issues while debugging.
Now I am rewriting our big Fortran code to use CUDA and the GPU (together with MPI).
Yesterday I got this error, which Google knows nothing about.
I know it is preferred to post a reproducible sample, but as I mentioned, the whole code is quite big (and when I tried to reduce the problem, the reduced version worked!).
The error appears in a module that contains many host, global, and device subroutines; the module is about 800 lines long, and the error appears at, say, line 250, pointing to the end of a host subroutine. In this subroutine there are several calls to global subroutines with the proper chevron syntax.
This error looks like a secondary one. By that I mean that the compiler first manages to get past that line, finding some bugs and typos further down in the code, and then, at some kind of linking step, it reports this error.
I believe it has something to do with the way (or some rule I do not know about) the global subroutine is launched from a host subroutine. Or perhaps there is some rule about mixing host, global, and device subroutines in one module, or something else… Again, I have tried to mimic similar behavior in a smaller sample, and everything works as expected.
Back to the error message.
The first part, “NVFORTRAN-F-0000-Internal compiler error”, does not say much.
But I hope that some NVFORTRAN specialist might know what the remaining part indicates.
So what does gen_llvm_expr() stand for?
And what is “no incoming ili”?

P.S. I have NVFORTRAN 23.11-0 and CUDA 12.2

Hi Modestov,

An “ICE” is an issue with the compiler itself and in this case with the LLVM code generation. I’d need a reproducing example to see what’s triggering it.

What I’d suggest is to try updating to our latest release (currently 24.9) to see if we have it fixed already.

If not, please pull together a minimal reproducing example that shows the error. I can then report it to our team and possibly find you a workaround.

-Mat

Hi Mat,
Thank you for a really prompt reply.
I am afraid updating the compiler is not an option, as I am currently working on a supercomputer and this is what they provide. I also need MPI and HDF5, and they have a couple of modules for that setup. And I am pretty sure they do not update the compilers every time a new version appears (after all, it’s Spain :) ).
I’ll try to produce a smaller sample reproducing this error.

Well… while reducing the code from 20 files and 18k lines down to 3 files with fewer than 400 lines in total, I figured out what caused the problem.
In the launch of the global subroutine, the last input argument is an allocatable device array:
CALL compute_flux<<<grid0,block0>>>(mx, my, mz, nflux, gflux)
It was allocated earlier in another host subroutine:
ALLOCATE( gflux(mx, my, mz, nflux) )
But! nflux was declared as
integer, constant :: nflux = 7
while it should be
integer, parameter :: nflux = 7

Take-home message: take extra care with parameters that carry the constant attribute :).

I’m glad you were able to find the cause, though the compiler shouldn’t give an ICE. So if you’re still able, providing a reproducing example would be appreciated. I’d give the report a lower priority, but it would be good for engineering to take a look so it doesn’t happen for others.

Thanks!

Here is the very minimal example:

Module kernel
   use cudafor
   integer :: stat
   real, allocatable, dimension(:), device :: gv
   ! integer, parameter :: mm = 100
   integer, constant :: mm = 100
contains
   attributes(global) subroutine func(lx,va)
      integer, value :: lx
      real, dimension(lx), intent(out) :: va
      integer :: i
      i = (blockIdx%x-1) * blockDim%x + threadIdx%x
      if (i .le. lx) va(i) = 1.0
   end subroutine func
End module

Program main
   use kernel
   use cudafor
   stat = cudasetdevice(0)
   allocate( gv(mm) )
   call func<<<100,1>>>(mm,gv)
End program

Compile with:
nvfortran -cuda kernel.F90 main.F90 -o execute

So when mm is declared with the constant attribute, the compiler gives that error; if mm is declared as a true parameter, then everything is fine.
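For completeness, here is the corrected variant that compiles cleanly for me, with mm declared as a true parameter (everything else is unchanged from the example above):

Module kernel
   use cudafor
   integer :: stat
   real, allocatable, dimension(:), device :: gv
   integer, parameter :: mm = 100   ! compile-time constant instead of device constant memory
contains
   attributes(global) subroutine func(lx,va)
      integer, value :: lx
      real, dimension(lx), intent(out) :: va
      integer :: i
      i = (blockIdx%x-1) * blockDim%x + threadIdx%x
      if (i .le. lx) va(i) = 1.0
   end subroutine func
End module

Program main
   use kernel
   use cudafor
   stat = cudasetdevice(0)
   allocate( gv(mm) )
   call func<<<100,1>>>(mm,gv)   ! passing mm by value is fine now that it is a parameter
End program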

It might be good to have a warning or even a proper error message in this case.

Cheers

Thanks! I reported it as TPR#36714.

The problem is with passing the constant “mm” by value into the kernel, which I don’t believe is valid. The compiler should be catching the error, though, instead of giving an ICE.

“constant” data is stored in read-only device memory, so it has limited use on the host. Parameter is the correct usage here.


Hi Mat,
One more question, related to my confusion between the constant and parameter attributes.
When I pass an allocatable array to a global subroutine, is there a way NOT to pass its dimensions? That is, instead of
call func<<<100,1>>>(mm,gv)
use
call func<<<100,1>>>(gv)
and in the global subroutine just use
real, dimension(mg) :: gv
where mg is declared as integer, constant, and somewhere on the host have mg = mm.
I have tried it, but it gives a run-time error, so I guess such use of constant is not valid either.
I know that I can skip mm as an input argument and declare the gv array as
real, dimension(:) :: gv
but I have read and checked that for a kernel it is better to know the array dimensions in advance; I think I saw some gain in performance.
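Just to show what I mean by the assumed-shape variant, here is a sketch of the kernel from my example above, rewritten without the extent argument (only a sketch; if I understood the documentation correctly, size(va) should also work inside the kernel, but I have not measured whether this is slower than passing the extent explicitly):

attributes(global) subroutine func(va)
   ! assumed-shape dummy: the extent travels with the array descriptor
   real, dimension(:), intent(out) :: va
   integer :: i
   i = (blockIdx%x-1) * blockDim%x + threadIdx%x
   if (i .le. size(va)) va(i) = 1.0
end subroutine func

and the launch then becomes

call func<<<100,1>>>(gv)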

Hmm, you should be able to access “mg” from the kernel, since it’s in the module and therefore accessible. Maybe something else is going on?

I updated your example using direct access to mg:

% cat test.CUF
Module kernel
   use cudafor
   integer :: stat
   real, allocatable, dimension(:), device :: gv
   real, allocatable, dimension(:) :: gvH
   integer, parameter :: mm=100
   integer, constant :: mg
contains
   attributes(global) subroutine func(va)
      real, dimension(lx), intent(out) :: va
      integer :: i
      i = (blockIdx%x-1) * blockDim%x + threadIdx%x
      if (i .le. mg) va(i) = 1.0
   end subroutine func
End module

Program main
   use kernel
   use cudafor
   mg=mm
   stat = cudasetdevice(0)
   allocate( gv(mm) )
   allocate( gvH(mm) )
   call func<<<100,1>>>(gv)
   gvH=gv
   print *, gvH(1:5)
End program
% nvfortran test.CUF ; a.out
    1.000000        1.000000        1.000000        1.000000
    1.000000

In the kernel, do you have lx there or mg?

Okay…
I have also checked, and it seems to work…
Perhaps there is indeed something else in my code, maybe another confusion between constants and parameters…
Thanks a lot!
