cudaErrorInvalidDeviceFunction

I’m able to reproduce the invalid device function error. I happened to be using CUDA 11.4, and it appears you are as well.

I note that if I drop the -dlto switches from the first two lines of your compilation sequence, the error disappears.

My suggestions:

  • retest with the latest available CUDA toolchain
  • if the problem persists, file a bug
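
When retesting, it may help to check explicitly for this error right after the kernel launch, so that the failure is not mistaken for something else. The following is only a minimal sketch, not your code: `dummy_kernel` and the single-int buffer are hypothetical stand-ins, and it assumes the error surfaces at launch time via `cudaGetLastError()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; replace with the kernel from the failing build.
__global__ void dummy_kernel(int *out) { *out = 42; }

int main() {
    int *d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(int));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    dummy_kernel<<<1, 1>>>(d_out);

    // Launch-time failures are reported by cudaGetLastError().
    err = cudaGetLastError();
    if (err == cudaErrorInvalidDeviceFunction) {
        printf("launch failed with invalid device function: %s\n",
               cudaGetErrorString(err));
    } else if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
    }

    // Errors raised while the kernel runs are reported at synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("kernel execution failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_out);
    return 0;
}
```

Building this check into the test case once with and once without -dlto, using otherwise identical switches, should make the comparison unambiguous for a bug report.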