I was able to reproduce the error with CUDA 10.1.158 and the icc 18.0 tools.
The error is actually being emitted by cudafe++, which is one of the internal tools that nvcc uses.
It looks like an nvcc compiler issue to me. However, there is additional work that I would normally do before filing a bug.
If you wish to short-circuit the process, you’re welcome to file a bug yourself. For full visibility on your end, that would be my recommendation.
I seem to recall multiple similar problems with x86 intrinsics in the past. I would recommend moving host code with such intrinsics into separate files that are compiled directly with the host compiler, if that is possible.
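As a rough sketch of that workaround (not tested against your project): the file name simd_host.cpp and the helper sum_floats are hypothetical names I made up for illustration, and SSE is just a stand-in for whichever intrinsics you are actually using. The idea is that the intrinsics header is only ever included in a file that the host compiler builds directly, while the .cu file sees nothing but a plain function declaration.

```cpp
// simd_host.cpp (hypothetical file name): compiled directly by the host
// compiler, never passed through nvcc, so cudafe++ never parses the intrinsics.
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics; included only in this host-only file

// Hypothetical helper. In a real project this declaration would live in a
// small header that the .cu file includes; it mentions no intrinsic types.
float sum_floats(const float* data, std::size_t n);

// Sums n floats, four at a time with SSE, then handles the remainder.
float sum_floats(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);

    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // Unaligned load, so the caller has no alignment obligations.
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));
    }

    // Reduce the four vector lanes to a scalar.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    // Scalar tail for the last n % 4 elements.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

You would then build the pieces separately and link them, along the lines of `icc -c simd_host.cpp`, `nvcc -c kernel.cu`, and `nvcc simd_host.o kernel.o -o app`, where kernel.cu and app are again placeholder names. Your real build will likely need additional flags.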