"an illegal instruction was encountered"

I know some people saw this error many years ago but I’m now seeing it in CUDA 8.0 RC.

It doesn’t appear when I run the app under nvprof or cuda-memcheck, or in a debug build.
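For reference, this is roughly how I’m invoking the tools that mask the bug (a sketch; the binary name `./app` is a placeholder, not the benchmark’s real name):

```shell
# Debug build: device-side debug info, device optimizations off.
nvcc -G -g -o app app.cu

# Neither of these reproduces the fault for me:
cuda-memcheck ./app
nvprof ./app
```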

It’s not critical that I get it fixed, as it’s an old benchmark, but was a root cause or workaround ever discovered, or is this truly a codegen error?

You know the old saying: “MCVE or it didn’t happen!” :-)

Seriously, it is totally unclear (to me, anyhow) under what circumstances this occurs, and I can’t recall having seen this before. It is possible that there are code generation errors in the backend that occasionally construct illegal machine instruction encodings (perhaps due to an uninitialized variable, or an out-of-bounds access inside PTXAS), but without a repro case it seems impossible to diagnose whether that is the case or not.

Agreed, but I’m having some trouble even finding where it’s failing. :|

I assume that by “trouble even finding where it’s failing” you mean that you have trouble finding the specific instruction inside a particular kernel that triggers the exception, rather than identifying which kernel is affected?

If you run a release build under the CUDA debugger, wouldn’t it be able to at least pinpoint the general region of kernel code where this illegal instruction occurs?

Have you tried disassembling the SASS for the kernel in question? If the root cause is a bad instruction encoding produced by the compiler backend, one would expect either the disassembler to complain about it, or the assembly language output to contain a noticeably “weird” instruction of some kind.
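If it helps, the SASS can be inspected without a debugger using the standard CUDA binary utilities; a sketch (the binary name `./app` and the kernel name `suspectKernel` are placeholders):

```shell
# Dump the SASS disassembly for all kernels embedded in the binary.
# A bad encoding should either trip the disassembler or stand out visually.
cuobjdump -sass ./app > app.sass

# Narrow the dump to the suspect kernel by (mangled) name.
grep -n -A 40 "suspectKernel" app.sass
```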

How reproducible is this issue? Is there a possibility that it is a transient error caused by a flipped bit in some memory location? Also, is the app maybe using the driver API to download code to the GPU manually, and is it possible it could be loading a corrupted binary image?

I can run into these in SASS programming if I set an illegal stall count (like trying to dual-issue an instruction that can’t be dual-issued) or if the generated opcode is bad. I can’t recall how these behave under cuda-memcheck.

You could try stepping through the code in the SASS debugger.

Or if you have code that reliably generates this bug I’d just submit it to nvidia. It’s probably an issue with ptxas.

Thanks guys… I was able to narrow it down.

The newest, smelliest and untested part of the code was in fact the culprit (as always).

It looks like I had an errant SMEM pointer and sm_52 was more resilient to the bug than sm_50.

It was very Heisenbug-like, since even the sm_52 kernel would occasionally fail with error 73 / “illegal instruction”.

It’s interesting that simply running under cuda-memcheck or nvprof squelched the bug entirely.

It’s good to read that you were able to track down the root cause of this, but I am a bit puzzled how an errant SMEM pointer triggers an “illegal instruction” exception.

Me too.

Yes, I’m puzzled too.

It’s a warp-synchronous transposition sequence of SMEM stores, loads and then stores to GMEM.

It should be OK but it’s not!
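For context, the general shape of the sequence is something like this sketch (a generic SMEM tile transpose with a full-block barrier; the real kernel is generated and warp-synchronous, so the names, tile size, and layout here are assumptions):

```cuda
// Sketch only: store a tile to shared memory, barrier, then read it back
// transposed and store to global memory. Assumes a square matrix whose
// width is a multiple of 32 and a 32x32 thread block.
__global__ void transpose32(const float *in, float *out, int width)
{
    __shared__ float tile[32][33];   // +1 column pads away bank conflicts

    int x = blockIdx.x * 32 + threadIdx.x;
    int y = blockIdx.y * 32 + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];    // SMEM store

    __syncthreads();                 // every thread in the block must arrive

    int tx = blockIdx.y * 32 + threadIdx.x;
    int ty = blockIdx.x * 32 + threadIdx.y;
    out[ty * width + tx] = tile[threadIdx.x][threadIdx.y]; // SMEM load, GMEM store
}
```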

Ahhh… I think I found it. Very bad mistake.

The kernel was mixing a PTX “named barrier” [ bar.sync(barrier, threads) ] with a regular __syncthreads(), and it looks like barrier 0 was used in both cases, which is not what I would ever have wanted.

Misuse of bar.sync() that results in bad arrival counts is probably a hard error under certain conditions.
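That matches the shape of the hazard: __syncthreads() itself compiles down to bar.sync on barrier 0 with a full-block arrival count, so an inline-PTX named barrier that also uses barrier 0 but a partial thread count puts two different arrival counts on the same hardware barrier. A minimal sketch (the kernel and the count of 64 are hypothetical):

```cuda
__global__ void mixed_barriers_bug()
{
    // Named barrier 0, but only 64 threads are expected to arrive...
    asm volatile("bar.sync 0, 64;");

    // ...while __syncthreads() is also bar.sync on barrier 0, with an
    // arrival count of the whole thread block. Mismatched counts on the
    // same barrier resource are undefined behavior.
    __syncthreads();
}
```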

The kernels are generated with a program and only this particular configuration was emitting the experimental bar.sync() PTX.

I hadn’t run this code in a long, long time and totally missed that this code was being emitted.

Replacing the experimental bar.sync(b,t) ops with __syncthreads() fixed the problem (for real this time).
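For anyone hitting the same thing, the two safe shapes of the fix would be roughly the following (the barrier index and arrival count are illustrative):

```cuda
__global__ void fixed_kernel()
{
    // If a partial-block barrier is genuinely needed, keep the named
    // barrier off barrier 0 (PTX exposes named barriers 0-15); otherwise
    // just use __syncthreads(), which owns barrier 0.
    asm volatile("bar.sync 1, 64;");

    __syncthreads();   // safe: barrier 0 now sees only full-block arrivals
}
```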

Case closed!