Multi Architectures runtime problem

We used to build with -gencode=arch=compute_20,code=sm_20, which worked fine for the Kepler architecture.

To support more architectures, we switched to the following command:

After that we got a CUDA module error 200 (CUDA_ERROR_INVALID_IMAGE):
* This indicates that the device kernel image is invalid. This can also
* indicate an invalid CUDA module.

Does anyone have a clue?

BTW, we use CUDA 8.0.

If you dropped compute_20/sm_20 from the build and run on a Fermi GPU, you will get that error.
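
To make the point above concrete, here is a minimal sketch in plain C++ of the compatibility rules as I understand them from the CUDA documentation. It is a simplified model, not NVIDIA's actual loader code, and the names Arch, FatBinary, sassRunsOn, ptxRunsOn and canLoad are hypothetical, made up for illustration. It assumes two rules: a SASS image (code=sm_XY) runs only on a device with the same major compute capability and an equal or higher minor, while a PTX image (code=compute_XY) can be JIT-compiled on any device whose compute capability is equal or higher.

```cpp
#include <vector>

// Hypothetical model of a compute capability such as 2.0 or 3.5.
struct Arch {
    int major;
    int minor;
};

// Hypothetical model of what a multi-architecture build embeds.
struct FatBinary {
    std::vector<Arch> sass;  // one entry per code=sm_XY
    std::vector<Arch> ptx;   // one entry per code=compute_XY
};

// Assumed rule: SASS is binary compatible only within the same major
// version, and only with devices of equal or higher minor version.
bool sassRunsOn(const Arch& image, const Arch& device) {
    return image.major == device.major && image.minor <= device.minor;
}

// Assumed rule: PTX can be JIT-compiled for any device whose compute
// capability is equal to or higher than the PTX virtual architecture.
bool ptxRunsOn(const Arch& image, const Arch& device) {
    if (image.major != device.major) {
        return image.major < device.major;
    }
    return image.minor <= device.minor;
}

// True if at least one embedded image is usable on the device. When this
// is false, loading the module would be expected to fail, which matches
// the "device kernel image is invalid" error described in the question.
bool canLoad(const FatBinary& binary, const Arch& device) {
    for (const Arch& image : binary.sass) {
        if (sassRunsOn(image, device)) return true;
    }
    for (const Arch& image : binary.ptx) {
        if (ptxRunsOn(image, device)) return true;
    }
    return false;
}
```

Under this model, a binary built for sm_30, sm_35, sm_50 plus compute_50 PTX has nothing a Fermi (2.0) device can use, so canLoad returns false for it. Adding sm_20 back, or embedding compute_20 PTX, makes it loadable again.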

I tried it, and it looks like if we call the driver API functions directly, we do not need multiple architectures.
The driver API loads the module from PTX and compiles it in JIT mode.
Only the runtime library uses the multi-architecture binary.
Please comment on whether this is true.