CUDA 10.1 and Ampere GPU compatibility

I’m building an application on a machine with an Ampere GPU, and the application needs CUDA 10.1 (I can’t move to the latest release for now). I’m running into driver version incompatibilities, even after following the documentation, which says we “should” be able to make both work together by setting CUDA_FORCE_PTX_JIT=1 (NVIDIA Ampere GPU Architecture Compatibility Guide :: CUDA Toolkit Documentation). I have not been able to successfully build the application locally or in CI. Would anyone have some helpful advice?

Here’s what’s currently installed on my CI builds:

| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
|  0%   22C    P0    58W / 300W |      0MiB / 23028MiB |      2%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|  No running processes found                                                 |
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
CUDA Version 10.1.243

Make sure your compilation commands include an arch specification that includes PTX.

For example:

nvcc -arch=sm_70

will compile for a cc7.0 target (which CUDA 10.1 understands) and also include PTX. The PTX generally speaking should be able to forward-JIT to Ampere GPUs.
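To make the forward-JIT reasoning concrete, here is a minimal Python sketch of the compatibility rules as I understand them: machine code built for sm_XY loads only on devices with the same major compute capability and an equal or higher minor one, while embedded PTX for compute_XY can be JIT-compiled by the driver for any equal or newer compute capability. With nvcc, `-arch=sm_70` is shorthand for `-gencode arch=compute_70,code=[sm_70,compute_70]`, so it embeds both. The function `can_run` and the tuple encoding are hypothetical names for illustration, not an NVIDIA API, and the A10G value of 8.6 is taken from NVIDIA's compute capability listing rather than from this thread.

```python
# Sketch of the CUDA binary-compatibility rules; not an NVIDIA API.
# can_run and the (major, minor) tuple encoding are hypothetical.

def can_run(device_cc, sass_targets, ptx_targets):
    """Return how a build could load on a device, or None if it cannot.

    device_cc    -- (major, minor) compute capability of the GPU, e.g. (8, 6)
    sass_targets -- compute capabilities with embedded machine code (sm_XY)
    ptx_targets  -- compute capabilities with embedded PTX (compute_XY)
    """
    # Machine code runs only within the same major version,
    # on a device whose minor version is equal or higher.
    for major, minor in sass_targets:
        if major == device_cc[0] and minor <= device_cc[1]:
            return "sass"
    # PTX can be JIT-compiled by the driver for any equal or newer capability.
    for cc in ptx_targets:
        if cc <= device_cc:
            return "ptx-jit"
    return None


A10G = (8, 6)  # assumed compute capability of the A10G in the question

# -arch=sm_70: sm_70 machine code plus compute_70 PTX
print(can_run(A10G, [(7, 0)], [(7, 0)]))
# Machine code only, no PTX embedded
print(can_run(A10G, [(7, 0)], []))
```

The first call reports the PTX JIT path and the second reports no usable code, which is why the build needs PTX embedded when the toolkit predates the GPU. This only models which code image is eligible to load; it says nothing about library or driver issues that JIT cannot fix.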

Note that NVIDIA recommends CUDA 11.0 (A100, A30) or 11.1 (all others) for Ampere GPUs. There may be situations where the PTX JIT mechanism doesn’t resolve every issue that comes up when building an application this way (with a CUDA version not recommended for use on Ampere GPUs). YMMV.