Unable to run CUDA programs in Singularity containers from NGC

I am experiencing some problems running programs compiled in a container built from an NGC CUDA image:

$ srun singularity pull docker://nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04

I compile and run everything through a Slurm queue on a DGX-2 system.
I have downloaded the CUDA Toolkit code samples, which I keep in ‘~/NVIDIA_CUDA-11.1_Samples’.
I pick a code sample at random, ‘~/NVIDIA_CUDA-11.1_Samples/0_Simple/matrixMul/’, and compile it in my container:

$ srun --gres=gpu:1 singularity exec --nv cuda_11.1-cudnn8-devel-ubuntu18.04.sif make

The build succeeds and I now have the matrixMul executable. I then try to run it in my container:

$ srun --gres=gpu:1 singularity exec --nv ~/image-build/cuda_11.1/cuda_11.1-cudnn8-devel-ubuntu18.04.sif ./matrixMul
slurmstepd: task_p_pre_launch: Using sched_affinity for tasks
[Matrix Multiply Using CUDA] - Starting...
CUDA error at ../../common/inc/helper_cuda.h:779 code=3(cudaErrorInitializationError) "cudaGetDeviceCount(&device_count)" 
srun: error: nv-ai-01.srv.aau.dk: task 0: Exited with exit code 1

It fails, and the error message suggests that it cannot see any available GPUs. However, if I probe around a bit more with nvidia-smi, I do seem to have a GPU available in the container:

$ srun --gres=gpu:1 singularity exec --nv ~/image-build/cuda_11.1/cuda_11.1-cudnn8-devel-ubuntu18.04.sif nvidia-smi
slurmstepd: task_p_pre_launch: Using sched_affinity for tasks
Mon Oct 19 23:26:07 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.116.00   Driver Version: 418.116.00   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM3...  On   | 00000000:BE:00.0 Off |                    0 |
| N/A   39C    P0    51W / 350W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I get suspicious here because nvidia-smi reports CUDA version 10.1, but I am running it inside a container that I expect to provide CUDA 11.1.
I expected the cuda:11.1-cudnn8-devel-ubuntu18.04 container to provide the drivers and CUDA toolkit needed to compile and run CUDA applications. Compiling works fine, but running does not.
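
One way to see which versions are actually in play is to compare the toolkit inside the image with the driver on the host (a quick check, assuming nvcc is on the container's PATH):

$ srun --gres=gpu:1 singularity exec --nv ~/image-build/cuda_11.1/cuda_11.1-cudnn8-devel-ubuntu18.04.sif nvcc --version
$ srun --gres=gpu:1 nvidia-smi --query-gpu=driver_version --format=csv,noheader

The first command should report the 11.1 toolkit shipped in the image, while the second reports the host's driver version (418.116.00 here), since the driver comes from the compute node rather than the container.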
What could be wrong here?

I have figured out what I did wrong. I was indeed making a wrong assumption: the NVIDIA driver is provided by the host OS, not by the container, and the host's driver (418.116.00) is too old for CUDA 11.1. The "CUDA Version: 10.1" reported by nvidia-smi is the highest CUDA version that driver supports, which is why the 11.1 runtime failed to initialize. I solved this by using the CUDA 10.0 / cuDNN 7 variant of the image instead, and with that I can both compile and run the samples. More importantly, CUDA 10.0 is sufficient for the software I actually want to run, so this solves my problem.
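
For completeness, the switch amounts to repeating the same steps with the older tag (a sketch, assuming the CUDA 10.0 / cuDNN 7 devel image follows the same naming scheme on NGC and that the matching ~/NVIDIA_CUDA-10.0_Samples tree is used):

$ srun singularity pull docker://nvcr.io/nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
$ srun --gres=gpu:1 singularity exec --nv cuda_10.0-cudnn7-devel-ubuntu18.04.sif make
$ srun --gres=gpu:1 singularity exec --nv cuda_10.0-cudnn7-devel-ubuntu18.04.sif ./matrixMul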