CUDA driver version is insufficient for CUDA runtime version [Rocky Linux 8.6]

Hi folks,

I am running Rocky Linux 8.6 and trying to run an NCCL Docker image under Slurm. The failing logs are below.
The GPU configuration of my node is as follows:
[root@node001 ~]# nvidia-smi
Wed Nov 2 20:46:07 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:17:00.0 Off |                    0 |
| N/A   30C    P0    40W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  On   | 00000000:65:00.0 Off |                    0 |
| N/A   31C    P0    42W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100 80G...  On   | 00000000:CA:00.0 Off |                    0 |
| N/A   30C    P0    44W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100 80G...  On   | 00000000:E3:00.0 Off |                    0 |
| N/A   30C    P0    43W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

SLURM_DEBUG=2 srun --export="NCCL_DEBUG=INFO,NCCL_IB_DISABLE=1,PMIX_MCA_gds=hash,NVIDIA_DRIVER_CAPABILITIES=utility,compute" -w node001 -N 1 --ntasks-per-node=1 --gpu-bind=none --gpus-per-task=1 --exclusive --mpi=pmix_v3 --container-image=192.168.61.4:5000#/deepops-nccl-test:0.1 /nccl_tests/build/all_reduce_perf -b 1M -e 4G -f 2 -g 1
srun: select/cons_res: common_init: select/cons_res loaded
srun: select/cons_tres: common_init: select/cons_tres loaded
srun: select/linear: init: Linear node selection plugin loaded with argument 4
srun: debug:  switch/none: init: switch NONE plugin loaded
srun: debug:  spank: opening plugin stack /cm/shared/apps/slurm/var/etc/slurm/plugstack.conf
srun: debug:  /cm/shared/apps/slurm/var/etc/slurm/plugstack.conf: 1: include "/cm/shared/apps/slurm/var/etc/slurm/plugstack.conf.d/*"
srun: debug:  spank: opening plugin stack /cm/shared/apps/slurm/var/etc/slurm/plugstack.conf.d/pyxis.conf
srun: debug:  spank: /cm/shared/apps/slurm/var/etc/slurm/plugstack.conf.d/pyxis.conf:1: Loaded plugin spank_pyxis.so
srun: debug:  SPANK: appending plugin option "container-image"
srun: debug:  SPANK: appending plugin option "container-mounts"
srun: debug:  SPANK: appending plugin option "container-workdir"
srun: debug:  SPANK: appending plugin option "container-name"
srun: debug:  SPANK: appending plugin option "container-save"
srun: debug:  SPANK: appending plugin option "container-mount-home"
srun: debug:  SPANK: appending plugin option "no-container-mount-home"
srun: debug:  SPANK: appending plugin option "container-remap-root"
srun: debug:  SPANK: appending plugin option "no-container-remap-root"
srun: debug:  SPANK: appending plugin option "container-entrypoint"
srun: debug:  SPANK: appending plugin option "no-container-entrypoint"
srun: debug:  SPANK: appending plugin option "container-writable"
srun: debug:  SPANK: appending plugin option "container-readonly"
srun: launch/slurm: init: launch Slurm plugin loaded
srun: debug:  mpi type = pmix_v3
srun: debug:  mpi/pmix_v3: init: PMIx plugin loaded
srun: debug:  propagating RLIMIT_CPU=18446744073709551615
srun: debug:  propagating RLIMIT_FSIZE=18446744073709551615
srun: debug:  propagating RLIMIT_DATA=18446744073709551615
srun: debug:  propagating RLIMIT_STACK=18446744073709551615
srun: debug:  propagating RLIMIT_CORE=0
srun: debug:  propagating RLIMIT_RSS=18446744073709551615
srun: debug:  propagating RLIMIT_NPROC=255101
srun: debug:  propagating RLIMIT_NOFILE=131072
srun: debug:  propagating RLIMIT_MEMLOCK=18446744073709551615
srun: debug:  propagating RLIMIT_AS=18446744073709551615
srun: debug:  propagating SLURM_PRIO_PROCESS=0
srun: debug:  propagating UMASK=0022
srun: debug:  Entering slurm_allocation_msg_thr_create()
srun: debug:  port from net_stream_listen is 45751
srun: debug:  Entering _msg_thr_internal
srun: debug:  auth/munge: init: Munge authentication plugin loaded
srun: Waiting for nodes to boot (delay looping 450 times @ 0.100000 secs x index)
srun: Nodes node001 are ready for job
srun: jobid 10: nodes(1):`node001', cpu counts: 128(x1)
srun: debug:  requesting job 10, user 0, nodes 1 including (node001)
srun: debug:  cpus 1, tasks 1, name all_reduce_perf, relative 65534
srun: launch/slurm: launch_p_step_launch: CpuBindType=(null type)
srun: debug:  Entering slurm_step_launch
srun: debug:  mpi type = (null)
srun: debug:  mpi/pmix_v3: pmixp_abort_agent_start: (null) [0]: pmixp_agent.c:376: Abort agent port: 42401
srun: debug:  mpi/pmix_v3: p_mpi_hook_client_prelaunch: (null) [0]: mpi_pmix.c:224: setup process mapping in srun
srun: debug:  mpi/pmix_v3: _pmix_abort_thread: (null) [0]: pmixp_agent.c:352: Start abort thread
srun: debug:  Entering _msg_thr_create()
srun: debug:  initialized stdio listening socket, port 42201
srun: debug:  Started IO server thread (23456086173440)
srun: debug:  Entering _launch_tasks
srun: launching StepId=10.0 on host node001, 1 tasks: 0
srun: route/default: init: route default plugin loaded
srun: debug:  launch returned msg_rc=0 err=0 type=8001
pyxis: importing docker image: 192.168.61.4:5000#/deepops-nccl-test:0.1
srun: launch/slurm: _task_start: Node node001, 1 tasks started





pyxis: imported docker image: 192.168.61.4:5000#/deepops-nccl-test:0.1
# nThread 1 nGpus 1 minBytes 1048576 maxBytes 4294967296 step: 2(factor) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
node001: Test CUDA failure common.cu:1045 'CUDA driver version is insufficient for CUDA runtime version'
 .. node001 pid 11621: Test failure common.cu:1007
srun: launch/slurm: _task_finish: Received task exit notification for 1 task of StepId=10.0 (status=0x0200).
srun: error: node001: task 0: Exited with exit code 2
srun: debug:  task 0 done
srun: debug:  IO thread exiting
srun: debug:  mpi/pmix_v3: _conn_readable: (null) [0]: pmixp_agent.c:103:     false, shutdown
srun: debug:  mpi/pmix_v3: _pmix_abort_thread: (null) [0]: pmixp_agent.c:354: Abort thread exit
srun: debug:  Leaving _msg_thr_internal
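
One way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the message on the "Test CUDA failure" line is the text of cudaErrorInsufficientDriver, which the CUDA runtime returns when the driver it can see reports a lower CUDA version than the runtime needs. As far as I know, the same error also appears when no driver library (libcuda) is visible to the process at all, which can happen inside a container. The Python snippet below asks the driver library directly through cuDriverGetVersion, a real CUDA driver API call that returns the version encoded as 1000*major + 10*minor. The helper names decode_cuda_version and driver_cuda_version are made up for this illustration, and it assumes python3 is available wherever it is run.

```python
import ctypes


def decode_cuda_version(value):
    """Split CUDA's integer encoding (1000*major + 10*minor) into (major, minor)."""
    return value // 1000, (value % 1000) // 10


def driver_cuda_version(libname="libcuda.so.1"):
    """Return the (major, minor) CUDA version supported by the visible driver.

    Returns None when the driver library cannot be loaded or the call fails,
    which is itself a useful signal inside a container.
    """
    try:
        libcuda = ctypes.CDLL(libname)
    except OSError:
        return None  # driver library is not visible in this environment

    libcuda.cuDriverGetVersion.argtypes = [ctypes.POINTER(ctypes.c_int)]
    libcuda.cuDriverGetVersion.restype = ctypes.c_int

    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None  # non-zero CUresult means the query failed
    return decode_cuda_version(version.value)
```

Calling driver_cuda_version() on the host should agree with the "CUDA Version" field that nvidia-smi prints (11.7 here). Calling it inside the same container image would show whether the driver is reachable there: a None result would suggest the driver library is not being mounted into the container, rather than the driver being genuinely too old.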