I am struggling to resolve a CUDA version conflict between the driver and the runtime when running Parabricks from Singularity. I have tried two containers (built from the Docker images):
clara-parabricks_4.1.2-1.sif
clara-parabricks_4.2.0-1.sif
Both fail with the same error when running the Singularity container on an HPC GPU node (full details below): “[PB Error 2023-Nov-14 11:27:02][src/stitchPiece_step0.cu:438] cudaSafeCall() failed: CUDA driver version is insufficient for CUDA runtime version, exiting.”
What would I need to do to resolve this conflict? (One challenge to note: on the HPC cluster, users do not have root privileges.)
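For reference, this is roughly how I start the container on the GPU node; the module name and bind path are specific to our cluster, so please read them as placeholders rather than the exact invocation:

# from the host on the GPU node; --nv should bind the host NVIDIA driver
# libraries into the container (module name and bind path are placeholders)
module load singularity
singularity shell --nv --bind /nobackup/h_vmac clara-parabricks_4.2.0-1.sif
# ... pbrun is then run inside the resulting Singularity> shell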
I don’t understand why nvidia-smi reports CUDA Version 11.8 when the Singularity container ships CUDA 12.2. I also looked at the driver requirement variable; I guess that if I fix the paths it might work.
Singularity> echo $NVIDIA_REQUIRE_CUDA
cuda>=12.2 brand=tesla,driver>=450,driver<451
brand=tesla,driver>=470,driver<471
brand=unknown,driver>=470,driver<471
brand=nvidia,driver>=470,driver<471
brand=nvidiartx,driver>=470,driver<471
brand=geforce,driver>=470,driver<471
brand=geforcertx,driver>=470,driver<471
brand=quadro,driver>=470,driver<471
brand=quadrortx,driver>=470,driver<471
brand=titan,driver>=470,driver<471
brand=titanrtx,driver>=470,driver<471
brand=tesla,driver>=525,driver<526
brand=unknown,driver>=525,driver<526
brand=nvidia,driver>=525,driver<526
brand=nvidiartx,driver>=525,driver<526
brand=geforce,driver>=525,driver<526
brand=geforcertx,driver>=525,driver<526
brand=quadro,driver>=525,driver<526
brand=quadrortx,driver>=525,driver<526
brand=titan,driver>=525,driver<526
brand=titanrtx,driver>=525,driver<526
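As a sanity check on those requirements (my own addition, not from the Parabricks documentation), I compared them against the driver the node actually exposes:

# prints the host driver version; on this node it is 520.61.05, which falls
# outside the 450.x and 470.x ranges and below the driver>=525 entries above
nvidia-smi --query-gpu=driver_version --format=csv,noheader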
Here is nvidia-smi from inside the container, which is where the CUDA Version 11.8 is reported:
Singularity> nvidia-smi
Tue Nov 14 11:39:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    On   | 00000000:86:00.0 Off |                  N/A |
| 30%   33C    P8    13W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
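To double-check which libcuda the container actually picks up, my understanding (an assumption on my part) is that --nv binds the host driver libraries under /.singularity.d/libs, so inside the container shell I would also look at:

Singularity> ls -l /.singularity.d/libs/ | grep -i libcuda
Singularity> echo $LD_LIBRARY_PATH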
Singularity> echo $PATH
/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
Singularity> uname -a
Linux gpu0035.vampire 3.10.0-1160.71.1.el7.x86_64 #1 SMP Tue Jun 28 15:37:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Singularity> ls /usr/local/cuda*
/usr/local/cuda:
compat doc gds lib64 targets
/usr/local/cuda-12:
compat doc gds lib64 targets
/usr/local/cuda-12.2:
compat doc gds lib64 targets
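Since the image ships a compat directory, one workaround I considered (purely a guess on my part, and as far as I understand the CUDA forward-compatibility package is only supported on data-center GPUs, so it may not apply to this GeForce card) is to point the loader at those libraries before retrying pbrun:

Singularity> export LD_LIBRARY_PATH=/usr/local/cuda/compat:$LD_LIBRARY_PATH
Singularity> pbrun rna_fq2bam ...   # same arguments as the run below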
Singularity> time pbrun rna_fq2bam \
    --in-fq /nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/test_data/Sample_043_END1.fastq.gz \
            /nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/test_data/Sample_043_END2.fastq.gz \
    --genome-lib-dir /nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/genome_index/ \
    --ref /nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/test_data/101bp_r110/Homo_sapiens.GRCh38.110.dna_sm.primary_assembly.fa \
    --output-dir /nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/output/ \
    --out-bam Sample_043_END2_gpu.bam \
    --read-files-command zcat
Please visit NVIDIA Clara - NVIDIA Docs for detailed documentation
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /panfs/accrepfs.vampire/nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/test_data/Sample_043_END1.fastq.gz
and
/panfs/accrepfs.vampire/nobackup/h_vmac/user/janveva1/Project/RNAseq/bulk_RNAseq_gpu/test_data/Sample_043_END2.fastq.gz
[Parabricks Options Mesg]: @RG\tID:HYWM7BCXX161201.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HYWM7BCXX161201.1
[PB Info 2023-Nov-14 12:02:40] ------------------------------------------------------------------------------
[PB Info 2023-Nov-14 12:02:40] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2023-Nov-14 12:02:40] || Version 4.2.0-1 ||
[PB Info 2023-Nov-14 12:02:40] || star ||
[PB Info 2023-Nov-14 12:02:40] ------------------------------------------------------------------------------
[PB Info 2023-Nov-14 12:02:40] … started STAR run
[PB Info 2023-Nov-14 12:02:40] … loading genome
[PB Info 2023-Nov-14 12:03:55] read from genomeDir done 74.864
[PB Info 2023-Nov-14 12:03:55] Gpu num:1 Cpu thread num: 4
[PB Info 2023-Nov-14 12:03:55] … started mapping
[PB Error 2023-Nov-14 12:03:55][src/stitchPiece_step0.cu:438] cudaSafeCall() failed: CUDA driver version is insufficient for CUDA runtime version, exiting.
For technical support visit Clara Parabricks v4.2.0 - NVIDIA Docs
Exiting…
Could not run rna_fq2bam
Exiting pbrun …
Best regards,
Vaibhav