Clara-parabricks_4.1.0-1.sif cannot recognize A100 cards?

Hi, I am trying to run fq2bam on an A100 node and get this error message:

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /home/crick/working/haplotypecalling/Data/sample_1.fq.gz and
/home/crick/working/haplotypecalling/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[PB Info 2023-Jun-01 00:50:13] ------------------------------------------------------------------------------
[PB Info 2023-Jun-01 00:50:13] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2023-Jun-01 00:50:13] || Version 4.1.0-1 ||
[PB Info 2023-Jun-01 00:50:13] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2023-Jun-01 00:50:13] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Error 2023-Jun-01 00:50:14][ParaBricks/src/pbOpts.cu:107] Bad argument value: Number of GPUs requested (4) is more than number of GPUs (0in the system., exiting.
For technical support visit Help - NVIDIA Docs
Exiting…

Could not run fq2bam
Exiting pbrun …

===
On the host:

[crick@csctmp-xe8545-2 haplotypecalling]$ nvidia-smi
Thu Jun 1 01:29:36 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 20C P0 57W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 19C P0 56W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 21C P0 58W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 18C P0 59W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

In the container:

[crick@csctmp-xe8545-2 haplotypecalling]$ singularity shell clara-parabricks_4.1.0-1.sif
Singularity> nvidia-smi
Thu Jun 1 01:30:58 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 20C P0 57W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 19C P0 56W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 21C P0 58W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 18C P0 59W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Singularity>

Any idea what is causing this?

Thanks,

Wei

By the way, my command already passes --nv and --num-gpus, and the same command works with V100 cards. I also set “always use nv = yes” and “always use rocm = yes” in singularity.conf.
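
A quick sanity check along these lines is to compare the number of GPUs requested with the number listed inside the container. The sketch below is only an illustration: the helper names `count_gpus` and `check_gpu_request` are made up for this example, and it assumes `nvidia-smi -L` prints one line per physical GPU starting with `GPU <index>:`. Note that this only confirms what `nvidia-smi` can see; as the output above shows, `nvidia-smi` may list all four cards while the CUDA runtime used by Parabricks still reports zero.

```shell
# Count physical GPUs in `nvidia-smi -L` output read from stdin.
# Only lines of the form "GPU <index>: ..." are counted.
count_gpus() {
    awk '/^GPU [0-9]+:/ { n++ } END { print n + 0 }'
}

# Fail with a message when more GPUs are requested than are visible.
# $1 = requested GPU count, $2 = visible GPU count.
check_gpu_request() {
    requested=$1
    visible=$2
    if [ "$requested" -gt "$visible" ]; then
        echo "requested $requested GPUs but only $visible visible" >&2
        return 1
    fi
    return 0
}

# Example use on the GPU node (not run here; needs the image and the driver):
#   visible=$(singularity exec --nv clara-parabricks_4.1.0-1.sif nvidia-smi -L | count_gpus)
#   check_gpu_request 4 "$visible"
```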

Hi,
I observed the exact same error on 4xA100 cards, running on Docker.

It works for me now.
I reinstalled CUDA 12 correctly and disabled the MIG service.

Thanks,

Parabricks does not support MIG mode.
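
To check whether MIG is enabled on any card, the current MIG mode can be queried per GPU. This is a minimal sketch, assuming the `mig.mode.current` query field is available in your driver's `nvidia-smi` and that the CSV output uses `, ` as the separator; the helper name `mig_enabled_gpus` is made up for this example.

```shell
# Print the index of every GPU whose current MIG mode is "Enabled".
# Reads CSV lines such as "0, Enabled" from stdin, as produced by:
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
mig_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

# Example use on the GPU node (not run here; needs the NVIDIA driver):
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader | mig_enabled_gpus
#
# To turn MIG off on a listed GPU, e.g. GPU 0 (needs root, and the GPU
# must be idle; a GPU reset or reboot may be required afterwards):
#   sudo nvidia-smi -i 0 -mig 0
```

If the function prints nothing, no GPU reported MIG as enabled, and the cause is more likely elsewhere (for example the CUDA installation, as in the reply above).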