Error with clara-parabricks:4.0.1-1 fq2bam - Bad argument value: Number of GPUs requested is more than number of gpus in system

Hi

I am trying out clara-parabricks_4.0.1-1.sif on a g4dn.metal EC2 instance, which has 8 GPUs.

This is how I created my sif file:

singularity build clara-parabricks_4.0.1-1.sif   docker://nvcr.io/nvidia/clara/clara-parabricks:4.0.1-1

I get this error when trying to run fq2bam:

Bad argument value: Number of GPUs requested (8) is more than number of GPUs (0in the system., exiting.

This is the nvidia-smi output on the host:

nvidia-smi
Sun Jun  4 18:34:55 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:18:00.0 Off |                    0 |
| N/A   34C    P0    25W /  70W |  13458MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:19:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:35:00.0 Off |                    0 |
| N/A   30C    P8    11W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:36:00.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla T4            Off  | 00000000:E7:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla T4            Off  | 00000000:E8:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla T4            Off  | 00000000:F4:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla T4            Off  | 00000000:F5:00.0 Off |                    0 |
| N/A   29C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      7387      C   python                          13455MiB |
+-----------------------------------------------------------------------------+

The output below is from inside the container, after starting it on the command line with singularity shell --nv clara-parabricks_4.0.1-1.sif:

Singularity> date
Sun Jun  4 18:38:42 EDT 2023

Singularity> nvidia-smi
Sun Jun  4 18:39:50 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:18:00.0 Off |                    0 |
| N/A   42C    P0    26W /  70W |  13458MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:19:00.0 Off |                    0 |
| N/A   31C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            Off  | 00000000:35:00.0 Off |                    0 |
| N/A   32C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            Off  | 00000000:36:00.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla T4            Off  | 00000000:E7:00.0 Off |                    0 |
| N/A   31C    P8     8W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Tesla T4            Off  | 00000000:E8:00.0 Off |                    0 |
| N/A   31C    P8    11W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Tesla T4            Off  | 00000000:F4:00.0 Off |                    0 |
| N/A   32C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Tesla T4            Off  | 00000000:F5:00.0 Off |                    0 |
| N/A   31C    P8     9W /  70W |      3MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      7387      C   python                          13455MiB |
+-----------------------------------------------------------------------------+

pbrun fq2bam \
> --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
> --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
> --out-bam fq2bam_output.bam

[PB Error 2023-Jun-04 18:42:55][ParaBricks/src/pbOpts.cu:132] Bad argument value: Number of GPUs requested (8) is more than number of GPUs (0in the system., exiting.

Would appreciate any help.

Thanks in advance.

Hi there,
I observed the same issue with Docker and the image nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1.

Kind regards,
Daniel

Hello,

Sorry to hear you are having issues.
Can you please let me know whether you are able to run any other CUDA application or the CUDA samples? That would help rule out a driver issue.
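
As a first step before running a full CUDA sample, a rough sanity check like the one below can help. It is only a sketch: the helper name count_gpus is made up for illustration, and the check is not sufficient on its own, since in the report above nvidia-smi lists all 8 GPUs inside the container while pbrun still sees 0. It counts the GPUs that nvidia-smi -L lists and checks whether the CUDA driver library (libcuda.so) is known to the dynamic linker. Run it on the host and again inside the container to compare.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the "GPU N: ..." lines printed by `nvidia-smi -L`.
# Reads the listing from stdin; prints 0 when nothing matches.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Only query the driver when nvidia-smi is actually on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs listed by nvidia-smi: $(nvidia-smi -L | count_gpus)"
else
    echo "nvidia-smi not found on PATH"
fi

# CUDA programs need the driver library; check the dynamic linker cache for it.
if ldconfig -p 2>/dev/null | grep -q 'libcuda\.so'; then
    echo "libcuda.so is registered with the dynamic linker"
else
    echo "libcuda.so not found in the linker cache"
fi
```

If the GPU count is correct but libcuda.so is missing, or a real CUDA program still fails to start, that points toward the driver or CUDA installation rather than Parabricks itself.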

Hey,
I re-installed CUDA 12. It seems it was not installed properly before.
I also disabled the MIG service.
Now it works for me.

Thank you for your reply.
Parabricks does not support MIG mode.

Best
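
For anyone hitting the same error, here is a minimal sketch for checking whether MIG mode is enabled on any GPU. It assumes that the "MIG service" mentioned above refers to the GPU's MIG mode as reported by nvidia-smi, and the helper name mig_enabled_gpus is made up for illustration. GPUs without MIG support (such as the Tesla T4 in the first report) show [N/A] in this field, so the check only matters on MIG-capable hardware.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the index of every GPU whose current MIG mode is "Enabled".
# Expects CSV lines such as "0, Enabled", "1, Disabled" or "2, [N/A]" on stdin.
mig_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Query index and current MIG mode for each GPU, one CSV line per GPU.
    enabled=$(nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader \
        | mig_enabled_gpus | tr '\n' ' ')
    if [ -n "$enabled" ]; then
        echo "MIG is enabled on GPU(s): $enabled"
        echo "To disable it on one GPU (needs root): sudo nvidia-smi -i <index> -mig 0"
    else
        echo "No GPU reports MIG mode as Enabled"
    fi
else
    echo "nvidia-smi not found on PATH"
fi
```

Changing the MIG mode generally requires that nothing is using the GPU, and on some systems a GPU reset or reboot before the change takes effect.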