Hi all,
I’m running Parabricks fq2bam 4.5.0-1 inside an Apptainer container, launched by Nextflow on a Slurm cluster.
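For context, each Nextflow task essentially boils down to a Slurm job like the minimal sketch below (the image name, reference, FASTQ paths, and resource directives are placeholders for illustration, not my actual pipeline):

#!/bin/bash
#SBATCH --gres=gpu:2          # request the two A100s the job runs on
#SBATCH --cpus-per-task=48    # 24 worker threads per GPU, as reported in the log below

# Log the GPUs that Slurm allocated to this job (output shown below)
nvidia-smi

# Run fq2bam inside the Apptainer image, with GPU passthrough enabled via --nv
apptainer exec --nv clara-parabricks_4.5.0-1.sif \
    pbrun fq2bam \
        --ref ref/genome.fa \
        --in-fq sample_R1.fastq.gz sample_R2.fastq.gz \
        --out-bam sample.bam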
Sometimes fq2bam runs without any problem (exit status = 0):
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03 Driver Version: 575.51.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:21:00.0 Off | 0 |
| N/A 30C P0 33W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-PCIE-40GB On | 00000000:81:00.0 Off | 0 |
| N/A 30C P0 33W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
(...)
[PB Info 2025-Sep-10 15:26:49] ------------------------------------------------------------------------------
[PB Info 2025-Sep-10 15:26:49] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2025-Sep-10 15:26:49] || Version 4.5.0-1 ||
[PB Info 2025-Sep-10 15:26:49] || GPU-PBBWA mem, Sorting Phase-I ||
[PB Info 2025-Sep-10 15:26:49] ------------------------------------------------------------------------------
[PB Info 2025-Sep-10 15:26:49] Mode = pair-ended-gpu
[PB Info 2025-Sep-10 15:26:49] Running with 2 GPU(s), using 4 stream(s) per device with 24 worker threads per GPU
(...)
Total Time: 5 minutes 1 second ||
(exit status = 0)
but sometimes I get an error:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03 Driver Version: 575.51.03 CUDA Version: 12.9 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:21:00.0 Off | 0 |
| N/A 32C P0 39W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-PCIE-40GB On | 00000000:81:00.0 Off | 0 |
| N/A 29C P0 33W / 250W | 0MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
(...)
Mode = pair-ended-gpu
Running with 2 GPU(s), using 1 stream(s) per device with 24 worker threads per GPU
cudaGetDevice() failed in geting device ID. Status: unknown error, exiting.
How can I fix this, please?