Clara-parabricks_4.1.0-1.sif cannot recognize A100 cards?

Hi, I am trying to run fq2bam on an A100 node and getting this error message:

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /home/crick/working/haplotypecalling/Data/sample_1.fq.gz and
/home/crick/working/haplotypecalling/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[PB Info 2023-Jun-01 00:50:13] ------------------------------------------------------------------------------
[PB Info 2023-Jun-01 00:50:13] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2023-Jun-01 00:50:13] || Version 4.1.0-1 ||
[PB Info 2023-Jun-01 00:50:13] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2023-Jun-01 00:50:13] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Error 2023-Jun-01 00:50:14][ParaBricks/src/pbOpts.cu:107] Bad argument value: Number of GPUs requested (4) is more than number of GPUs (0) in the system, exiting.
For technical support visit Help - NVIDIA Docs
Exiting…

Could not run fq2bam
Exiting pbrun …

===
On the host:

[crick@csctmp-xe8545-2 haplotypecalling]$ nvidia-smi
Thu Jun 1 01:29:36 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 20C P0 57W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 19C P0 56W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 21C P0 58W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 18C P0 59W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+

In the container:

[crick@csctmp-xe8545-2 haplotypecalling]$ singularity shell clara-parabricks_4.1.0-1.sif
Singularity> nvidia-smi
Thu Jun 1 01:30:58 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-80GB On | 00000000:01:00.0 Off | 0 |
| N/A 20C P0 57W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-80GB On | 00000000:41:00.0 Off | 0 |
| N/A 19C P0 56W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-80GB On | 00000000:81:00.0 Off | 0 |
| N/A 21C P0 58W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-80GB On | 00000000:C1:00.0 Off | 0 |
| N/A 18C P0 59W / 500W| 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Singularity>

Any clue what is going on?

Thanks,

Wei


By the way, I have --nv and --num-gpus in my command, and the same command works with V100 cards. Also, in singularity.conf I set "always use nv = yes" and "always use rocm = yes".
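
Roughly, the invocation looks like the sketch below (the reference path here is a placeholder, not my exact command):

# Check that the container can enumerate the GPUs before running pbrun
singularity exec --nv clara-parabricks_4.1.0-1.sif nvidia-smi -L

# Then run fq2bam from the same container, requesting all four A100s
singularity exec --nv clara-parabricks_4.1.0-1.sif pbrun fq2bam \
    --ref /path/to/reference.fa \
    --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
    --out-bam sample.bam \
    --num-gpus 4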

Hi,
I observed the exact same error on 4xA100 cards, running on Docker.

It works now for me.
I re-installed CUDA 12 correctly and disabled the MIG service.

Thanks,

Parabricks does not support MIG mode.
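
If you are not sure whether MIG is active, a quick check looks like this (disabling MIG needs admin rights and resets the GPU, so coordinate with your system administrators):

# Show the current MIG mode for every GPU; Parabricks needs this to say Disabled
nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv

# Disable MIG on GPU 0 (admin only; repeat for each GPU)
sudo nvidia-smi -i 0 -mig 0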

Hi, have you resolved this issue yet? I encountered the same problem after upgrading CUDA and the driver. Additionally, I am just a regular user on our HPC, so I cannot reinstall CUDA.

MIG has been disabled on my device, but I still hit the same issue. I have a very urgent task that needs to be completed. Can someone please help me?

Can you share more details? Please share the full command you ran and the full output.

What driver/CUDA version was working previously? What version of Parabricks?

nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:0E:00.0 Off | 0 |
| N/A 26C P0 40W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB On | 00000000:13:00.0 Off | 0 |
| N/A 26C P0 41W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB On | 00000000:49:00.0 Off | 0 |
| N/A 24C P0 42W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB On | 00000000:4F:00.0 Off | 0 |
| N/A 27C P0 40W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-40GB On | 00000000:94:00.0 Off | 0 |
| N/A 27C P0 43W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-40GB On | 00000000:9A:00.0 Off | 0 |
| N/A 24C P0 42W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-40GB On | 00000000:CC:00.0 Off | 0 |
| N/A 25C P0 41W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-40GB On | 00000000:D1:00.0 Off | 0 |
| N/A 25C P0 43W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
My command:
pbrun fq2bam_meth --ref hg38.fa --in-fq ../fq2bam_test/SRR13749850_1.fastq.gz ../fq2bam_test/SRR13749850_1.fastq.gz --out-bam SRR13749850.bam --num-gpus 1
My docker command: docker run -it --gpus='"device=0,1"' -e NVIDIA_VISIBLE_DEVICES=0,1 --volume /home/wang_yanni/sc_meth_ATAC/integration/ATAC_Meth/nature_data_atac_meth/fq2bam:/mydata 74f2b983a773 /bin/bash

The error:
root@5b45d44a0a2e:/mydata/fq2bam_meth# pbrun fq2bam_meth --ref hg38.fa --in-fq ../fq2bam_test/SRR13749850_1.fastq.gz ../fq2bam_test/SRR13749850_1.fastq.gz --out-bam SRR13749850.bam
Please visit NVIDIA Clara - NVIDIA Docs for detailed documentation

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /mydata/fq2bam_test/SRR13749850_1.fastq.gz and
/mydata/fq2bam_test/SRR13749850_1.fastq.gz
[Parabricks Options Mesg]: @RG\tID:SRR13749850.1.1\tLB:lib1\tPL:bar\tSM:sample\tPU:SRR13749850.1.1
[PB Info 2024-Jul-02 14:26:45] ------------------------------------------------------------------------------
[PB Info 2024-Jul-02 14:26:45] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Jul-02 14:26:45] || Version 4.3.1-1 ||
[PB Info 2024-Jul-02 14:26:45] || GPU-PBBWA mem, Sorting Phase-I ||
[PB Info 2024-Jul-02 14:26:45] ------------------------------------------------------------------------------
[PB Info 2024-Jul-02 14:26:45] Mode = pair-ended-gpu
[PB Info 2024-Jul-02 14:26:45] Running with 8 GPU(s), using 4 stream(s) per device with 16 worker threads per GPU
[PB Info 2024-Jul-02 14:26:55] # 0 0 0 0 0 0 pool: 0 0 bases/GPU/minute: 0.0
[PB Info 2024-Jul-02 14:27:04] Time spent reading: 0.008965 seconds
[PB Error 2024-Jul-02 14:27:05][src/internal/bwa_lib_context.cu:86] cudaGetDevice() failed in geting device ID. Status: system not yet initialized, exiting.
For technical support visit NVIDIA Clara - NVIDIA Docs
Exiting…

-e NVIDIA_VISIBLE_DEVICES=0,1 - this docker flag is wrong. It should be:

CUDA_VISIBLE_DEVICES

You can remove this part from the command though. We suggest just using this command:

docker run -it --gpus '"device=0,1"' --volume /home/wang_yanni/sc_meth_ATAC/integration/ATAC_Meth/nature_data_atac_meth/fq2bam:/mydata 74f2b983a773 /bin/bash

You should also confirm things look correct in the container by running nvidia-smi once inside.
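
For example, something like this inside the container should show exactly the two requested A100s (a suggested sanity check, not part of the original command):

# Run inside the container started with --gpus '"device=0,1"'
nvidia-smi -L                  # expect two A100 entries
echo $NVIDIA_VISIBLE_DEVICES   # may be set by the NVIDIA container runtime
echo $CUDA_VISIBLE_DEVICES     # should normally be empty here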


I ran nvidia-smi in the Docker container, and it showed:
root@c3a39e4fa403:/code# nvidia-smi
Tue Jul 2 17:12:59 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB Off | 00000000:0E:00.0 Off | 0 |
| N/A 27C P0 40W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB Off | 00000000:13:00.0 Off | 0 |
| N/A 27C P0 42W / 400W | 4MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
Then I ran pbrun fq2bam --ref /mydata/hg38.fa --in-fq /mydata/fq2bam_test/SRR13749850_1.fastq.gz /mydata/fq2bam_test/SRR13749850_2.fastq.gz --out-bam /mydata/SRR13749850_new.bam --no-markdups --num-gpus 1 and got the same error:
root@c3a39e4fa403:/mydata# pbrun fq2bam --ref /mydata/hg38.fa --in-fq /mydata/fq2bam_test/SRR13749850_1.fastq.gz /mydata/fq2bam_test/SRR13749850_2.fastq.gz --out-bam /mydata/SRR13749850_new.bam --no-markdups --num-gpus 1
Please visit NVIDIA Clara - NVIDIA Docs for detailed documentation

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /mydata/fq2bam_test/SRR13749850_1.fastq.gz and
/mydata/fq2bam_test/SRR13749850_2.fastq.gz
[Parabricks Options Mesg]: @RG\tID:SRR13749850.1.1\tLB:lib1\tPL:bar\tSM:sample\tPU:SRR13749850.1.1
[PB Info 2024-Jul-02 17:14:46] ------------------------------------------------------------------------------
[PB Info 2024-Jul-02 17:14:46] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Jul-02 17:14:46] || Version 4.0.0-1 ||
[PB Info 2024-Jul-02 17:14:46] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2024-Jul-02 17:14:46] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Error 2024-Jul-02 17:15:06][ParaBricks/src/pbOpts.cu:132] Bad argument value: Number of GPUs requested (1) is more than number of GPUs (0) in the system, exiting.
For technical support visit Help - NVIDIA Docs
Exiting…

Could not run fq2bam
Exiting pbrun …

Once inside the container, what does this print?

echo $CUDA_VISIBLE_DEVICES
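
If that variable turns out to be set to an empty string or to device indices that do not exist inside the container, the CUDA runtime will report 0 GPUs, which would match the error above. A possible follow-up to try (a suggestion, not a confirmed fix, reusing the command from this thread):

# Clear a bad GPU-selection variable and retry
unset CUDA_VISIBLE_DEVICES
pbrun fq2bam --ref /mydata/hg38.fa \
    --in-fq /mydata/fq2bam_test/SRR13749850_1.fastq.gz /mydata/fq2bam_test/SRR13749850_2.fastq.gz \
    --out-bam /mydata/SRR13749850_new.bam --no-markdups --num-gpus 1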