Running Parabricks fails with "cudaMemGetInfo returned 802"

I ran Parabricks with the sample data and got this error:
cudaMemGetInfo returned 802
→ system not yet initialized

This is the command I ran:
pbrun fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam --gpu-devices 0,1,2,3,4,5,6,7 --num-gpus 1


|| Parabricks accelerated Genomics Pipeline ||
|| Version 3.6.1-1.ampere ||
|| GPU-BWA mem, Sorting Phase-I ||
|| Contact: Parabricks-Support@nvidia.com ||

cudaMemGetInfo returned 802
→ system not yet initialized
For technical support, updated user guides and other Parabricks documentation can be found at
Answers to most FAQ’s can be found on the developer forum
Customers with paid Parabricks licenses have direct access to support and can contact EnterpriseSupport@nvidia.com
Users of free evaluation licenses can contact parabricks-eval-support@nvidia.com for troubleshooting any questions.
Exiting…

Could not run fq2bam
Exiting pbrun …

Hey @user120120,

It seems like your CUDA environment might not be set up properly. To test this with Docker you can run:

docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

and it should return the output for nvidia-smi.

Can you try this and let me know what comes back?

Thanks!
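If the installation uses Singularity instead of Docker, a roughly equivalent check can be run with the `--nv` flag. The sketch below only prints the command for each runtime so nothing is pulled or executed by accident; the helper name `container_check_cmd` is made up for illustration, and the image tag is simply the one from the command above. Keep in mind that `nvidia-smi` talks to the driver through NVML and does not initialize CUDA, so a clean table here does not by itself rule out a CUDA initialization problem.

```shell
#!/usr/bin/env bash
# Sketch: print the nvidia-smi container check for Docker or Singularity.
# IMAGE is the tag used earlier in this thread; it may need updating.
IMAGE="nvidia/cuda:11.0-base"

# Hypothetical helper: print the check command for the given runtime.
container_check_cmd() {
  case "$1" in
    docker)      echo "docker run --rm --gpus all ${IMAGE} nvidia-smi" ;;
    singularity) echo "singularity exec --nv docker://${IMAGE} nvidia-smi" ;;
    *)           echo "unknown runtime: $1" >&2; return 1 ;;
  esac
}

# Dry run: show the command for every runtime found on this machine.
for runtime in docker singularity; do
  if command -v "$runtime" >/dev/null 2>&1; then
    container_check_cmd "$runtime"
  fi
done
```

Copy the printed line into the shell to actually run the check.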

I installed Parabricks with Singularity 3.7.1-5.1.ohpc.2.1 using this command:

./parabricks/installer.py --release 3.6.1-1 --extra-tools --ampere --install-location ampere --container singularity --force

Here’s output from “docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi”

[root@c0 ~]# docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Unable to find image ‘nvidia/cuda:11.0-base’ locally
11.0-base: Pulling from nvidia/cuda
54ee1f796a1e: Pull complete
f7bfea53ad12: Pull complete
46d371e02073: Pull complete
b66c17bbf772: Pull complete
3642f1a6dfb3: Pull complete
e5ce55b8b4b9: Pull complete
155bc0332b0a: Pull complete
Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
Sun Jan 9 12:01:48 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:07:00.0 Off |                    0 |
| N/A   29C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   30C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  Off  | 00000000:48:00.0 Off |                    0 |
| N/A   28C    P0    54W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  Off  | 00000000:4C:00.0 Off |                    0 |
| N/A   30C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM...  Off  | 00000000:88:00.0 Off |                    0 |
| N/A   28C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM...  Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   30C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM...  Off  | 00000000:C8:00.0 Off |                    0 |
| N/A   28C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM...  Off  | 00000000:CB:00.0 Off |                    0 |
| N/A   29C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And this is the output from the host:

[root@c0 ~]# nvidia-smi
Sun Jan 9 19:14:10 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:07:00.0 Off |                    0 |
| N/A   29C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   30C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  Off  | 00000000:48:00.0 Off |                    0 |
| N/A   28C    P0    54W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  Off  | 00000000:4C:00.0 Off |                    0 |
| N/A   30C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM...  Off  | 00000000:88:00.0 Off |                    0 |
| N/A   28C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM...  Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   31C    P0    58W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM...  Off  | 00000000:C8:00.0 Off |                    0 |
| N/A   29C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM...  Off  | 00000000:CB:00.0 Off |                    0 |
| N/A   29C    P0    57W / 400W |      0MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hey @user120120,

It looks like all the GPUs are disabled on the machine for some reason. I would talk to the system admin about that.

It’s also possible that the GPUs are running in MIG mode (multi-instance GPU), which is not supported in Parabricks and can lead to issues. So I would say your plan of attack is:

  1. Figure out if your machine is in MIG mode.
  2. If it is, disable it and run again.
  3. If it’s not, then the GPUs were disabled in some other way and need to be re-enabled.
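Step 1 can be checked from the command line. The sketch below queries the `mig.mode.current` field of `nvidia-smi` and lists any GPU that reports MIG as enabled; the helper name `mig_enabled_gpus` is made up for illustration. Note that in the `nvidia-smi` tables above, "Disabled" sits in the "MIG M." column, so it may be describing MIG mode rather than the GPU itself.

```shell
#!/usr/bin/env bash
# Sketch: list GPUs whose current MIG mode is "Enabled".

# Hypothetical helper: reads "index, mig_mode" CSV lines on stdin and
# prints the index of every GPU that reports MIG as Enabled.
mig_enabled_gpus() {
  awk -F', *' '$2 == "Enabled" { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  enabled=$(nvidia-smi --query-gpu=index,mig.mode.current \
                       --format=csv,noheader | mig_enabled_gpus)
  if [ -n "$enabled" ]; then
    echo "MIG is enabled on GPU index(es):"
    echo "$enabled"
    # To turn MIG off on one GPU (needs root and may need a GPU reset):
    #   nvidia-smi -i <index> -mig 0
  else
    echo "No GPU reports MIG as enabled."
  fi
fi
```

If nothing is listed, MIG is not the cause and the search has to continue elsewhere.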

I checked on my host, and MIG mode is disabled.

I reinstalled Parabricks and ran it again, but I get the same error.

I also tested on a host with a T4 card, where Parabricks runs fine,
but on the Ampere cards it does not run.

Hey @user120120,

I’m not entirely sure what’s going on here. Can you try running on another GPU? I notice that you’re restricting to GPU 0 in your pbrun command. Can you try running on GPU 1, or another one? It could be something wrong with that one GPU. Everything else that you’re doing seems correct.
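One way to try each GPU in turn is to generate one command per device. This is only a sketch: the paths and flags are copied from the original command in this thread, while the helper name `fq2bam_cmd_for_gpu` and the per-GPU output file names are made up for illustration. It prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: build one fq2bam command per GPU so each device is tried alone.
# Paths and flags are copied from the command earlier in this thread.
REF="parabricks_sample/Ref/Homo_sapiens_assembly38.fasta"
FQ1="parabricks_sample/Data/sample_1.fq.gz"
FQ2="parabricks_sample/Data/sample_2.fq.gz"

# Hypothetical helper: print the pbrun command line for one GPU index.
# The output name output_gpu<N>.bam is an illustrative choice.
fq2bam_cmd_for_gpu() {
  echo "pbrun fq2bam --ref ${REF} --in-fq ${FQ1} ${FQ2}" \
       "--out-bam output_gpu$1.bam --gpu-devices $1 --num-gpus 1"
}

# Dry run: print the eight commands, one per GPU.
for gpu in 0 1 2 3 4 5 6 7; do
  fq2bam_cmd_for_gpu "$gpu"
done
```

If every GPU fails with the same error, a single faulty device is unlikely to be the cause.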

I solved it.
I installed nvidia-fabricmanager, which manages the NVSwitch fabric on multi-GPU systems:

yum install nvidia-fabricmanager

Then I ran Parabricks again:
[root@c0 ~]# /opt/parabrick/pbrun fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam --gpu-devices 0,1,2,3,4,5,6,7 --num-gpus 8
Please visit NVIDIA Clara Documentation for detailed documentation

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /root/parabricks_sample/Data/sample_1.fq.gz and
/root/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[PB Info 2022-Jan-13 14:54:37] Logger not initialized!
[PB Info 2022-Jan-13 14:54:37] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:54:37] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Jan-13 14:54:37] || Version 3.7.0-1.ampere ||
[PB Info 2022-Jan-13 14:54:37] || GPU-BWA mem, Sorting Phase-I ||

[PB Info 2022-Jan-13 14:54:37] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:54:39] Logger already initialized, continuing with current settings.
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Info 2022-Jan-13 14:54:40] GPU-BWA mem
[PB Info 2022-Jan-13 14:54:40] ProgressMeter Reads Base Pairs Aligned
[PB Info 2022-Jan-13 14:54:57] 5043564 510000000
[PB Info 2022-Jan-13 14:55:00] 10087128 1190000000
[PB Info 2022-Jan-13 14:55:02] 15130692 1710000000
[PB Info 2022-Jan-13 14:55:05] 20174256 2340000000
[PB Info 2022-Jan-13 14:55:07] 25217820 2810000000
[PB Info 2022-Jan-13 14:55:10] 30261384 3500000000
[PB Info 2022-Jan-13 14:55:13] 35304948 4070000000
[PB Info 2022-Jan-13 14:55:15] 40348512 4710000000
[PB Info 2022-Jan-13 14:55:18] 45392076 5250000000
[PB Info 2022-Jan-13 14:55:21] 50435640 5850000000
[PB Info 2022-Jan-13 14:55:30]
GPU-BWA Mem time: 49.923947 seconds
[PB Info 2022-Jan-13 14:55:30] GPU-BWA Mem is finished.

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /root/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /root/parabricks_sample/Data/sample_1.fq.gz /root/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 51.271 sec; CPU: 1516.921 sec
[PB Info 2022-Jan-13 14:55:30] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:30] || Program: GPU-BWA mem, Sorting Phase-I ||
[PB Info 2022-Jan-13 14:55:30] || Version: 3.7.0-1.ampere ||
[PB Info 2022-Jan-13 14:55:30] || Start Time: Thu Jan 13 14:54:37 2022 ||
[PB Info 2022-Jan-13 14:55:30] || End Time: Thu Jan 13 14:55:30 2022 ||
[PB Info 2022-Jan-13 14:55:30] || Total Time: 53 seconds ||
[PB Info 2022-Jan-13 14:55:30] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:31] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:31] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Jan-13 14:55:31] || Version 3.7.0-1.ampere ||
[PB Info 2022-Jan-13 14:55:31] || Sorting Phase-II ||
[PB Info 2022-Jan-13 14:55:31] || Contact: Parabricks-Support@nvidia.com ||
[PB Info 2022-Jan-13 14:55:31] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:32] progressMeter - Percentage
[PB Info 2022-Jan-13 14:55:32] 0.0 0.00 GB
[PB Info 2022-Jan-13 14:55:42] Sorting and Marking: 10.000 seconds
[PB Info 2022-Jan-13 14:55:42] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:42] || Program: Sorting Phase-II ||
[PB Info 2022-Jan-13 14:55:42] || Version: 3.7.0-1.ampere ||
[PB Info 2022-Jan-13 14:55:42] || Start Time: Thu Jan 13 14:55:31 2022 ||
[PB Info 2022-Jan-13 14:55:42] || End Time: Thu Jan 13 14:55:42 2022 ||
[PB Info 2022-Jan-13 14:55:42] || Total Time: 11 seconds ||
[PB Info 2022-Jan-13 14:55:42] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:42] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:42] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2022-Jan-13 14:55:42] || Version 3.7.0-1.ampere ||
[PB Info 2022-Jan-13 14:55:42] || Marking Duplicates, BQSR ||
[PB Info 2022-Jan-13 14:55:42] || Contact: Parabricks-Support@nvidia.com ||
[PB Info 2022-Jan-13 14:55:42] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:43] progressMeter - Percentage
[PB Info 2022-Jan-13 14:55:53] 100.0 0.00 GB
[PB Info 2022-Jan-13 14:55:53] BQSR and writing final BAM: 10.035 seconds
[PB Info 2022-Jan-13 14:55:53] ------------------------------------------------------------------------------
[PB Info 2022-Jan-13 14:55:53] || Program: Marking Duplicates, BQSR ||
[PB Info 2022-Jan-13 14:55:53] || Version: 3.7.0-1.ampere ||
[PB Info 2022-Jan-13 14:55:53] || Start Time: Thu Jan 13 14:55:42 2022 ||
[PB Info 2022-Jan-13 14:55:53] || End Time: Thu Jan 13 14:55:53 2022 ||
[PB Info 2022-Jan-13 14:55:53] || Total Time: 11 seconds ||
[PB Info 2022-Jan-13 14:55:53] ------------------------------------------------------------------------------
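
CUDA error 802 corresponds to `cudaErrorSystemNotReady`, which the CUDA documentation describes as the system not being ready to start CUDA work and suggests checking that all required driver daemons are running. On NVSwitch-based machines, Fabric Manager is one such daemon, which fits the fix above. The sketch below checks whether the service is active; the unit name `nvidia-fabricmanager` and the helper name `fabricmanager_active` are assumptions, so adjust them to whatever your package installs. The Fabric Manager version is also expected to match the installed driver version.

```shell
#!/usr/bin/env bash
# Sketch: check that the Fabric Manager service is running.
# The unit name below is an assumption; override it with FM_UNIT if needed.
FM_UNIT="${FM_UNIT:-nvidia-fabricmanager}"

# Hypothetical helper: succeeds if systemd reports the unit as active.
fabricmanager_active() {
  systemctl is-active --quiet "$FM_UNIT"
}

if command -v systemctl >/dev/null 2>&1; then
  if fabricmanager_active; then
    echo "$FM_UNIT is running"
  else
    echo "$FM_UNIT is not running; as root, try:"
    echo "  systemctl enable --now $FM_UNIT"
  fi
fi
```

If the service is installed but not running, starting it and then rerunning `pbrun` is a reasonable next step.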