"Could not run fq2bam" Is the only verbose output from Parabricks 4.4.0-1 and 4.3.2-1 on tutorial data

Hello There

I have encountered an issue that prevents me from using Parabricks: it simply does not work when used as presented in the fq2bam tutorial.

My computational environment consists of 16 vCPUs, 256 GB of RAM, and one A100 GPU (80 GB).

Here is the output of nvidia-smi:

Thu Nov 28 10:00:04 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          Off | 00000000:00:10.0 Off |                    0 |
| N/A   23C    P0              49W / 500W |      0MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

My command for running fq2bam does not differ greatly from the original one in the tutorial (which I have also tried, and which does not work properly either), and it returns the least informative log I have ever seen. Literally no information can be inferred from this printout:

Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation


/usr/local/parabricks/run_pb.py fq2bam --num-gpus 1 --x3 --verbose --memory-limit 60 --bwa-options=-K 19 --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz --out-bam /outputdir/fq2bam_output.bam --tmp-dir //1U548PMS

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /workdir/parabricks_sample/Data/sample_1.fq.gz and
/workdir/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
/usr/local/parabricks/binaries/bin/pbbwa /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --mode pair-ended-gpu /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz -R @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 --nGPUs 1 --nstreams 4 --cpu-thread-pool 16 -K 19 -F 0 --min-read-size 1 --max-read-size 480 --markdups --write-bin --verbose
For technical support visit https://docs.nvidia.com/clara/index.html#parabricks
Exiting...

Could not run fq2bam
Exiting pbrun ...

And here is the full command I used:

docker run --gpus all --rm --volume $(pwd):/workdir --volume $(pwd):/outputdir nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 pbrun fq2bam --num-gpus 1 --x3 --verbose --memory-limit 60 --bwa-options="-K 19" --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz --out-bam /outputdir/fq2bam_output.bam

What would be the solution for that issue?

Hi there NVIDIA,
Are you even working on support for developers?
There are a lot of problems just like mine that have been ghosted for even longer than 24 hours.
It’s not even funny, since you market your solutions as being “for Healthcare & Life Sciences”. Life sciences professionals would agree that this form of support is unacceptable, especially since your software and tutorials seem to be quite erroneous and counterintuitive.

It would be greatly appreciated if you would respond with something useful.


Hi NVIDIA,
Any news on troubleshooting your own tutorial?

Hello, apologies for the late reply. We are based in the US and these posts came during the Thanksgiving holiday.

We double-checked, and the command from the tutorial works as displayed on FQ2BAM Tutorial - NVIDIA Docs. The exact directories may need to be changed to match your directory structure.

Looking at the particular command you ran, you added --bwa-options="-K 19". The -K option for bwa-mem is a hidden parameter that controls the chunk size (in number of bases) used as the window for determining the best alignment in paired-end output. The value 19 is not invalid, just very small, and it will severely hamper performance. A larger value such as --bwa-options="-K 10000000" would yield better performance and still maintain reproducibility if you pass the same parameter to another run. However, that is not what is causing the crash you’re seeing.

The strange thing about your run is that the actual binary never executes. It appears to receive a signal from Docker or the host OS before reaching that stage (that is why fq2bam prints “Exiting…”: it received a signal). The usual signal that produces this printout is an out-of-memory kill, but 256 GB of RAM should be more than enough.

Could you confirm that the sample data downloaded correctly? The md5sum I see for parabricks_sample.tar.gz is 05b51303a7b9939c9232f88e7ecd1444.
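For reference, a quick way to check this on the host (assuming you kept the default archive name) is:

$ md5sum parabricks_sample.tar.gz
# expected: 05b51303a7b9939c9232f88e7ecd1444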

One other idea would be to explicitly set the temporary working directory with --tmp-dir /workdir, although we have tested this case and an informative error is printed if the disk runs out of space.
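It is also worth a quick sanity check that the directory you mount as /workdir has free space, for example by running this on the host (a generic check, not a Parabricks command):

$ df -h $(pwd)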

Hi,
I do confirm that the md5sum is 05b51303a7b9939c9232f88e7ecd1444.
The command
docker run --gpus all --rm --volume $(pwd):/workdir --volume $(pwd):/outputdir nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 pbrun fq2bam --num-gpus 1 --x3 --verbose --memory-limit 60 --bwa-options="-K 10000000" --tmp-dir /workdir --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz --out-bam /outputdir/fq2bam_output.bam
has returned the same output as before:

Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation


/usr/local/parabricks/run_pb.py fq2bam --num-gpus 1 --x3 --verbose --memory-limit 60 --bwa-options=-K 10000000 --tmp-dir /workdir/OHF980NE --ref /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz --out-bam /outputdir/fq2bam_output.bam

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /workdir/parabricks_sample/Data/sample_1.fq.gz and
/workdir/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
/usr/local/parabricks/binaries/bin/pbbwa /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --mode pair-ended-gpu /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz -R @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 --nGPUs 1 --nstreams 4 --cpu-thread-pool 16 -K 10000000 -F 0 --min-read-size 1 --max-read-size 480 --markdups --write-bin --verbose
For technical support visit https://docs.nvidia.com/clara/index.html#parabricks
Exiting...

Could not run fq2bam
Exiting pbrun ...

Moreover, when I check the configuration of the NVIDIA Container Toolkit, everything seems to be fine, and the CUDA sample tests pass.

$ docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2


[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done


$ docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:devicequery



/cuda-samples/sample Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA A100-SXM4-80GB"
  CUDA Driver Version / Runtime Version          12.5 / 12.5
  CUDA Capability Major/Minor version number:    8.0
  Total amount of global memory:                 81038 MBytes (84974239744 bytes)
  (108) Multiprocessors, (064) CUDA Cores/MP:    6912 CUDA Cores
  GPU Max Clock rate:                            1410 MHz (1.41 GHz)
  Memory Clock rate:                             1593 Mhz
  Memory Bus Width:                              5120-bit
  L2 Cache Size:                                 41943040 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 16
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.5, CUDA Runtime Version = 12.5, NumDevs = 1
Result = PASS

Thanks for checking your CUDA install. That was going to be my next question.

Can you do one more run that gets killed and then right after the run check your system log to see what happened:

sudo journalctl -k --since "10 minutes ago" | grep "Killed process"

For example, I forced one of my fq2bam processes to be killed by not giving it enough memory and I got the following:

sudo journalctl -k --since "10 minutes ago" | grep "Killed process"
Dec 03 09:43:01 computer kernel: Memory cgroup out of memory: Killed process 697849 (pbbwa) total-vm:5651832kB, anon-rss:5212224kB, file-rss:7296kB, shmem-rss:0kB, UID:0 pgtables:10296kB oom_score_adj:0

Hi,
Unfortunately, journalctl does not return anything, as no process has been killed.
On the other hand, there is a new printout when trying to run the process directly inside the container.

After entering the container with
$ docker run --gpus all -it --rm --volume $(pwd):/workdir --volume $(pwd):/outputdir nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 /bin/bash

and running this command inside it:
/usr/local/parabricks/binaries/bin/pbbwa /workdir/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --mode pair-ended-gpu /workdir/parabricks_sample/Data/sample_1.fq.gz /workdir/parabricks_sample/Data/sample_2.fq.gz -R @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 --nGPUs 1 --nstreams 4 --cpu-thread-pool 16 -K 10000000 -F 0 --min-read-size 1 --max-read-size 480 --markdups --write-bin --verbose

the output I get points directly to the culprit:
/usr/local/parabricks/binaries/bin/pbbwa: error while loading shared libraries: libfilehandle.so: cannot open shared object file: No such file or directory

This seems to indicate that the Parabricks container in version 4.4.0-1 has not been built properly and there are errors with its shared libraries.

We set LD_LIBRARY_PATH when running through pbrun, which is the entry point for users. The shared libraries in the container are correct, and all paths are set if you run through pbrun.
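If you do want to invoke pbbwa directly for debugging (not the supported entry point), a rough sketch would be to locate the bundled library inside the container first and point LD_LIBRARY_PATH at its directory yourself; the directory below is whatever find prints, not a documented location:

$ find /usr/local/parabricks -name "libfilehandle.so" 2>/dev/null
$ export LD_LIBRARY_PATH=<directory printed above>:$LD_LIBRARY_PATH
$ /usr/local/parabricks/binaries/bin/pbbwa ...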

We have not had other reports of users having issues with the container.

Can you give a full description of your environment? Are you running in the cloud? If so, what image, instance, etc. On-prem? What OS? CUDA driver version? Can you confirm that the container was downloaded correctly by confirming the sha256sum?
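As a starting point, one way to see the digest of the image you actually pulled, which you can compare against the digest listed on NGC for the 4.4.0-1 tag, is:

$ docker images --digests nvcr.io/nvidia/clara/clara-parabricks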

If you have NVIDIA Enterprise Support, the best way to get the issue resolved is by contacting our dedicated Enterprise Support Team. You can contact our support team via web portal, phone, or web form. You can find this information listed on our Enterprise Customer Support page.