Hello,
I recently started using Parabricks and ran into an issue in the Marking Duplicates / BQSR step of the fq2bam tool. I hope to get some hints on how to fix this.
I get the following error message:
[PB Info 2025-Jan-20 11:59:15] ------------------------------------------------------------------------------
[PB Info 2025-Jan-20 11:59:15] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2025-Jan-20 11:59:15] || Version 4.4.0-1 ||
[PB Info 2025-Jan-20 11:59:15] || Marking Duplicates, BQSR ||
[PB Info 2025-Jan-20 11:59:15] ------------------------------------------------------------------------------
[PB Info 2025-Jan-20 11:59:22] BQSR using CUDA device(s): { 0 1 }
[PB Info 2025-Jan-20 11:59:25] Using PBBinBamFile for BAM writing
[PB Info 2025-Jan-20 11:59:25] progressMeter - Percentage
[PB Info 2025-Jan-20 11:59:35] 0.0
[PB Info 2025-Jan-20 11:59:38] Checking if the index file exists for the input compressed file:/knownSites1/1000G_phase1.indels.b37.vcf.gz for better performance
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
[PB Info 2025-Jan-20 11:59:38] Checking if the index file exists for the input compressed file:/knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz for better performance
[W::hts_idx_load2] The index file is older than the data file: /knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.tbi
[PB Info 2025-Jan-20 11:59:38] Checking if the index file exists for the input compressed file:/knownSites3/Mutect2-WGS-panel-b37.vcf.gz for better performance
[PB Info 2025-Jan-20 11:59:45] 0.1
[PB Info 2025-Jan-20 11:59:55] 0.4
[PB Warning 2025-Jan-20 12:00:03][src/PBTempFile.cpp:155] Attempting to allocate host memory above desired limit (798.798905 GB)
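As a side note, I assume the "index file is older than the data file" warning for the Mills VCF can be cleared by regenerating the tabix index on the host so the .tbi is newer than the .vcf.gz (my guess, using tabix from htslib):

tabix -f -p vcf /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/Mills_and_1000G_gold_standard.indels.b37.vcf.gz

That warning is probably unrelated to the memory problem, though.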
The machine has 128 CPU cores, 1 TiB of RAM, and two L40S GPUs:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S Off | 00000000:01:00.0 Off | 0 |
| N/A 32C P0 87W / 350W | 1MiB / 46068MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L40S Off | 00000000:C1:00.0 Off | 0 |
| N/A 33C P0 83W / 350W | 1MiB / 46068MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
And my command:
docker run --rm --gpus all --workdir /preprocessing \
    --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/results/preprocessing/1_FF:/preprocessing/1_FF \
    --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/tmp_parabricks:/tmp_dir \
    --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/raw_data:/fastq \
    --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex:/ref_dir \
    --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/results/reports/markduplicates/1_FF:/reports/markduplicates/1_FF \
    --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/results/reports/preprocessing_qc/1_FF:/reports/preprocessing_qc/1_FF \
    --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/intervals:/intervals \
    --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle:/knownSites1 \
    --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle:/knownSites2 \
    --volume /home/thiele/projects/ref_data/homo_sapiens/GATK_PoN/hg19:/knownSites3 \
    nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 \
    pbrun fq2bam \
    --ref /ref_dir/human_g1k_v37_decoy.fasta \
    --in-fq /fastq/1_FF_S1_R1_001.fastq.gz /fastq/1_FF_S1_R2_001.fastq.gz "@RG\tID:1_FF.1\tSM:P1_1_FF\tLB:1_FF\tPL:illumina\tPU:1" \
    --interval-file /intervals/wgs_calling_regions_Sarek_onlyCanonicalChromosomes.list \
    --knownSites /knownSites1/1000G_phase1.indels.b37.vcf.gz \
    --knownSites /knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz \
    --knownSites /knownSites3/Mutect2-WGS-panel-b37.vcf.gz \
    --out-recal-file /preprocessing/1_FF/1_FF.bqsr_report.txt \
    --out-bam /preprocessing/1_FF/1_FF.recal.bam \
    --out-duplicate-metrics /reports/markduplicates/1_FF/1_FF_mdReport.txt \
    --out-qc-metrics-dir /reports/preprocessing_qc/1_FF \
    --bwa-options="-Y -B 3 -K 100000000" \
    --logfile /preprocessing/1_FF/1_FF.log \
    --tmp-dir /tmp_dir \
    --memory-limit 500 \
    --bwa-cpu-thread-pool 24 \
    --num-cpu-threads-per-stage 24
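As a possible workaround, I was also wondering whether I could split the run into alignment/duplicate marking first and a standalone BQSR step afterwards, so the BQSR stage gets the full memory budget to itself. This is a sketch of what I have in mind; the standalone bqsr tool and its flags are my guess by analogy with fq2bam, and <same mounts as above> stands for the volume mounts from the command above:

# Step 2 of a hypothetical split run; step 1 would be fq2bam
# without --knownSites / --out-recal-file, writing 1_FF.markdup.bam
docker run --rm --gpus all --workdir /preprocessing <same mounts as above> \
    nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 \
    pbrun bqsr \
    --ref /ref_dir/human_g1k_v37_decoy.fasta \
    --in-bam /preprocessing/1_FF/1_FF.markdup.bam \
    --knownSites /knownSites1/1000G_phase1.indels.b37.vcf.gz \
    --knownSites /knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz \
    --knownSites /knownSites3/Mutect2-WGS-panel-b37.vcf.gz \
    --out-recal-file /preprocessing/1_FF/1_FF.bqsr_report.txt

(1_FF.markdup.bam is a hypothetical name for the BAM a first fq2bam run would produce.) Would that be a sensible approach, or should I expect the same memory warning in the standalone step?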
I also tried running it with the --low-memory option, but got the same error. Thanks for your help.
Best regards,
Mihaela