Memory issues while running Marking Duplicates and BQSR

Hello,

I recently started using Parabricks and ran into an issue in the Marking Duplicates and BQSR step of the fq2bam method. I hope to get some hints on how to fix this.

I get the following error message:

[PB Info 2025-Jan-20 11:59:15] ------------------------------------------------------------------------------
[PB Info 2025-Jan-20 11:59:15] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2025-Jan-20 11:59:15] ||                              Version 4.4.0-1                             ||
[PB Info 2025-Jan-20 11:59:15] ||                         Marking Duplicates, BQSR                         ||
[PB Info 2025-Jan-20 11:59:15] ------------------------------------------------------------------------------
[PB Info 2025-Jan-20 11:59:22] BQSR using CUDA device(s): { 0 1 }
[PB Info 2025-Jan-20 11:59:25] Using PBBinBamFile for BAM writing
[PB Info 2025-Jan-20 11:59:25] progressMeter -  Percentage
[PB Info 2025-Jan-20 11:59:35] 0.0
[PB Info 2025-Jan-20 11:59:38] Checking if the index file exists for the input compressed file:/knownSites1/1000G_phase1.indels.b37.vcf.gz for better performance
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
[PB Info 2025-Jan-20 11:59:38] Checking if the index file exists for the input compressed file:/knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz for better performance
[W::hts_idx_load2] The index file is older than the data file: /knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz.tbi
[PB Info 2025-Jan-20 11:59:38] Checking if the index file exists for the input compressed file:/knownSites3/Mutect2-WGS-panel-b37.vcf.gz for better performance
[PB Info 2025-Jan-20 11:59:45] 0.1
[PB Info 2025-Jan-20 11:59:55] 0.4
[PB Warning 2025-Jan-20 12:00:03][src/PBTempFile.cpp:155] Attempting to allocate host memory above desired limit (798.798905 GB)

My machine has 128 CPUs, 1 TiB RAM and 2x L40S GPUs:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:01:00.0 Off |                    0 |
| N/A   32C    P0             87W /  350W |       1MiB /  46068MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S                    Off |   00000000:C1:00.0 Off |                    0 |
| N/A   33C    P0             83W /  350W |       1MiB /  46068MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

And my command:

docker run --rm --gpus all --workdir /preprocessing \
  --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/results/preprocessing/1_FF:/preprocessing/1_FF \
  --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/tmp_parabricks:/tmp_dir \
  --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/raw_data:/fastq \
  --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Sequence/BWAIndex:/ref_dir \
  --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/results/reports/markduplicates/1_FF:/reports/markduplicates/1_FF \
  --volume /home/thiele/projects/WGS_Pilot/241218_A01542_0126_AHJY3VDSXC/results/reports/preprocessing_qc/1_FF:/reports/preprocessing_qc/1_FF \
  --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/intervals:/intervals \
  --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle:/knownSites1 \
  --volume /home/thiele/projects/ref_data/homo_sapiens/igenomes/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle:/knownSites2 \
  --volume /home/thiele/projects/ref_data/homo_sapiens/GATK_PoN/hg19:/knownSites3 \
  nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1 \
  pbrun fq2bam \
  --ref /ref_dir/human_g1k_v37_decoy.fasta \
  --in-fq /fastq/1_FF_S1_R1_001.fastq.gz /fastq/1_FF_S1_R2_001.fastq.gz "@RG\tID:1_FF.1\tSM:P1_1_FF\tLB:1_FF\tPL:illumina\tPU:1" \
  --interval-file /intervals/wgs_calling_regions_Sarek_onlyCanonicalChromosomes.list \
  --knownSites /knownSites1/1000G_phase1.indels.b37.vcf.gz \
  --knownSites /knownSites2/Mills_and_1000G_gold_standard.indels.b37.vcf.gz \
  --knownSites /knownSites3/Mutect2-WGS-panel-b37.vcf.gz \
  --out-recal-file /preprocessing/1_FF/1_FF.bqsr_report.txt \
  --out-bam /preprocessing/1_FF/1_FF.recal.bam \
  --out-duplicate-metrics /reports/markduplicates/1_FF/1_FF_mdReport.txt \
  --out-qc-metrics-dir /reports/preprocessing_qc/1_FF \
  --bwa-options="-Y -B 3 -K 100000000" \
  --logfile /preprocessing/1_FF/1_FF.log \
  --tmp-dir /tmp_dir \
  --memory-limit 500 \
  --bwa-cpu-thread-pool 24 \
  --num-cpu-threads-per-stage 24
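
To put the numbers side by side: the warning reports an allocation of about 798.8 GB, while the command sets --memory-limit 500. Below is a minimal sketch of that comparison, assuming that --memory-limit is interpreted in GB. The variable names (warning, requested_gb, limit_gb, summary) are only illustrative and are not part of Parabricks.

```shell
# Warning line copied verbatim from the log above.
warning='[PB Warning 2025-Jan-20 12:00:03][src/PBTempFile.cpp:155] Attempting to allocate host memory above desired limit (798.798905 GB)'

# Pull the number inside "(... GB)" out of the warning line.
requested_gb=$(printf '%s\n' "$warning" | sed -n 's/.*(\([0-9.]*\) GB).*/\1/p')

# Value passed to --memory-limit in the command (assumed to be in GB).
limit_gb=500

# Compare the requested amount with the configured limit.
summary=$(awk -v r="$requested_gb" -v l="$limit_gb" \
  'BEGIN { printf "requested %.1f GB, limit %d GB, excess %.1f GB", r, l, r - l }')
echo "$summary"
```

So the step is trying to use roughly 300 GB more host memory than the limit I configured, although the machine itself has 1 TiB of RAM.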

I also tried running it with the --low-memory option, but I got the same error. Thanks for your help.

Best regards,
Mihaela