Help with performance from Parabricks on SLURM HPC cluster using Snakemake

theJJ · February 7, 2025, 10:06am

Hi,

I am trying to align Illumina 30x paired-end FASTQ files to the CHM13v2.0 (T2T) reference using Parabricks v4.4.0. According to NVIDIA and some published benchmarks, it should be possible to analyze 30x data in under an hour. However, on my university’s HPC cluster the run takes many hours and the GPUs appear to be underutilized.

I submit the jobs with Snakemake’s SLURM executor plugin and use Snakemake’s Apptainer integration to run them inside the Parabricks Docker container (see below).
I followed NVIDIA’s guide for best performance for fq2bam.
I request 4 A100 GPUs, 196 GB of host (CPU) memory, and 32 CPU threads. According to this benchmark and this one too, 4 A100 GPUs can run the germline pipeline in a little over an hour.

However, when I check on fq2bam after 3-5 hours, it is still in Sorting Phase-I. I also received a warning from the HPC admin about GPU underutilization (see attached image). Is this normal?
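To put a number on the underutilization, per-GPU load can be sampled from inside the job while fq2bam is running. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of the compute node and reports plain integers for the queried fields (some configurations, such as MIG, may report `[N/A]` instead). The helper names `parse_gpu_csv`, `idle_gpus`, and `sample_gpus`, and the 10% threshold, are hypothetical choices for illustration.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; all three are standard --query-gpu properties.
QUERY = "index,utilization.gpu,memory.used"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not fields:  # skip blank lines
            continue
        index, util, mem = (f.strip() for f in fields)
        rows.append({"index": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return rows


def idle_gpus(rows, threshold_pct=10):
    """Return the indices of GPUs whose utilization is below threshold_pct."""
    return [r["index"] for r in rows if r["util_pct"] < threshold_pct]


def sample_gpus():
    """Take one utilization sample from the GPUs visible to this process."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling `idle_gpus(sample_gpus())` every few seconds during the alignment and sorting phases would show whether all four GPUs are busy, or whether only some of them (or none) are doing work at a given stage.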

Any idea why I am not getting good performance, and why the run is taking so long?

Job rule:

rule run_parabricks_alignment:
    input:
        fq_1=f"{work_dir}/inputs/{{sample}}_R1.fastq.gz",
        fq_2=f"{work_dir}/inputs/{{sample}}_R2.fastq.gz",
    output:
        f"{work_dir}/outputs/{{sample}}_markdup.bam",
    log:
        f"{work_dir}/logs/run_parabricks_alignment/{{sample}}.log",
    container:
        "docker://nvcr.io/nvidia/clara/clara-parabricks:4.4.0-1"
    params:
        reference=f"{work_dir}/resources/reference/chm13v2.0.fa",
        pixel_distance=2500,
        out_dir=f"{work_dir}/outputs",
    shell:
        "(pbrun fq2bam "
        "--ref {params.reference} "
        "--in-fq {input.fq_1} {input.fq_2} "
        "--out-bam {output} "
        "--out-duplicate-metrics {params.out_dir}/{wildcards.sample}_duplicate-metrics.txt "
        "--out-qc-metrics-dir {params.out_dir}/{wildcards.sample}_qc-metrics "
        f"--tmp-dir {work_dir}/tmp "
        "--bwa-options='-M' "
        "--fix-mate "
        "--optical-duplicate-pixel-distance {params.pixel_distance} "
        "--gpusort --gpuwrite "
        ")2> {log}"

Snakemake parameters:

jobs: 10
executor: slurm
use-conda: true
use-apptainer: true

set-resources:
  run_parabricks_alignment:
    slurm_extra: "'--gpus=4' '--cpus-per-task=32' '--qos=normal'"
    slurm_partition: "gpu-a100"
    mem: "196GB"
    time: 480
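Since the CPU and GPU requests above are passed through `slurm_extra`, it may be worth confirming what the job actually receives once it is running. This is a minimal sketch, assuming a Linux compute node and a SLURM setup that exports `SLURM_CPUS_PER_TASK`, `SLURM_GPUS_ON_NODE`, and `CUDA_VISIBLE_DEVICES` to the job (whether the last one is set depends on the cluster's GRES configuration). The function name `allocation_report` is hypothetical.

```python
import os


def allocation_report(env=None):
    """Summarize the CPUs and GPUs visible to the current process."""
    env = os.environ if env is None else env

    # GPUs exposed to this process, as a list of device ID strings.
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    gpu_ids = [g for g in visible.split(",") if g]

    # CPUs this process may actually run on (respects cgroup/affinity limits on Linux).
    if hasattr(os, "sched_getaffinity"):
        usable_cpus = len(os.sched_getaffinity(0))
    else:
        usable_cpus = os.cpu_count()

    return {
        "slurm_cpus_per_task": env.get("SLURM_CPUS_PER_TASK"),
        "usable_cpus": usable_cpus,
        "visible_gpus": gpu_ids,
        "slurm_gpus_on_node": env.get("SLURM_GPUS_ON_NODE"),
    }
```

Printing `allocation_report()` at the start of the job, both outside and inside the container, would show whether the 32 threads and 4 GPUs requested are the ones the `pbrun` process can really use.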
