Fq2bam Marking Duplicates, BQSR - high memory use, job killed OOM

Because v4.1.1-1 isn’t handling --align-only and --no-markdups as expected for me (see my other recent posts), fq2bam is running the Marking Duplicates, BQSR stage. Here, however, I run into memory problems.

Consistent with the stated hardware requirements, I queued a job using Slurm as follows:

#SBATCH --cpus-per-task=24
#SBATCH --gpus-per-task=2
#SBATCH --mem-per-cpu=5g 
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
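
The pbrun command inside that script was roughly along these lines (the reference and FASTQ paths are placeholders, not my actual files):

# --align-only / --no-markdups are not taking effect for me in 4.1.1-1
# (see my other posts), so the Marking Duplicates, BQSR stage still runs.
pbrun fq2bam \
    --ref /ref/GRCh38.fa \
    --in-fq /fastq/sample_R1.fastq.gz /fastq/sample_R2.fastq.gz \
    --out-bam /out/sample.bam \
    --align-only \
    --no-markdups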

After properly aligning and sorting, it logged:

[PB Info 2023-Jul-14 12:04:31] ------------------------------------------------------------------------------
[PB Info 2023-Jul-14 12:04:31] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2023-Jul-14 12:04:31] ||                              Version 4.1.1-1                             ||
[PB Info 2023-Jul-14 12:04:31] ||                         Marking Duplicates, BQSR                         ||
[PB Info 2023-Jul-14 12:04:31] ------------------------------------------------------------------------------
[PB Info 2023-Jul-14 12:04:57] progressMeter -  Percentage
[PB Info 2023-Jul-14 12:05:07] 0.9       14.09 GB
[PB Info 2023-Jul-14 12:05:17] 1.6       26.09 GB
[PB Info 2023-Jul-14 12:05:27] 2.4       37.58 GB
[PB Info 2023-Jul-14 12:05:37] 3.2       48.73 GB
[PB Info 2023-Jul-14 12:05:47] 4.5       60.46 GB
[PB Info 2023-Jul-14 12:05:57] 6.0       70.30 GB
[PB Info 2023-Jul-14 12:06:07] 7.0       80.88 GB
[PB Info 2023-Jul-14 12:06:17] 8.7       91.20 GB
[PB Info 2023-Jul-14 12:06:27] 10.1      100.41 GB
[PB Info 2023-Jul-14 12:06:37] 11.7      111.54 GB
slurmstepd: error: Detected 1 oom_kill event in StepId=55846256.batch. Some of the step tasks have been OOM Killed.

Since the job was queued with a total of 5 GB x 24 = 120 GB of CPU RAM, it is apparent that memory usage kept accumulating and the job was killed when it hit the job limit. This raises two issues/questions:

  1. Why does Marking Duplicates, BQSR use so much memory? Is it expected?

  2. More importantly, is there a mechanism to tell fq2bam how much memory is available to it?

The job is running on a shared node, and I do not necessarily have access to all RAM on the machine. Even if I try to ensure that I am the only job running on a node, I still need to provide a memory request, and thus will have a job memory limit.

I read in other posts about an option called --memory-limit, but I do not see it in the current documentation, so I assume it was dropped in recent versions? It seems an important option to me.

(As an aside, this issue would become moot for me if --align-only worked; I don’t actually want to do sorting or duplicate marking.)

While I still think --memory-limit is a valuable option to have, I may have found a workaround for shared clusters.

From Slurm sbatch docs:

--exclusive[={user|mcs}]
The job allocation can not share nodes with other running jobs … the job is allocated all CPUs and GRES on all nodes in the allocation, but is only allocated as much memory as it requested. This is by design to support gang scheduling, because suspended jobs still reside in memory. To request all the memory on a node, use --mem=0.

I had missed that last bit previously: it is possible to request all the memory on a node without having to provide a specific --mem number.
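
So the job header can be changed along these lines (a sketch; the rest of the header is as above):

#SBATCH --cpus-per-task=24
#SBATCH --gpus-per-task=2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
# Exclusive node access; with --exclusive, --mem=0 requests all memory on the node.
#SBATCH --exclusive
#SBATCH --mem=0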

A further discovery, made rather by accident. To make #SBATCH --exclusive work meaningfully on our cluster, I had to switch to V100 GPUs with 16 GB of memory (previously A40s), so I added the --low-memory flag. Unexpectedly, one or more of those changes altered the behavior of Marking Duplicates, BQSR. CPU memory no longer accumulates progressively, so there is no OOM:

[PB Info 2023-Jul-15 22:51:39] ------------------------------------------------------------------------------
[PB Info 2023-Jul-15 22:51:39] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2023-Jul-15 22:51:39] ||                              Version 4.1.1-1                             ||
[PB Info 2023-Jul-15 22:51:39] ||                         Marking Duplicates, BQSR                         ||
[PB Info 2023-Jul-15 22:51:39] ------------------------------------------------------------------------------
[PB Info 2023-Jul-15 22:53:04] progressMeter -  Percentage
[PB Info 2023-Jul-15 22:53:14] 0.2       1.87 GB
[PB Info 2023-Jul-15 22:53:24] 0.8       3.01 GB
[PB Info 2023-Jul-15 22:53:34] 1.9       1.43 GB
[PB Info 2023-Jul-15 22:53:44] 2.7       1.47 GB
[PB Info 2023-Jul-15 22:53:54] 3.1       1.86 GB
[PB Info 2023-Jul-15 22:54:04] 3.8       2.98 GB
[PB Info 2023-Jul-15 22:54:14] 4.5       3.37 GB
[PB Info 2023-Jul-15 22:54:24] 5.5       1.84 GB
...
[PB Info 2023-Jul-15 23:13:44] 97.9      5.48 GB
[PB Info 2023-Jul-15 23:13:54] 98.0      4.35 GB
[PB Info 2023-Jul-15 23:14:04] 98.1      4.08 GB
[PB Info 2023-Jul-15 23:14:14] 100.0     0.00 GB
[PB Info 2023-Jul-15 23:14:14] BQSR and writing final BAM:  1270.353 seconds
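
For reference, the only change on the Parabricks side was adding --low-memory to the otherwise identical command (again a sketch, with the same placeholder paths as before):

# Same invocation as above, with low-memory mode for the 16 GB V100s.
pbrun fq2bam \
    --ref /ref/GRCh38.fa \
    --in-fq /fastq/sample_R1.fastq.gz /fastq/sample_R2.fastq.gz \
    --out-bam /out/sample.bam \
    --align-only \
    --no-markdups \
    --low-memory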

It would be great if someone had insight into what is happening here; it doesn’t appear to be documented.

Hello. A few notes:

  1. You can see --memory-limit in the documentation: fq2bam (BWA-MEM + GATK) - NVIDIA Docs. Scroll down to the performance options. The default is actually half of the system's installed memory, so depending on the Slurm configuration we may detect more than what Slurm has allocated to you. If you are having issues with host memory usage, I would suggest providing a specific number of GB manually. For example, if you are going to have 60 GB allocated to you, then to be safe you can provide --memory-limit 40 (see the sketch after this list).
  2. --low-memory is a low-memory mode for GPUs such as the V100 with 16 GB of device memory.
  3. What is the issue with --align-only?
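
For example, something along these lines (a sketch; file names are placeholders, and 40 assumes roughly a 60 GB allocation):

# Cap Parabricks host memory use safely below the Slurm allocation.
pbrun fq2bam \
    --ref Ref.fa \
    --in-fq Sample_1.fastq.gz Sample_2.fastq.gz \
    --out-bam Sample.bam \
    --memory-limit 40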