Why Does NVIDIA Parabricks Become Slower as More Samples Are Processed?

Dear Parabricks Team,

We are currently using NVIDIA Parabricks for read alignment and variant calling on our server. At the beginning of the analysis, the performance is very good, and each sample can typically be completed within around 30 minutes. However, as the number of processed individuals gradually increases, the running speed becomes significantly slower. Eventually, processing a single sample may take nearly 2 hours.

We would like to ask whether this behavior is expected and what factors may cause the performance degradation over time. For example, could it be related to GPU memory usage, I/O bottlenecks, temporary files, cache accumulation, or multi-sample workload scheduling?

We would greatly appreciate any suggestions on how to diagnose or optimize this issue.

Best regards,

Dong

-rw-r--r-- 1 root root 11000374318 5月 11 02:39 SRR12536543.bam
-rw-r--r-- 1 root root 12452845443 5月 11 02:59 SRR12536545.bam
-rw-r--r-- 1 root root 10018054242 5月 11 03:19 SRR12536548.bam
-rw-r--r-- 1 root root 14206731637 5月 11 03:38 SRR12536549.bam
-rw-r--r-- 1 root root 11383195681 5月 11 04:00 SRR12536550.bam
-rw-r--r-- 1 root root 12530702818 5月 11 04:20 SRR12536551.bam
-rw-r--r-- 1 root root 8361238531 5月 11 04:40 SRR12536556.bam
-rw-r--r-- 1 root root 16805773076 5月 11 05:00 SRR12536559.bam
-rw-r--r-- 1 root root 13090502411 5月 11 05:25 SRR12536567.bam
-rw-r--r-- 1 root root 10493289341 5月 11 05:45 SRR12536568.bam
-rw-r--r-- 1 root root 11442689732 5月 11 06:04 SRR12536577.bam
-rw-r--r-- 1 root root 13212514363 5月 11 06:23 SRR12536586.bam
-rw-r--r-- 1 root root 13209437895 5月 11 06:44 SRR12536589.bam
-rw-r--r-- 1 root root 12191300058 5月 11 07:05 SRR12536591.bam
-rw-r--r-- 1 root root 11826780367 5月 11 07:25 SRR12536597.bam
-rw-r--r-- 1 root root 12063318133 5月 11 07:45 SRR12536603.bam
-rw-r--r-- 1 root root 12164341471 5月 11 08:06 SRR12536604.bam
-rw-r--r-- 1 root root 12731373645 5月 11 08:27 SRR12536606.bam
-rw-r--r-- 1 root root 10942247544 5月 11 08:49 SRR12536607.bam
-rw-r--r-- 1 root root 13404421986 5月 11 09:10 SRR12536609.bam
-rw-r--r-- 1 root root 10597886187 5月 11 09:32 SRR12536610.bam
-rw-r--r-- 1 root root 10624503367 5月 11 09:51 SRR12536616.bam
-rw-r--r-- 1 root root 34811864676 5月 11 10:24 SRR30641509.bam
-rw-r--r-- 1 root root 34701229920 5月 11 11:19 SRR30641510.bam
-rw-r--r-- 1 root root 32175847104 5月 11 12:12 SRR30641511.bam
-rw-r--r-- 1 root root 38531636329 5月 11 13:07 SRR30641513.bam
-rw-r--r-- 1 root root 36435131694 5月 11 14:08 SRR30641514.bam
-rw-r--r-- 1 root root 51901737531 5月 11 15:33 SRR30641516.bam
-rw-r--r-- 1 root root 38869605025 5月 11 20:01 SRR30641517.bam
-rw-r--r-- 1 root root 33981825306 5月 11 22:53 SRR30641518.bam
-rw-r--r-- 1 root root 30237453312 5月 12 00:42 SRR30641519.bam
-rw-r--r-- 1 root root 28756700909 5月 12 01:52 SRR30641521.bam
-rw-r--r-- 1 root root 31218578227 5月 12 02:57 SRR30641522.bam
-rw-r--r-- 1 root root 29072439853 5月 12 08:11 SRR30641536.bam
-rw-r--r-- 1 root root 36868536102 5月 12 09:02 SRR30641537.bam
-rw-r--r-- 1 root root 33254129590 5月 12 09:58 SRR30641538.bam
-rw-r--r-- 1 root root 42117299153 5月 12 10:54 SRR30641540.bam
-rw-r--r-- 1 root root 31988139145 5月 12 11:52 SRR30641542.bam
-rw-r--r-- 1 root root 36543920568 5月 12 12:45 SRR30641545.bam
-rw-r--r-- 1 root root 35132376402 5月 12 13:43 SRR30641546.bam
-rw-r--r-- 1 root root 30056611199 5月 12 14:37 SRR30641547.bam
-rw-r--r-- 1 root root 29005285556 5月 12 15:25 SRR30641551.bam
-rw-r--r-- 1 root root 44781118453 5月 12 16:19 SRR30641552.bam
-rw-r--r-- 1 root root 33996836257 5月 12 17:26 SRR30641555.bam
-rw-r--r-- 1 root root 36361199649 5月 12 18:24 SRR30641557.bam
-rw-r--r-- 1 root root 35178740047 5月 12 19:23 SRR30641558.bam
-rw-r--r-- 1 root root 42369496380 5月 12 20:24 SRR30641559.bam

Hello @1004803499, could you please provide the following details:

  1. Which tool/pipeline specifically is being used?
  2. System information (GPU type/number, CPU type/number, Memory available)
  3. How are multiple samples being processed? Are they processed one after the other in a loop or in parallel via multiple processes?
  4. What is the exact command you are using to launch a Parabricks job?
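
If the samples are run one after another, a small timing wrapper may make it easier to answer points 3 and 4 with numbers. The following is only a sketch: `run_and_time` is a hypothetical helper, not part of Parabricks, and `command` is whatever argument list you already use to launch the job, passed through unchanged.

```python
import shutil
import subprocess
import sys
import time


def run_and_time(sample, command, workdir="."):
    """Run one command and record wall time and free disk space afterwards.

    `command` is the caller's own launch command as a list of arguments;
    nothing Parabricks-specific is assumed here.
    """
    start = time.monotonic()
    completed = subprocess.run(command, check=False)
    elapsed_s = time.monotonic() - start
    # Free space on the volume holding workdir, to spot temp-file build-up.
    free_gb = shutil.disk_usage(workdir).free / 1e9
    return {
        "sample": sample,
        "returncode": completed.returncode,
        "elapsed_s": elapsed_s,
        "free_disk_gb": free_gb,
    }


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; replace with your own.
    print(run_and_time("demo", [sys.executable, "-c", "pass"]))
```

Logging one such record per sample, together with the input size, should show whether the slowdown tracks the data or the position in the run.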

Thank you!

Hi @1004803499, another point to note is that Parabricks performance depends on the size and complexity of the input. Based on the output you pasted, the SRR30… files are significantly larger than the SRR125… files on NCBI SRA; the BAMs in your listing reflect this, at roughly 29 to 52 GB versus 8 to 17 GB. This could be another reason why each successive output file takes longer to be written.
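
One way to check this is to normalize the time per sample by the output size. The sketch below parses a contiguous excerpt of the listing and reports minutes and GB per minute for each file. It rests on several assumptions: the samples ran sequentially, so the gap between consecutive modification times approximates the runtime; BAM size is a usable proxy for input size; and the year, which `ls` does not print, is arbitrary. The helper names are illustrative.

```python
from datetime import datetime

ASSUMED_YEAR = 2024  # assumption: the listing does not show the year

# Contiguous excerpt of the listing, spanning the switch to larger files.
LISTING = """\
-rw-r--r-- 1 root root 13404421986 5月 11 09:10 SRR12536609.bam
-rw-r--r-- 1 root root 10597886187 5月 11 09:32 SRR12536610.bam
-rw-r--r-- 1 root root 10624503367 5月 11 09:51 SRR12536616.bam
-rw-r--r-- 1 root root 34811864676 5月 11 10:24 SRR30641509.bam
-rw-r--r-- 1 root root 34701229920 5月 11 11:19 SRR30641510.bam
-rw-r--r-- 1 root root 32175847104 5月 11 12:12 SRR30641511.bam
"""


def parse_ls_line(line):
    """Return (name, size_bytes, mtime) from one `ls -l` line."""
    fields = line.split()
    size_bytes = int(fields[4])
    month = int(fields[5].rstrip("月"))  # this locale prints May as "5月"
    day = int(fields[6])
    hour, minute = map(int, fields[7].split(":"))
    return fields[8], size_bytes, datetime(ASSUMED_YEAR, month, day, hour, minute)


def per_sample_throughput(listing):
    """Return (name, minutes since previous file, GB per minute) per file."""
    rows = [parse_ls_line(l) for l in listing.splitlines() if l.strip()]
    results = []
    for (_, _, prev_time), (name, size, mtime) in zip(rows, rows[1:]):
        minutes = (mtime - prev_time).total_seconds() / 60
        if minutes <= 0:
            continue  # skip rows that are out of order or share a timestamp
        results.append((name, minutes, size / 1e9 / minutes))
    return results


if __name__ == "__main__":
    for name, minutes, gb_per_min in per_sample_throughput(LISTING):
        print(f"{name}: {minutes:.0f} min, {gb_per_min:.2f} GB/min")
```

On this excerpt the larger files take longer but the GB-per-minute figure does not drop, which suggests the longer runtimes here come from file size and not from a gradual slowdown. Long gaps elsewhere in the listing, such as 15:33 to 20:01 before SRR30641517, would stand out in the same analysis; the timestamps alone cannot say whether those reflect idle time or a real bottleneck.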