CNVkit not accelerated at all

I successfully ran CNVkit with Parabricks v3.6 on 2 GPUs and obtained some output.
However, it doesn't seem to be accelerated at all, for 2 reasons:

  1. it works even when using 0 GPU devices, and runs at the same speed as when using more than 1 GPU device
  2. running the same steps with the CPU version of CNVkit is even faster (at least 1.5x faster than running on our dgx01), even for the coverage calculation from read depths, which is the step the documentation describes as accelerated.

Can you please help me understand these discrepancies?
How should one expect CNVkit to be accelerated? Am I missing some options, maybe?

BTW, here is my command line:

pbrun cnvkit --ref hg38.fa --in-bam myinput.bam --keep-tmp --tmp-dir /raid/tmp --generate-vcf --output-dir .
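
To make the timing comparison reproducible, each run can be wrapped in a small helper like the sketch below. Note that time_cmd is a hypothetical helper name, not part of Parabricks or CNVkit; it only measures wall-clock seconds for whatever command it is given, and it discards that command's output.

```shell
#!/usr/bin/env bash
# time_cmd: hypothetical helper (not part of Parabricks or CNVkit).
# Runs the given command, discards its output, and prints the
# elapsed wall-clock time in whole seconds.
time_cmd() {
    local start end
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Example usage with the command from this post (uncomment to run):
# time_cmd pbrun cnvkit --ref hg38.fa --in-bam myinput.bam --keep-tmp --tmp-dir /raid/tmp --generate-vcf --output-dir .
```

The same helper can wrap the CPU-only CNVkit run, so both numbers are measured the same way.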

Many thanks