BAM File Size Increase After Using Performance Options

Hello community,

Why did the size of the ‘HG002.hiseqx.pcr-free.30x.pb.bam’ file increase after using certain performance options?

I ran the ‘germline (GATK Germline Pipeline)’ twice: first according to the official documentation, and second with additional options:
‘--fq2bamfast’, ‘--gpuwrite’, ‘--gpuwrite-deflate-algo 0’, ‘--gpusort’, ‘--num-cpu-threads-per-stage 24’, ‘--bwa-nstreams 8’, ‘--bwa-cpu-thread-pool 24’, ‘--memory-limit 200’, ‘--num-htvc-threads 24’.

Now I see that the size of ‘HG002.hiseqx.pcr-free.30x.pb.bam’ has increased from 66GB to 86GB.

My other question: will this increase affect the accuracy and quality of the results, or is it simply due to differences in compression, with the files remaining functionally the same?

It is simply due to the different compression algorithms. If you decompress both BAMs to SAM files, you will see that the contents are the same apart from the @PG lines in the header (assuming both runs were otherwise given the same command). You can get a higher compression ratio with --gpuwrite by using --gpuwrite-deflate-algo 3, which should still be faster than CPU compression.
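One way to do the check described above is to convert each BAM to SAM (for example with `samtools view -h run1.bam > run1.sam`, where `-h` keeps the header) and compare the two SAM files line by line while skipping @PG lines. The Python sketch below does that; the file names and the function names `content_lines` and `sam_contents_match` are hypothetical, and it assumes, as the reply says, that @PG lines are the only expected difference between the runs.

```python
import itertools
import sys


def content_lines(path):
    """Yield each SAM line as bytes, skipping @PG header lines.

    Assumption: @PG lines record the program and command line, so they
    are expected to differ between the two runs and are ignored.
    """
    with open(path, "rb") as handle:
        for line in handle:
            if not line.startswith(b"@PG"):
                yield line.rstrip(b"\r\n")


def sam_contents_match(path_a, path_b):
    """Compare two SAM files while ignoring @PG lines.

    Returns (True, None) when every remaining line is identical, otherwise
    (False, n) where n is the 1-based position of the first differing line
    among the non-@PG lines.
    """
    pairs = itertools.zip_longest(content_lines(path_a), content_lines(path_b))
    for position, (line_a, line_b) in enumerate(pairs, start=1):
        # zip_longest pads the shorter file with None, so a difference in
        # the number of lines is also reported as a mismatch.
        if line_a != line_b:
            return False, position
    return True, None


if __name__ == "__main__" and len(sys.argv) == 3:
    same, position = sam_contents_match(sys.argv[1], sys.argv[2])
    if same:
        print("Contents identical (ignoring @PG lines)")
    else:
        print(f"First difference at non-@PG line {position}")
    sys.exit(0 if same else 1)
```

Saved as, say, `compare_sam.py`, it would be run as `python compare_sam.py run1.sam run2.sam` and exits with status 0 when the files match. It streams both files, so memory use stays small, but the SAM files for a 30x genome will be considerably larger than the BAMs on disk.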
