Hi @gburnett
I am using Parabricks 4.2.0 with very large WGS files and I frequently get the following error from bamsort:
[/home/jenkins/agent/workspace/parabricks-branch-build//sortcommon/inc/compressfile.h:439] LZ4 decompression returned an error code of -1, expected decBytes > 0, exiting.
For technical support visit https://docs.nvidia.com/clara/parabricks/4.2.0/index.html
I am not sure where to start debugging this.
Thanks.
Hey @avenkatraman,
Can you give me an idea of how big the files are, in terms of GB or number of reads?
Thank you
Hi @gburnett
The BAM is somewhere between 160 GB and 180 GB. I am running on an AWS g4dn.12xlarge instance. Here is the command:
pbrun bamsort \
--ref ref_index.fa \
--in-bam ${sample_id}.bam \
--out-bam ${sample_id}.sorted.bam \
--logfile ${sample_id}.bamsort.log.txt \
--sort-order coordinate \
--max-records-in-ram 25000000 \
--num-zip-threads 10 --num-sort-threads 6 \
--verbose --x3 \
--num-gpus 4
Thanks
Hi @avenkatraman,
I have just reached out to the engineering team to ask whether there are any limits on BAM size for bamsort. In the meantime, I see that you have already reduced --max-records-in-ram, which was going to be my next suggestion.
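If memory pressure turns out to be the cause, it may be worth lowering it further. As a rough sketch (the value below is illustrative, not an official recommendation):
# Illustrative only: retry with a lower --max-records-in-ram value
pbrun bamsort \
--ref ref_index.fa \
--in-bam ${sample_id}.bam \
--out-bam ${sample_id}.sorted.bam \
--sort-order coordinate \
--max-records-in-ram 12000000 \
--num-gpus 4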
Hi @gburnett
Would it help if I shared the full workflow?
Thanks
That will be sufficient for now; we are looking into it.
Hi @gburnett
I was wondering if you have any updates on the above.
One other question about this page: NVIDIA Clara Parabricks | NVIDIA NGC
- The left panel says “Modified - 24-Oct-2023”. What does this date refer to?
- The Docker tag for 4.2.0-1 says “10-Oct-2023 - 2:42 PM”.
Did the Docker image get updated?
Thanks
This error is raised during LZ4 decompression: the decompressor returned an error code (-1) where decompressed bytes were expected, which typically points to corrupt data or an unexpected runtime condition.
In such cases, it is important to investigate the following factors:
- Data Integrity: Ensure that the input WGS files are not corrupted; errors during storage or transfer can cause decompression failures (see the sketch after this list).
- System Resources: Large WGS files demand significant memory and processing power; verify that the instance has enough of both to handle the data.
- Software Version: Update Parabricks to the latest version, or check the release notes for patches or bug fixes that address this specific error.
- Configuration: Review the bamsort settings (for example, --max-records-in-ram and the thread counts) to make sure they are appropriate for data sets of this size.
- Log Files: Check the bamsort log file and any other error messages for additional context on what triggered the LZ4 failure.
- Community Support: The Parabricks community and support channels may have encountered similar issues and can offer specific guidance.
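As a concrete starting point for the first two items, here is a minimal sanity-check sketch (it assumes samtools is installed on the instance; the file name is a placeholder from your command):
# Verify the BAM is structurally intact (readable header, EOF block present)
samtools quickcheck -v ${sample_id}.bam \
&& echo "BAM passed quickcheck" \
|| echo "BAM failed quickcheck: likely truncated or corrupt"

# If the file was copied or downloaded, compare checksums against the source copy
md5sum ${sample_id}.bam

# Confirm host memory and GPU visibility before re-running bamsort
free -h
nvidia-smi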
Addressing these factors should help you isolate the LZ4 decompression error and run Parabricks reliably on very large WGS files. If you can share more specific information about your setup, the community can provide more targeted support.