fq2bam sometimes fails with a CUDA "invalid argument" error

Hello, and thank you for the fantastic Clara Parabricks! I'm using clara-parabricks:4.5.1-1 with NVIDIA driver 575.57.08 and CUDA 12.9. The server has 2x AMD EPYC 9754 CPUs, 4x NVIDIA L40 GPUs, 512 GB RAM, and an NVMe SSD, running Ubuntu 24.04. In general the genomic pipeline works fine and is very fast, but sometimes, when fq2bam runs as part of deepvariant_germline, it fails with:

[PB Error 2025-Jun-22 20:56:49][src/internal/gpu_check_error.cu:22] cudaSafeCall() failed at src/internal/gpu_bwa_io.cu/82: invalid argument, exiting.

The FASTQ and reference files are valid, though, and re-running fq2bam completes without errors.

Here are the deepvariant_germline launch parameters I use:

docker run --rm \
--name $SAMPLE_NAME \
--gpus all \
--volume $WORKING_DIR:$WORKING_DIR \
--volume $TEMP_DIR:$TEMP_DIR \
--env TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456 \
nvcr.io/nvidia/clara/clara-parabricks:4.5.1-1 \
pbrun deepvariant_germline \
--ref $BWA_MEM_REFERENCE.fasta \
--in-fq $WORKING_DIR/$SAMPLE_NAME/fastq_L01_R1.fastq.gz $WORKING_DIR/$SAMPLE_NAME/fastq_L01_R2.fastq.gz \
--in-fq $WORKING_DIR/$SAMPLE_NAME/fastq_L02_R1.fastq.gz $WORKING_DIR/$SAMPLE_NAME/fastq_L02_R2.fastq.gz \
--out-bam $WORKING_DIR/$SAMPLE_NAME/$SAMPLE_NAME.bam \
--tmp-dir $TEMP_DIR/$SAMPLE_NAME/ \
--bwa-cpu-thread-pool 24 \
--out-variants $WORKING_DIR/$SAMPLE_NAME/$SAMPLE_NAME.deepvariant.vcf \
--num-streams-per-gpu 4 \
--gpusort \
--gpuwrite

I’d be grateful for any advice on how to resolve this error. Thank you!

I found a solution to this problem on my system.

I was running Ubuntu with Xorg, and Xorg was regularly accessing the GPU, which caused random errors during fq2bam analysis.
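Before changing anything, you can confirm whether Xorg (or anything else) is holding the GPUs. A quick check, assuming stock Ubuntu device paths:

```shell
# Show every process with an open handle on an NVIDIA device node;
# on an affected system, Xorg will appear in this list.
sudo fuser -v /dev/nvidia*

# The process table at the bottom of plain nvidia-smi output also
# lists Xorg when it is using a GPU.
nvidia-smi
```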

To fix it, I disabled unnecessary services and blacklisted some kernel modules:

Disable services:

sudo systemctl disable nvidia-persistenced.service
sudo systemctl disable display-manager
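To confirm the services will stay off after a reboot, you can query their enablement state (note that display-manager is an alias for the actual display manager unit, e.g. gdm3 on stock Ubuntu):

```shell
# Should report "disabled" once the change has taken effect.
systemctl is-enabled nvidia-persistenced.service

# Shows the resolved display-manager unit and whether it is still active.
systemctl status display-manager --no-pager
```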

Blacklist kernel modules, update initramfs:

echo "blacklist nvidia_drm"     | sudo tee /etc/modprobe.d/blacklist-nvidia-drm.conf
echo "blacklist nvidia_modeset" | sudo tee /etc/modprobe.d/blacklist-nvidia-modeset.conf
sudo update-initramfs -u
sudo reboot
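After the reboot, it is worth verifying that the blacklist actually took effect. A simple check:

```shell
# Only the core driver modules (nvidia, nvidia_uvm) should remain;
# nvidia_drm and nvidia_modeset must not appear in this list.
lsmod | grep nvidia
```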

A good sign that everything is set up correctly (all unnecessary services are disabled and nothing is accessing the GPUs unexpectedly) is a successful GPU reset with nvidia-smi:

sudo nvidia-smi --gpu-reset
# Output: GPU 00000000:03:00.0 was successfully reset
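On a multi-GPU box like this one, you can also reset one card at a time instead of all four; the index 0 below is just an example:

```shell
# List the GPUs and their indices.
nvidia-smi -L

# Reset only the GPU with index 0.
sudo nvidia-smi --gpu-reset -i 0
```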
