Could not run fq2bam as part of germline pipeline (Version 4.0.1-1)

Hi there,

I am having problems with Parabricks Version 4.0.1-1. Can anyone help? Thank you very much.

Here is the full log message:

Please visit NVIDIA Clara - NVIDIA Docs for detailed documentation

[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /input_data/DBS_LO_1_R1_001.fastq.gz and
/input_data/DBS_LO_1_R2_001.fastq.gz
[Parabricks Options Mesg]: @RG\tID:HTHKLDSX7.1\tLB:lib1\tPL:bar\tSM:DBS\tPU:HTHKLDSX7.1

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Read group created for /input_data/DBS_LO_1_R1_001.fastq.gz and
/input_data/DBS_LO_1_R2_001.fastq.gz
[Parabricks Options Mesg]: @RG\tID:HTHKLDSX7.1\tLB:lib1\tPL:bar\tSM:DBS\tPU:HTHKLDSX7.1
[PB Info 2024-Sep-30 03:17:30] ------------------------------------------------------------------------------
[PB Info 2024-Sep-30 03:17:30] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Sep-30 03:17:30] || Version 4.0.1-1 ||
[PB Info 2024-Sep-30 03:17:30] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2024-Sep-30 03:17:30] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Info 2024-Sep-30 03:17:36] GPU-BWA mem
[PB Info 2024-Sep-30 03:17:36] ProgressMeter Reads Base Pairs Aligned
[PB Warning 2024-Sep-30 03:17:37][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Sep-30 03:17:37][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Sep-30 03:17:37][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Sep-30 03:17:37][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Error 2024-Sep-30 03:17:37][ParaBricks/src/check_error.cu:44] No GPUs active, shutting down due to previous error., exiting.
For technical support visit Help - NVIDIA Docs
Exiting…

Could not run fq2bam as part of germline pipeline
Exiting pbrun …
[bgzip] No such file or directory: /raid/parabricks/u7815186/output//DBS.vcf
Please visit NVIDIA Clara - NVIDIA Docs for detailed documentation

[Parabricks Options Error]: Input file /output_data/DBS.bam not found. Exiting…
[Parabricks Options Error]: Run with -h to see help

Here is the nvidia-smi output:

Mon Sep 30 11:52:09 2024
+----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+-----------------------+---------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB On | 00000000:06:00.0 Off | 0 |
| N/A 38C P0 73W / 300W | 13038MiB / 16384MiB | 38% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 1 Tesla V100-SXM2-16GB On | 00000000:07:00.0 Off | 0 |
| N/A 41C P0 71W / 300W | 13038MiB / 16384MiB | 14% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 2 Tesla V100-SXM2-16GB On | 00000000:0A:00.0 Off | 0 |
| N/A 39C P0 71W / 300W | 13040MiB / 16384MiB | 30% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 3 Tesla V100-SXM2-16GB On | 00000000:0B:00.0 Off | 0 |
| N/A 36C P0 74W / 300W | 13038MiB / 16384MiB | 39% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 4 Tesla V100-SXM2-16GB On | 00000000:85:00.0 Off | 0 |
| N/A 31C P0 42W / 300W | 1MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 5 Tesla V100-SXM2-16GB On | 00000000:86:00.0 Off | 0 |
| N/A 33C P0 41W / 300W | 1MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 6 Tesla V100-SXM2-16GB On | 00000000:89:00.0 Off | 0 |
| N/A 33C P0 41W / 300W | 1MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+
| 7 Tesla V100-SXM2-16GB On | 00000000:8A:00.0 Off | 0 |
| N/A 31C P0 41W / 300W | 1MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+-----------------------+---------------------+

+----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3685742 C …local/parabricks/binaries//bin/htvc 13026MiB |
| 1 N/A N/A 3685742 C …local/parabricks/binaries//bin/htvc 13026MiB |
| 2 N/A N/A 3685742 C …local/parabricks/binaries//bin/htvc 13026MiB |
| 3 N/A N/A 3685742 C …local/parabricks/binaries//bin/htvc 13026MiB |
+----------------------------------------------------------------------------------------+

I can run BWA and GATK separately, but they don't work together in the pipeline.

Hi @chw0905, can you also include the full command that you used to run this job?

Thank you

Thanks for your reply.

Here is the command

#$numArgs = $#ARGV + 1;
#ARGV[0]: *.conf

foreach $argnum (0 .. $#ARGV)
{
my $name = $ARGV[$argnum];
}

#Loading configure file
open LIST, "< $ARGV[0]";
while(<LIST>)
{
chomp;
@array_list = split("=","$_");
my $tag = $array_list[0];
my $arg = $array_list[1];
if($tag eq "fastq_path"){
$fastq_path = $arg;
}
if($tag eq "fastq_finished_path"){
$fastq_finished_path = $arg;
}
if($tag eq "fastq_mapping_fail_path"){
$fastq_mapping_fail_path = $arg;
}
if($tag eq "output_path"){
$output_path = $arg;
}
if($tag eq "reference_path"){
$reference_path = $arg;
}
if($tag eq "temp_dir_path"){
$temp_dir_path = $arg;
}
if($tag eq "reference_version"){
$reference_version = $arg;
}
if($tag eq "gVCF_output"){
$gVCF_output = $arg;
}
if($tag eq "delete_fastq_files"){
$delete_fastq = $arg;
}
}
close LIST;
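For context, the configure file this loop expects is a plain key=value list, one setting per line. The values below are hypothetical placeholders, not taken from the thread:

```
fastq_path=/path/to/fastq
fastq_finished_path=/path/to/fastq_finished
fastq_mapping_fail_path=/path/to/fastq_failed
output_path=/path/to/output
reference_path=/path/to/reference
temp_dir_path=/path/to/tmp
reference_version=Homo_sapiens_assembly38.fasta
gVCF_output=yes
delete_fastq_files=no
```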

#List all fastq into hash
chdir("$fastq_path");
@list = glob("*.fastq.gz");
for my $i (0 .. $#list){
@array_dir = split("_","$list[$i]");
my $sample_ID = $array_dir[0];
if (exists $hash_ID{$sample_ID}){
$ID_value = $hash_ID{$sample_ID};
$ID_value = $ID_value."links".$list[$i];
$hash_ID{$sample_ID} = $ID_value;
}else{
$hash_ID{$sample_ID} = $list[$i];
}
}

chdir("$output_path");

#Parabricks germline pipeline (GATK)
foreach $key(keys %hash_ID){
chdir("$output_path");
$ID_value = $hash_ID{$key};
@array_ID = split("links","$ID_value");
my $fq1 = $array_ID[0];
my $fq2 = $array_ID[1];
@array_name = split("_","$fq1");
my $sample_ID = $array_name[0];
system("docker run --gpus '\"device=4,5,6,7\"' --rm --volume $fastq_path:/input_data --volume $reference_path:/ref_data --volume $output_path:/output_data --volume $temp_dir_path:/tmp_dir →
system("bgzip -@ 16 $output_path/$sample_ID.vcf");
if($gVCF_output eq "yes"){
system("docker run --gpus '\"device=4,5,6,7\"' --rm --volume $fastq_path:/input_data --volume $reference_path:/ref_data --volume $output_path:/output_data --volume $temp_dir_path:/tm>
}

    #Check bam and fastq.gz files exist or not
    if(-e "$output_path/$sample_ID.bam" || -e "$output_path/$sample_ID.vcf.gz"){
            if($delete_fastq eq "yes"){
                    system("rm $fastq_path/$fq1");
                    system("rm $fastq_path/$fq2");
                    }
            if($delete_fastq eq "no"){
                    #Create mapping_done folder
                    if(! -d "$fastq_finished_path/mapping_done"){
                            system("mkdir $fastq_finished_path/mapping_done");
                            }
                    system("mv $fastq_path/$fq1 $fastq_finished_path/mapping_done/");
                    system("mv $fastq_path/$fq2 $fastq_finished_path/mapping_done/");
                    }
            }else{
                    #Create mapping_failed folder
                    if(! -d "$fastq_mapping_fail_path/mapping_failed"){
                            system("mkdir $fastq_mapping_fail_path/mapping_failed");
                            }
                    system("mv $fastq_path/$fq1 $fastq_mapping_fail_path/mapping_failed/");
                    system("mv $fastq_path/$fq2 $fastq_mapping_fail_path/mapping_failed/");
                    }
    }

Hi @chw0905,

Do you have the pbrun command that ends up getting executed? This looks like mostly code to organize files, but I’m not seeing the actual run command.

Thank you

Thank you for your reply.

Here is the command

system("docker run --gpus '\"device=4,5,6,7\"' --rm --volume $fastq_path:/input_data --volume $reference_path:/ref_data --volume $output_path:/output_data --volume $temp_dir_path:/tmp_dir --workdir /output_data nvcr.io/nvidia/clara/clara-parabricks:4.0.1-1 pbrun haplotypecaller --ref /ref_data/$reference_version --in-bam /output_data/$sample_ID.bam --out-variants /output_data/$sample_ID.g.vcf.gz --gvcf --tmp-dir /tmp_dir");

Thank you

Thank you. The run command looks fine to me; nothing too complicated is going on. Looking back at your initial error, it appears Parabricks is having trouble detecting GPUs. I see you're passing the --gpus flag into Docker; I would check the syntax. See if you can pass --gpus all for starters and whether you get the same error. Otherwise, it might be a driver-mismatch issue. Have you run Parabricks jobs on this machine before, and it's only this time that's giving you problems?
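As an aside, one quick way to sanity-check the --gpus quoting is to echo the command instead of running it. Docker expects the device list to arrive as "device=4,5,6,7" with the inner double quotes intact, which the outer single quotes protect from the shell. This is a sketch; the pbrun subcommand at the end is just for illustration:

```shell
# Outer single quotes preserve the inner double quotes for docker.
GPU_ARG='"device=4,5,6,7"'

# Echo rather than execute, to see exactly what docker would receive.
echo docker run --gpus "$GPU_ARG" --rm nvcr.io/nvidia/clara/clara-parabricks:4.0.1-1 pbrun fq2bam
```

If the printed line shows --gpus "device=4,5,6,7" (quotes included), the argument is reaching Docker correctly.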

Thanks.

Thank you for your reply.

I tried changing the command to --gpus all.
Sometimes it works.
Sometimes it doesn't.
And sometimes it gets stuck for a long time.

Here is the full log message

[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /input_data/AT010016_R1_001.fastq.gz and
/input_data/AT010016_R2_001.fastq.gz
[Parabricks Options Mesg]: @RG\tID:H73K7DSXC.1\tLB:lib1\tPL:bar\tSM:AT010016\tPU:H73K7DSXC.1

[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Read group created for /input_data/AT010016_R1_001.fastq.gz and
/input_data/AT010016_R2_001.fastq.gz
[Parabricks Options Mesg]: @RG\tID:H73K7DSXC.1\tLB:lib1\tPL:bar\tSM:AT010016\tPU:H73K7DSXC.1
[PB Info 2024-Oct-08 07:51:16] ------------------------------------------------------------------------------
[PB Info 2024-Oct-08 07:51:16] || Parabricks accelerated Genomics Pipeline ||
[PB Info 2024-Oct-08 07:51:16] || Version 4.0.1-1 ||
[PB Info 2024-Oct-08 07:51:16] || GPU-BWA mem, Sorting Phase-I ||
[PB Info 2024-Oct-08 07:51:16] ------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[PB Info 2024-Oct-08 07:51:23] GPU-BWA mem
[PB Info 2024-Oct-08 07:51:23] ProgressMeter Reads Base Pairs Aligned
[PB Warning 2024-Oct-08 07:51:37][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Oct-08 07:51:38][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Oct-08 07:51:38][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Oct-08 07:51:39][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Oct-08 07:51:39][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Oct-08 07:51:39][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument
[PB Warning 2024-Oct-08 07:51:40][ParaBricks/src/check_error.cu:41] cudaSafeCall() failed at ParaBricks/src/memoryManager.cu/18: invalid argument

I have run Parabricks jobs on this machine before (Version 4.0.1-1)
and never had problems until now.

Thank you very much.

Hi @chw0905,

This cudaSafeCall error implies that the driver on the machine may be below the Parabricks recommendation: driver version 465+ for the Parabricks 4.0.1 release you are using. However, I can see from your earlier nvidia-smi output that your driver is 560. Are you submitting these jobs to a cluster? Is it possible that not every node has updated GPU drivers? That could explain why the job sometimes runs and sometimes does not.
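One way to spot such a mismatch (a sketch, assuming nvidia-smi is installed on each node) is to query just the driver version on every node, for example by submitting this through your scheduler, and compare the outputs:

```shell
# Print the driver version reported for each GPU on this node.
# Identical values across all cluster nodes means no driver mismatch.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
else
  echo "nvidia-smi not found on this node"
fi
```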

Thank you for your reply.

Yes, I'm submitting these jobs to a cluster.

Is there a command to check whether the GPU driver on each node is up to date?

Thank you very much.