[Nvidia/Parabricks] Error when running Marking Duplicates (with the official Parabricks samples)

Hi there,

I got an error while running fq2bam, at the Marking Duplicates stage.

Error happened when parsing args :-1 Exiting...

Input data

I used the official data from parabricks_sample.

parabricks_sample
├── Data
│   ├── sample_1.fq.gz
│   └── sample_2.fq.gz
└── Ref
    ├── Homo_sapiens_assembly38.dict
    ├── Homo_sapiens_assembly38.fasta
    ├── Homo_sapiens_assembly38.fasta.amb
    ├── Homo_sapiens_assembly38.fasta.ann
    ├── Homo_sapiens_assembly38.fasta.bwt
    ├── Homo_sapiens_assembly38.fasta.fai
    ├── Homo_sapiens_assembly38.fasta.pac
    ├── Homo_sapiens_assembly38.fasta.sa
    ├── Homo_sapiens_assembly38.known_indels.vcf.gz
    └── Homo_sapiens_assembly38.known_indels.vcf.gz.tbi

2 directories, 12 files

My command

Same as in the official instructions.

$ pbrun fq2bam \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
    --out-bam output.bam

Output log

$ pbrun fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam
Please visit https://www.nvidia.com/en-us/docs/parabricks/ for detailed documentation


Please visit https://www.nvidia.com/en-us/docs/parabricks/ for detailed documentation


[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for
/home/ubuntu/tj_tsai/parabricks/dataset/parabricks_sample/Data/sample_1.fq.gz and
/home/ubuntu/tj_tsai/parabricks/dataset/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.1.1                              ||
||                       GPU-BWA mem, Sorting Phase-I                       ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs

GPU-BWA mem
ProgressMeter	Reads		Base Pairs Aligned


WARNING
The system has 8 threads, however recommended number of threads with 2 GPU is 24.
The run might not finish or might have less than expected performance.
[06:44:44]	5043564		570000000
[06:45:11]	10087128	1160000000
[06:45:38]	15130692	1740000000
[06:46:06]	20174256	2310000000
[06:46:35]	25217820	2890000000
[06:47:03]	30261384	3490000000
[06:47:30]	35304948	4070000000
[06:47:58]	40348512	4630000000
[06:48:28]	45392076	5220000000
[06:48:56]	50435640	5790000000

GPU-BWA Mem time: 320.142383 seconds
GPU-BWA Mem is finished.

GPU Sorting, Marking Dups, BQSR
ProgressMeter	SAM Entries Completed

Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 320.147299 seconds

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /home/ubuntu/tj_tsai/parabricks/dataset/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /home/ubuntu/tj_tsai/parabricks/dataset/parabricks_sample/Data/sample_1.fq.gz /home/ubuntu/tj_tsai/parabricks/dataset/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 331.599 sec; CPU: 2371.570 sec
------------------------------------------------------------------------------
||        Program:                      GPU-BWA mem, Sorting Phase-I        ||
||        Version:                                            v3.1.1        ||
||        Start Time:                       Fri Oct 30 06:43:48 2020        ||
||        End Time:                         Fri Oct 30 06:49:19 2020        ||
||        Total Time:                           5 minutes 31 seconds        ||
------------------------------------------------------------------------------
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.1.1                              ||
||                             Sorting Phase-II                             ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
progressMeter - Percentage
[06:49:21]	0.0	 0.00 GB
[06:49:31]	67.3	 0.21 GB
Sorting and Marking: 20.003 seconds
------------------------------------------------------------------------------
||        Program:                                  Sorting Phase-II        ||
||        Version:                                            v3.1.1        ||
||        Start Time:                       Fri Oct 30 06:49:21 2020        ||
||        End Time:                         Fri Oct 30 06:49:41 2020        ||
||        Total Time:                                     20 seconds        ||
------------------------------------------------------------------------------
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.1.1                              ||
||                         Marking Duplicates, BQSR                         ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
Error happened when parsing args :-1 Exiting...
Please contact Parabricks-Support@nvidia.com for any questions. Exiting...

Could not run fq2bam
Exiting pbrun ...

My GPU summary

Here is the GPU summary of the Driver and CUDA versions.

How can I fix the issue?

Hello,

We are sorry that you are running into this error.
Could you please run the same command with --x3 added to the command line and share the full log?

Also, could you please tell us your working directory?

Thank you
Myrieme
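
One way to capture the full log requested above is to send both stdout and stderr through tee. This is only a sketch: the helper name run_and_log and the log file name fq2bam_x3.log are hypothetical, and the pbrun line is shown as a comment because it needs the Parabricks installation to run.

```shell
# Run a command, print its output, and save stdout and stderr to a log file.
# run_and_log is a hypothetical helper name, not part of Parabricks.
run_and_log() {
    local logfile=$1
    shift
    # 2>&1 merges stderr into stdout so that error messages reach the log too.
    "$@" 2>&1 | tee "$logfile"
}

# Example with the command from this thread (not executed here):
# run_and_log fq2bam_x3.log pbrun fq2bam \
#     --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
#     --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
#     --out-bam output.bam \
#     --x3
```

The resulting file can then be attached to the reply in one piece instead of being copied from the terminal.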

Here is the full log.

My command (with --x3)

Same as the official instructions, plus --x3.

pbrun fq2bam \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
    --out-bam output.bam \
    --x3

The output log (with --x3)

/mnt/parabricks/dataset$ pbrun fq2bam \
>     --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
>     --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
>     --out-bam output.bam \
>     --x3
Please visit https://www.nvidia.com/en-us/docs/parabricks/ for detailed documentation


nvidia-docker run -u=1000:1000 --rm -w=/mnt/parabricks/dataset --net=host -v /opt/parabricks:/INSTALL/ -v /mnt/parabricks/dataset/K16N4GO0:/mnt/parabricks/dataset/K16N4GO0 -v /mnt/parabricks/dataset:/mnt/parabricks/dataset -v /mnt/parabricks/dataset/parabricks_sample/Ref:/mnt/parabricks/dataset/parabricks_sample/Ref -v /mnt/parabricks/dataset/parabricks_sample/Data:/mnt/parabricks/dataset/parabricks_sample/Data parabricks/release:v3.1.1 fq2bam --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz --out-bam output.bam --x3 --tmp-dir /mnt/parabricks/dataset/K16N4GO0
Please visit https://www.nvidia.com/en-us/docs/parabricks/ for detailed documentation


[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /mnt/parabricks/dataset/parabricks_sample/Data/sample_1.fq.gz and
/mnt/parabricks/dataset/parabricks_sample/Data/sample_2.fq.gz
[Parabricks Options Mesg]: @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
g 2 b 0 B 2 P 4 s 1 r 0 o 2 m 1 z 4 f 2 v 0 M 2 name /mnt/parabricks/dataset/output.bam
/parabricks/bwa mem /mnt/parabricks/dataset/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /mnt/parabricks/dataset/parabricks_sample/Data/sample_1.fq.gz /mnt/parabricks/dataset/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 -Z ./pbOpts.txt
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.1.1                              ||
||                       GPU-BWA mem, Sorting Phase-I                       ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs


WARNING
The system has 8 threads, however recommended number of threads with 2 GPU is 24.
The run might not finish or might have less than expected performance.

GPU-BWA mem
ProgressMeter	Reads		Base Pairs Aligned
[06:17:26]	5043564		590000000
[06:17:52]	10087128	1160000000
[06:18:18]	15130692	1740000000
[06:18:42]	20174256	2330000000
[06:19:06]	25217820	2890000000
[06:19:30]	30261384	3480000000
[06:19:55]	35304948	4060000000
[06:20:22]	40348512	4650000000
[06:20:47]	45392076	5220000000
[06:21:11]	50435640	5800000000

GPU-BWA Mem time: 290.837261 seconds
GPU-BWA Mem is finished.

GPU Sorting, Marking Dups, BQSR
ProgressMeter	SAM Entries Completed

Total GPU-BWA Mem + Sorting + MarkingDups + BQSR Generation + BAM writing
Processing time: 290.840556 seconds

[main] CMD: PARABRICKS mem -Z ./pbOpts.txt /mnt/parabricks/dataset/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta /mnt/parabricks/dataset/parabricks_sample/Data/sample_1.fq.gz /mnt/parabricks/dataset/parabricks_sample/Data/sample_2.fq.gz @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1
[main] Real time: 300.960 sec; CPU: 2251.416 sec
------------------------------------------------------------------------------
||        Program:                      GPU-BWA mem, Sorting Phase-I        ||
||        Version:                                            v3.1.1        ||
||        Start Time:                       Tue Nov  3 06:16:34 2020        ||
||        End Time:                         Tue Nov  3 06:21:35 2020        ||
||        Total Time:                             5 minutes 1 second        ||
------------------------------------------------------------------------------
/parabricks/sort -sort_unmapped -ft 10 -gb 88
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.1.1                              ||
||                             Sorting Phase-II                             ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
progressMeter - Percentage
[06:21:37]	0.0	 0.00 GB
[06:21:47]	81.3	 0.38 GB
Sorting and Marking: 20.001 seconds
------------------------------------------------------------------------------
||        Program:                                  Sorting Phase-II        ||
||        Version:                                            v3.1.1        ||
||        Start Time:                       Tue Nov  3 06:21:37 2020        ||
||        End Time:                         Tue Nov  3 06:21:57 2020        ||
||        Total Time:                                     20 seconds        ||
------------------------------------------------------------------------------
/parabricks/postsort /mnt/parabricks/dataset/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta -o /mnt/parabricks/dataset/output.bam -sort_unmapped -ft 4 -wt 2 -zt -1 -bq 2 -gb 88
------------------------------------------------------------------------------
||                 Parabricks accelerated Genomics Pipeline                 ||
||                              Version v3.1.1                              ||
||                         Marking Duplicates, BQSR                         ||
||                  Contact: Parabricks-Support@nvidia.com                  ||
------------------------------------------------------------------------------
Error happened when parsing args :-1 Exiting...
Please contact Parabricks-Support@nvidia.com for any questions. Exiting...

Could not run fq2bam
Exiting pbrun ...



My environment

Working directory

/mnt/parabricks/dataset$ pwd
/mnt/parabricks/dataset

Input data

/mnt/parabricks/dataset$ tree parabricks_sample
parabricks_sample
├── Data
│   ├── sample_1.fq.gz
│   └── sample_2.fq.gz
└── Ref
    ├── Homo_sapiens_assembly38.dict
    ├── Homo_sapiens_assembly38.fasta
    ├── Homo_sapiens_assembly38.fasta.amb
    ├── Homo_sapiens_assembly38.fasta.ann
    ├── Homo_sapiens_assembly38.fasta.bwt
    ├── Homo_sapiens_assembly38.fasta.fai
    ├── Homo_sapiens_assembly38.fasta.pac
    ├── Homo_sapiens_assembly38.fasta.sa
    ├── Homo_sapiens_assembly38.known_indels.vcf.gz
    └── Homo_sapiens_assembly38.known_indels.vcf.gz.tbi

2 directories, 12 files

File system / disk space usage

/mnt/parabricks/dataset$ df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdb        2.0T   53G  1.9T   3% /mnt



Main differences

The differences between the logs without and with --x3 are:

nvidia-docker run \
    -u=1000:1000 \
    --rm \
    -w=/mnt/parabricks/dataset \
    --net=host \
    -v /opt/parabricks:/INSTALL/ \
    -v /mnt/parabricks/dataset/K16N4GO0:/mnt/parabricks/dataset/K16N4GO0 \
    -v /mnt/parabricks/dataset:/mnt/parabricks/dataset \
    -v /mnt/parabricks/dataset/parabricks_sample/Ref:/mnt/parabricks/dataset/parabricks_sample/Ref \
    -v /mnt/parabricks/dataset/parabricks_sample/Data:/mnt/parabricks/dataset/parabricks_sample/Data \
   parabricks/release:v3.1.1 \
   fq2bam \
    --ref parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq parabricks_sample/Data/sample_1.fq.gz parabricks_sample/Data/sample_2.fq.gz \
    --out-bam output.bam \
    --x3 \
    --tmp-dir /mnt/parabricks/dataset/K16N4GO0

g 2 b 0 B 2 P 4 s 1 r 0 o 2 m 1 z 4 f 2 v 0 M 2 name /mnt/parabricks/dataset/output.bam

/parabricks/bwa mem \
    /mnt/parabricks/dataset/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    /mnt/parabricks/dataset/parabricks_sample/Data/sample_1.fq.gz \
    /mnt/parabricks/dataset/parabricks_sample/Data/sample_2.fq.gz \
    @RG\tID:HK3TJBCX2.1\tLB:lib1\tPL:bar\tSM:sample\tPU:HK3TJBCX2.1 \
    -Z ./pbOpts.txt

/parabricks/sort -sort_unmapped -ft 10 -gb 88

/parabricks/postsort \
    /mnt/parabricks/dataset/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    -o /mnt/parabricks/dataset/output.bam \
    -sort_unmapped \
    -ft 4 -wt 2 -zt -1 -bq 2 -gb 88

Hello Nvidia team,

I found that the same error also occurs with parabricks/release:v3.0.0.
The same command originally worked with nvcr.io/hpc/parabricks:v2.5.0.

Do you have a plan or schedule for fixing this error, or are there any guidelines for avoiding it?

Best regards,
TJ Tsai

Hello,

We are sorry, but your system does not meet the requirements for running Clara Parabricks Pipelines.
With two GPUs you need at least 24 CPU threads. The error occurs because your system has only 8 CPU threads.

Regards,
Myrieme
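
The thread requirement mentioned in the reply above can be checked before starting a run. The following is a minimal sketch, not an official Parabricks tool: the function name enough_threads is hypothetical, and the figure of 12 threads per GPU is an assumption extrapolated from the "24 threads with 2 GPU" warning in the log, so it should be checked against the system requirements of the installed version.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare the CPU thread count with the number of GPUs.
# ASSUMPTION: 12 threads per GPU, extrapolated from the log warning
# "recommended number of threads with 2 GPU is 24".
THREADS_PER_GPU=12

# Returns success (0) when `threads` is enough for `gpus` GPUs.
enough_threads() {
    local threads=$1 gpus=$2
    [ "$threads" -ge $(( gpus * THREADS_PER_GPU )) ]
}

# Number of online CPU threads.
threads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# nvidia-smi -L prints one line per GPU; use 0 when the tool is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpus=$(nvidia-smi -L | grep -c '^GPU ')
else
    gpus=0
fi

if enough_threads "$threads" "$gpus"; then
    echo "OK: $threads CPU threads for $gpus GPU(s)"
else
    echo "Too few CPU threads: have $threads, want $(( gpus * THREADS_PER_GPU )) for $gpus GPU(s)"
fi
```

On the system described in this thread (8 threads, 2 GPUs) the check would report too few CPU threads, which matches the warning printed at the start of the fq2bam log.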