Hello,
I am a bit concerned about how deepvariant behaves in Parabricks 4.6.0. Whether I call pangenome_aware_deepvariant or deepvariant, both appear to actually call deepsomatic.
DeepSomatic uses a different model, trained on cancer tumor tissue, and it is not intended for non-model species. DeepVariant, on the other hand, is OK for this purpose.
singularity exec --nv -B $PWD:$PWD parabricks_4.6.0-1.sif pbrun deepvariant --ref ref.fa --in-bam giraffe_out/sample_1.bam --gvcf --out-variants deepVar_out/sample_1_deepvar.g.vcf
[Parabricks Options Mesg]: Setting --num-streams-per-gpu based on available device memory.
Detected 2 CUDA Capable device(s), considering 2 device(s)
CUDA Driver Version / Runtime Version 13.0 / 12.9
Using model for CUDA Capability Major/Minor version number: 89
/usr/local/parabricks/binaries/bin/deepsomatic 2 4 --ref ref.fa --reads sample_1.bam -o sample_1_deepvar.g.vcf -n 6 --model /usr/local/parabricks/binaries/model/80+/shortread/deepvariant.eng -g --channel_insert_size --pileup_image_width 221 --max_reads_per_partition 1500 --partition_size 1000 --vsc_min_count_snps 2 --vsc_min_count_indels 2 --vsc_min_fraction_snps 0.12 --min_mapping_quality 5 --min_base_quality 10 --alt_aligned_pileup none --variant_caller VERY_SENSITIVE_CALLER --dbg_min_base_quality 15 --ws_min_windows_distance 80 --aux_fields_to_keep HP --p_error 0.001 --max_ins_size 1
and pangenome_aware_deepvariant is the same:
singularity exec --nv -B $PWD:$PWD parabricks_4.6.0-1.sif pbrun pangenome_aware_deepvariant --pangenome pangenome.gbz --ref ref.fa --in-bam sample_1.bam --ref-name myPG --out-variants sample_1_deepvar.vcf
Please visit
for detailed documentation
[Parabricks Options Mesg]: Setting --num-streams-per-gpu based on available device memory.
Detected 2 CUDA Capable device(s), considering 2 device(s)
CUDA Driver Version / Runtime Version 13.0 / 12.9
Using model for CUDA Capability Major/Minor version number: 89
/usr/local/parabricks/binaries/bin/deepsomatic 2 4 --ref ref.fa --reads sample_1.bam -o deepVar_out/sample_1_deepvar.vcf -n 6 --pangenome pangenome.gbz --model /usr/local/parabricks/binaries/model/80+/shortread/deepvariant_pangenome_aware.eng -long_reads --channel_insert_size --keep_legacy_allele_counter_behavior --keep_only_window_spanning_haplotypes --keep_supplementary_alignments --min_mapping_quality 0 --normalize_reads --sort_by_haplotypes --parse_sam_aux_fields
The pangenome-aware DeepVariant run is not only invoking deepsomatic, it is also passing -long_reads, even though the default parameters say it is short-read based.
In addition, the version is not reported. I would like to report the DeepVariant version as well as the Parabricks version in a subsequent publication, and right now it is unclear which version is actually being run.
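As a stopgap for provenance, the binary and model file named in a saved log can at least be recorded. This is only a minimal sketch: it assumes the pbrun output was captured to a file (run.log is a hypothetical name) and that the launch line starts with the /usr/local/parabricks/binaries/bin/ path, as in the logs in this post. summarize_pbrun_log is a name made up here, not part of Parabricks, and this does not recover a DeepVariant version number.

```shell
# Hypothetical helper (not part of Parabricks): print the binary and model
# file that a saved pbrun log says were launched.
summarize_pbrun_log() {
    # Match lines whose first field is a path under the Parabricks binaries dir.
    awk '$1 ~ /^\/usr\/local\/parabricks\/binaries\/bin\// {
        n = split($1, parts, "/")      # last path component is the binary name
        binary = parts[n]
        model = "unknown"
        for (i = 2; i < NF; i++) {     # the token after --model is the model path
            if ($i == "--model") {
                m = split($(i + 1), mp, "/")
                model = mp[m]
            }
        }
        print "binary=" binary, "model=" model
    }' "$1"
}
```

Calling summarize_pbrun_log run.log on a log containing the deepvariant launch line from this post would print binary=deepsomatic model=deepvariant.eng.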
I am looking for clarity on these discrepancies before proceeding with actual datasets. Is there a reason why deepsomatic is being called instead of deepvariant?
Thanks,
Sam