I am following the documentation for rna_fq2bam (rna_fq2bam - NVIDIA Docs) and couldn't locate a parameter equivalent to STAR's --outSAMtype, although the documentation does suggest that the following commands are equivalent:
$ docker run --rm --gpus all --volume <INPUT_DIR>:/workdir --volume <OUTPUT_DIR>:/outputdir \
-w /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
pbrun rna_fq2bam \
--in-fq /workdir/${INPUT_FASTQ_1} /workdir/${INPUT_FASTQ_2} \
--genome-lib-dir /workdir/${PATH_TO_GENOME_LIBRARY}/ \
--output-dir /outputdir/${PATH_TO_OUTPUT_DIRECTORY} \
--ref /workdir/${REFERENCE_FILE} \
--out-bam /outputdir/${OUTPUT_BAM} \
--read-files-command zcat
# STAR Alignment
$ ./STAR \
--genomeDir <INPUT_DIR>/${PATH_TO_GENOME_LIBRARY} \
--readFilesIn <INPUT_DIR>/${INPUT_FASTQ_1} <INPUT_DIR>/${INPUT_FASTQ_2} \
--outFileNamePrefix <OUTPUT_DIR>/${PATH_TO_OUTPUT_DIRECTORY}/ \
--outSAMtype BAM SortedByCoordinate \
--readFilesCommand zcat
So I assume these parameters make the STAR and rna_fq2bam calls identical for paired-end 101 bp FASTQ. Running these versions on the cluster, on a node with 4 GPUs and 120 GB RAM, I do see an 8-9x speedup (45 min with STAR on CPU vs 5-7 min with rna_fq2bam on GPU).
However, with the parameters from our pipeline, both the GPU and the CPU run took 1.5 hrs to process the same dataset (longer overall because the FASTQ reads are 150 bp). Could anyone please help me work out what might cause this and how to address it? I need to keep the same parameters (with --sjdb-overhang 149 and --two-pass-mode Basic) to match the pipeline. Please see the commands I ran below.
singularity exec --nv ${workdir}${SINGULARITY} /bin/bash -c " nvidia-smi;
pbrun rna_fq2bam --num-threads ${NUM_THREADS} \
--max-bam-sort-memory ${MAX_BAM_SORT_MEMORY} \
\
--two-pass-mode ${TWO_PASS_MODE} \
--genome-lib-dir ${PATH_TO_GENOME_LIBRARY} \
--ref ${REFERENCE_FILE} \
--output-dir ${PATH_TO_OUTPUT_DIRECTORY} \
--out-bam ${PATH_TO_OUTPUT_DIRECTORY}${OUTPUT_BAM} \
\
--in-fq ${INPUT_FASTQ_1} /${INPUT_FASTQ_2} \
--out-sam-unmapped Within \
--out-sam-attributes Standard \
--read-files-command zcat \
--sjdb-overhang 149"
And with STAR:
alias STAR='~/Project/RNAseq/bulk_RNAseq_gpu/STAR_2_7_2a/STAR-2.7.2a/bin/Linux_x86_64/STAR'
STAR \
--runThreadN ${NUM_THREADS} \
--limitBAMsortRAM ${MAX_BAM_SORT_MEMORY} \
--runMode alignReads \
--twopassMode ${TWO_PASS_MODE} \
--genomeDir ${PATH_TO_GENOME_LIBRARY} \
\
--outFileNamePrefix ${aligned_dir}${OUTPUT_BAM_CPU_prefix} \
\
--outSAMtype BAM SortedByCoordinate \
--readFilesIn ${INPUT_FASTQ_1} ${INPUT_FASTQ_2} \
--outSAMunmapped Within \
--outSAMattributes Standard \
--readFilesCommand zcat
Note that the STAR version (2.7.2a) was chosen to match the documentation. I am happy to post complete logs with details.
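In case it helps anyone reproduce the comparison, here is a minimal sketch of how the wall-clock time of each run could be captured in a consistent way. The helper name time_run and its labels are hypothetical (not part of Parabricks or STAR), and it only assumes a POSIX shell with `date +%s` available:

```shell
#!/bin/sh
# time_run LABEL CMD [ARGS...]
# Hypothetical helper: runs CMD, then reports the elapsed wall-clock
# seconds and the exit status on stderr, so the command's own stdout
# is left untouched.
time_run() {
    label=$1
    shift
    start=$(date +%s)
    # Capture the exit status without aborting under `set -e`.
    "$@" && status=0 || status=$?
    end=$(date +%s)
    echo "${label}: $((end - start)) s (exit status ${status})" >&2
    return "${status}"
}
```

Either command above could then be prefixed with, for example, `time_run gpu` or `time_run cpu`, so that both timings are measured the same way and include the same steps (decompression, both passes, and BAM sorting).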