Parabricks fq2bam CRAM output fails samtools index with CRAM slice offset does not match landmark

confrench33 · June 2, 2026, 8:52pm

Hi NVIDIA Parabricks team,

I’m running Parabricks fq2bam as part of an nf-core/sarek-derived workflow on AWS Batch GPU instances, and I hit a CRAM integrity/indexing issue with one sample. The Parabricks fq2bam task completed successfully with exit code 0, but the downstream samtools index step failed on the CRAM produced by Parabricks.

Environment

Parabricks container: nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1
Tool: pbrun fq2bam
Downstream indexer: samtools 1.21
Workflow: nf-core/sarek-derived Nextflow workflow
Platform: AWS Batch / Seqera Platform
Instance shape: GPU AWS Batch environment, using 4 GPUs
Reference: GRCh38 / GATK bundle-style reference files

Parabricks command shape

The task used pbrun fq2bam with paired FASTQs, GRCh38 BWA index/reference, known sites for BQSR, and interval restriction. The relevant options were approximately:

pbrun fq2bam \
  --ref <GRCh38_BWA_index_prefix> \
  --in-fq <sample>_R1.fastq.gz <sample>_R2.fastq.gz \
  --out-bam <sample>.cram \
  --knownSites dbsnp_146.hg38.vcf.gz \
  --knownSites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
  --knownSites Homo_sapiens_assembly38.known_indels.vcf.gz \
  --out-recal-file <sample>.table \
  --interval-file wgs_calling_regions_noseconds.hg38.bed \
  --num-gpus 4 \
  --bwa-cpu-thread-pool 48 \
  --monitor-usage \
  --read-group-id-prefix <sample>.L1 \
  --read-group-sm <patient_sample> \
  --read-group-lb <sample> \
  --read-group-pl ILLUMINA \
  --bwa-options='-K 100000000 -Y' \
  --gpuwrite \
  --gpusort \
  --bwa-nstreams auto

The output extension was .cram, so Parabricks produced CRAM output.
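
For anyone who wants to confirm that a file really is CRAM before debugging further, here is a minimal sketch. It relies only on the CRAM file definition, which starts with the four ASCII bytes "CRAM" followed by one byte each for the major and minor format version. The function name cram_version is hypothetical and not part of samtools or Parabricks.

```shell
# Print "major.minor" for a CRAM file, or fail if the magic bytes are absent.
# Only the first 6 bytes are read, so this also works on a partial download
# that contains the start of the file.
cram_version() {
  file=$1
  magic=$(head -c 4 "$file")
  if [ "$magic" != "CRAM" ]; then
    echo "not a CRAM file: $file" >&2
    return 1
  fi
  # Bytes 5 and 6 hold the major and minor format version as unsigned integers.
  set -- $(od -An -tu1 -j4 -N2 "$file")
  echo "$1.$2"
}
```

Usage would be something like cram_version 22A0018864.cram. This only checks the file header. It says nothing about whether the containers and slices further into the file are consistent.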

Observed failure

Parabricks itself finished successfully, but downstream indexing failed:

samtools index -@ 0 22A0018864.cram

with:

[E::cram_index_container] CRAM slice offset 74642 does not match landmark 1 in container header (202490)
samtools index: failed to create index for "22A0018864.cram"

The failing CRAM was large, roughly 74,939,777,141 bytes. I was not able to download and inspect the full CRAM locally, but I did inspect the associated .crai and BQSR recalibration table. The BQSR table looked normal and showed that a large number of reads were processed, so this does not appear to be an obvious early task failure. The problem seems specific to the CRAM structure/indexability.

Why I suspect the CRAM output

The pipeline stage immediately upstream was Parabricks fq2bam, which completed successfully. The next stage was a standard samtools index of the Parabricks-produced CRAM. The failure message appears to be about inconsistent internal CRAM container/slice offsets rather than a missing file, truncated file, or reference mismatch.

Questions

Is pbrun fq2bam CRAM output expected to be fully compatible with samtools index from htslib/samtools 1.21?
Are there known issues in Parabricks 4.7.0-1 with CRAM output, especially when using --gpuwrite, --gpusort, and/or --bwa-nstreams auto?
Are there recommended settings for producing CRAM safely from fq2bam at this scale?
Would you recommend avoiding CRAM output from fq2bam and writing BAM directly, then converting/indexing with another tool if CRAM is required?
What additional diagnostics would be most useful if the full CRAM is too large to download? For example, would the .crai, .command.log, BQSR table, or selected byte ranges from the CRAM be useful?
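
On the diagnostics question, one low-cost check that needs only the .crai is sketched below. A .crai is a gzip-compressed, tab-separated file with six columns per slice: reference ID, alignment start, alignment span, container byte offset, slice offset within the container, and slice size. The helper looks for rows whose slice offset matches a number from the error message (74642 or 202490). The function name is hypothetical, and it is an assumption that either number appears in the Parabricks-written index at all.

```shell
# List .crai rows whose slice offset (column 5) equals a given value.
# Output columns: container byte offset, slice offset, slice size.
crai_rows_with_slice_offset() {
  crai=$1
  offset=$2
  # Read via stdin so gzip does not care about the .crai file suffix.
  gzip -dc < "$crai" |
    awk -F'\t' -v off="$offset" '$5 == off { print $4 "\t" $5 "\t" $6 }'
}
```

For example, crai_rows_with_slice_offset 22A0018864.cram.crai 202490 would print any matching rows. If a row matches, its container byte offset (first output column) would be a reasonable starting point for extracting a small byte range of the CRAM around the affected container.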

I can provide the Parabricks .command.log, the .crai, the BQSR recalibration table, and exact command/configuration details if useful. I cannot easily share the full CRAM due to size and data restrictions.

Thanks for any guidance on whether this is a known issue or if there are recommended Parabricks settings to avoid generating non-indexable CRAM output.

tongz · June 2, 2026, 11:05pm

Hello,

Thanks for reaching out to us. We can confirm that we are able to reproduce this issue, and we believe it is related to the --gpuwrite flag. We will try to fix it in the next release. Until then, there are two ways to work around the issue:

Convert the output CRAM file to BAM or SAM for downstream processing; the converted output is still valid.
Run fq2bam again without passing --gpuwrite.
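
The first workaround could look roughly like the sketch below. It assumes samtools is on the PATH and that the same reference FASTA used for alignment is available locally. The function names and the file names in the usage note are placeholders. Commands are only printed unless RUN=1 is set, so the sketch can be reviewed before anything touches a 75 GB file.

```shell
# Dry-run by default: commands are printed, not executed. Set RUN=1 to execute.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Decode the CRAM to BAM using the alignment reference, then index the BAM.
cram_to_indexed_bam() {
  cram=$1
  ref=$2
  bam="${cram%.cram}.bam"
  run samtools view -b -T "$ref" -o "$bam" "$cram"
  run samtools index "$bam"
}
```

Calling cram_to_indexed_bam sample.cram ref.fa prints the two samtools commands. Running it with RUN=1 in the environment executes them. The second workaround needs no extra tooling: rerun the original pbrun fq2bam command with the --gpuwrite line removed.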
