Order of sorting and mark duplicates in fastq2bam

Hi, I have a question about why sorting is performed before marking duplicates in fastq2bam. It seems like the standard is to mark duplicates first. I've heard that changing the order can affect compatibility with structural variant calling. Has this ever been tested?

Thanks!

A follow-up question…

The default behavior of fq2bam is alignment -> coordinate sort -> mark duplicates -> bqsr.

I would like to mark duplicates on a queryname-sorted BAM instead of a coordinate-sorted BAM.

I know I can do each step separately:

  1. fq2bam --align-only
  2. bamsort --sort-order queryname
  3. markdup --markdups-assume-sortorder-queryname
    1. I believe this returns a coordinate-sorted BAM
  4. bqsr
  5. applybqsr
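
For reference, the separate steps above could be laid out as a dry-run script along these lines. This is only a sketch: it assumes `pbrun` is the launcher for each tool, the `run` helper is hypothetical and just prints each command, and the reference, input, and output file arguments that each tool requires are left out on purpose and would need to be filled in from the tool documentation.

```shell
#!/bin/sh
# Dry run of the step-by-step workflow: each command is printed, not executed.
# Assumption: tools are launched through `pbrun`.
# Required file arguments (reference, FASTQ/BAM inputs, outputs) are omitted here.
run() { echo "pbrun $*"; }

run fq2bam --align-only                            # 1. alignment only
run bamsort --sort-order queryname                 # 2. sort by queryname
run markdup --markdups-assume-sortorder-queryname  # 3. mark duplicates on queryname-sorted input
run bqsr                                           # 4. build the recalibration report
run applybqsr                                      # 5. apply the recalibration
```

Changing `run` to execute its arguments instead of echoing them would turn this into the real pipeline once the file arguments are added.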

I would prefer to use just fq2bam.

What is the behavior of the following command?

  1. fq2bam --markdups-assume-sortorder-queryname

Does it mimic the set of steps above?

Thanks