Best NVIDIA platform for ONT MinION

Dear All,

One of my research colleagues at our university has acquired a MinION sequencer from Oxford Nanopore Technologies (ONT) and would like to apply for an NVIDIA research grant for a suitable GPU-based platform. The platform would be used for basecalling the .fast5 files produced by the MinION and perhaps for aligning both MinION and MiSeq sequencing data.

I’d appreciate any constructive input on the suitability of the Clara AGX (or perhaps another NVIDIA platform) for the above purposes. Thanks.



Initially, we’d like to use ONT’s guppy_basecaller software to process a semi-weekly MinION run of ~400 GB in ~3 days or less.
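For a rough sense of scale, the target above can be turned into a sustained data rate. This is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1000 MB) and a constant processing rate, and the function name is purely illustrative. Basecaller speed is normally quoted in samples or bases per second rather than in file size, so treat the result as a sanity check, not a benchmark target.

```python
# Back-of-the-envelope: sustained rate needed to finish one run before the deadline.
RUN_SIZE_GB = 400      # approximate .fast5 volume per run (from the post)
DEADLINE_DAYS = 3      # target turnaround (from the post)


def required_throughput(run_size_gb: float, deadline_days: float) -> dict:
    """Return the average data rate needed, assuming 1 GB = 1000 MB."""
    hours = deadline_days * 24
    return {
        "gb_per_hour": run_size_gb / hours,
        "mb_per_second": run_size_gb * 1000 / (hours * 3600),
    }


rates = required_throughput(RUN_SIZE_GB, DEADLINE_DAYS)
print(f"{rates['gb_per_hour']:.2f} GB/h, {rates['mb_per_second']:.2f} MB/s")
```

With these numbers the run has to be consumed at roughly 5.6 GB of .fast5 data per hour, averaged over the three days.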



Would you be able to describe what tools you use for alignment and any other processing steps you might be running (e.g. assembly, polishing, or variant analysis) and what organisms you are working with?

Additionally, GPUs make it possible to basecall in real time with a MinION (and other ONT hardware), so you would not need to run a weekly batch job and would have fasta files available immediately. This also reduces the amount of storage needed. Is that something of interest?


Thanks @Emmett for your reply.

Re organisms, I can only say mammalian for now. Re other processing steps, I’m not sure my colleague is interested in assembly so much as the other types you’ve mentioned. Thus, any software recommendations would be appreciated.

Also, I suspect that the high accuracy requirement for running guppy_basecaller would prevent us from base calling in realtime, true?

Your thoughts?

You should be able to run the high-accuracy basecaller in real time for a MinION on the majority of our GPUs. Based on the external benchmarks in [0], a Jetson AGX Xavier should be fast enough for real-time basecalling.
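As a minimal sketch of what a GPU high-accuracy run might look like, the snippet below only assembles a guppy_basecaller command line and prints it, without executing anything. The directory paths are hypothetical, and the config name assumes an R9.4.1 flow cell with the high-accuracy ("hac") model, which the thread does not specify; check the configs shipped with your Guppy version before using it.

```python
import shlex


def build_guppy_command(input_dir, output_dir,
                        config="dna_r9.4.1_450bps_hac.cfg",  # assumed flow cell/model
                        device="cuda:0"):                    # first CUDA GPU
    """Assemble a guppy_basecaller invocation as an argument list."""
    return [
        "guppy_basecaller",
        "--input_path", input_dir,    # directory containing .fast5 files
        "--save_path", output_dir,    # where the basecalled reads are written
        "--config", config,           # model configuration file
        "--device", device,           # run basecalling on the GPU
    ]


# Hypothetical paths, for illustration only.
cmd = build_guppy_command("/data/run01/fast5", "/data/run01/basecalled")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a machine where Guppy is installed. Keeping the command as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.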

At the moment, we release GPU-accelerated alignment tools in Parabricks and GenomeWorks, but these are suited to x86 workstations, so a traditional GPU (such as a V100 or A100) would be the best fit for them. These libraries also contain other common genomics pipeline tools accelerated for the GPU, and they continue to grow.



Thanks. So for running basecalling as well as tools downstream from basecalling, would you recommend, say, an A100 over the Jetson AGX Xavier? If so, what features should the x86 workstation have?

Unfortunately, there isn’t a one-size-fits-all answer for the variety of workloads that exist in this space.

A Jetson AGX would be able to basecall in real time, which means you no longer need to store the fast5 files and can keep only the significantly smaller fasta/fastq files. This can yield large bandwidth and storage savings.

Some secondary analysis steps run well on a Jetson AGX, such as Kraken2 with one of our reference Docker images, while others require 8 GPUs or even a cluster setup. I’d suggest reaching out on the Parabricks forums for more information about secondary analysis pipelines on x86.

Hope this helps.