Best NVIDIA platform for ONT MinION

cb4nvidia · July 16, 2021, 3:23pm

Dear All,

One of my research colleagues at our university has already acquired a MinION sequencer from Oxford Nanopore Tech. and would like to apply for an NVIDIA research grant for a suitable GPU-based platform for base-calling the .fast5 files produced by the MinION, and perhaps also for performing alignments on both MinION and MiSeq sequencing data.

I’d appreciate any constructive input on the suitability of the Clara AGX (or perhaps other NVIDIA) platform for the above purposes, thanks.

Best,
CB

cb4nvidia · July 16, 2021, 6:32pm

Update:

Initially, using ONT’s guppy_basecaller software we’d like to process a semi-weekly run consisting of ~400GB from the MinION in ~3 days or less.

Thanks.
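
For a rough sense of scale, the target in the update above (~400GB in ~3 days) can be turned into a sustained input rate. The sketch below is only a back-of-the-envelope check: the function names are made up for illustration, decimal units are assumed (1 GB = 1000 MB), and it treats the run as a steady byte stream, whereas basecallers are usually benchmarked in signal samples per second rather than bytes of .fast5 input.

```python
# Back-of-the-envelope throughput check for a batch basecalling run.
# Assumption: decimal units (1 GB = 1000 MB) and a steady processing rate.

def required_rate_mb_per_s(total_gb: float, days: float) -> float:
    """Sustained input rate (MB/s) needed to process total_gb within days."""
    seconds = days * 24 * 60 * 60
    return total_gb * 1000 / seconds


def hours_to_process(total_gb: float, rate_mb_per_s: float) -> float:
    """Wall-clock hours to process total_gb at a given sustained rate (MB/s)."""
    return total_gb * 1000 / rate_mb_per_s / 3600


if __name__ == "__main__":
    rate = required_rate_mb_per_s(total_gb=400, days=3)
    print(f"Required sustained rate: {rate:.2f} MB/s")
```

For 400GB in 3 days this works out to roughly 1.5 MB/s of raw input, so the deadline itself is not demanding. What matters is whether a given GPU can sustain that rate with the high accuracy model.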

emcquinn · July 16, 2021, 9:21pm

Hello!

Would you be able to describe what tools you use for alignment and any other processing steps you might be running (e.g. assembly, polishing, or variant analysis) and what organisms you are working with?

Additionally, with GPUs it’s possible to basecall in realtime with a MinION (and other ONT hardware), so you would not need to run a batch-based weekly job and would have fasta files available immediately. This reduces the amount of storage needed as well. Is that something of interest?

Thanks,
-Emmett

cb4nvidia · July 16, 2021, 9:48pm

Thanks @Emmett for your reply.

Re organisms, I can only say mammalian for now. Re other processing steps, I’m not sure my colleague is interested in assembly so much as the other types you’ve mentioned. Thus, any software recommendations would be appreciated.

Also, I suspect that the high accuracy requirement for running guppy_basecaller would prevent us from base calling in realtime, true?

Your thoughts?

emcquinn · July 16, 2021, 10:18pm

You should be able to run the high accuracy basecaller in realtime for a MinION on the majority of our GPUs. Based on external benchmarks in [0] a Jetson Xavier AGX should be fast enough for real time basecalling.

At the moment we release GPU accelerated alignment tools in Parabricks and GenomeWorks, but these are suited for x86 workstations so a traditional GPU (like a V100, A100, etc) would be the best. These libraries contain other common genomics pipeline tools accelerated for the GPU and are continuing to grow.

[0] a collection of my notes while working on nanopore basecalling on the Jetson Xavier · GitHub

-Emmett

cb4nvidia · July 19, 2021, 5:34pm

Thanks. So for running basecalling as well as tools downstream from basecalling, would you recommend, say, A100 over the Jetson Xavier AGX? If so, what features for the x86 workstation?

emcquinn · July 19, 2021, 5:56pm

There unfortunately isn’t a one size fits all for the variety of workloads that exist in this space.

A Jetson AGX would be able to basecall in realtime, which means you no longer need to store fast5 files, and can keep the significantly smaller fasta/fastq files around. This can have large bandwidth and storage savings.

Some secondary analysis steps run well on a Jetson AGX, such as running Kraken2 with one of our reference docker images, while others require 8 GPUs or even cluster setups. I’d suggest reaching out on the Parabricks forums for more information about secondary analysis pipelines using x86.

Hope this helps,
-Emmett

miles.benton · August 6, 2021, 7:02am

I’m going to weigh in here if I may. Sorry, I don’t frequent these boards much (at all), but I have a lot of experience with Oxford Nanopore sequencing and GPU compute. First a collection of resources that may be useful to you:

I have a document on selecting an appropriate GPU and compute set up for Nanopore data generation and analysis, I try to update this regularly: GPU musings (with an eye on genomics) - HackMD
I am also very interested in finding a “sweet spot” in terms of price vs performance for GPUs when being used for basecalling. Here is a document where I have started some benchmarking and will keep updating with additional GPUs and information: GPU price / performance comparisons for Nanopore basecalling - HackMD
If you are interested in what it looks like if you run windows I have a quick note on that (spoiler: use Linux!): Nanopore Guppy GPU basecalling on Windows using WSL2 - HackMD
For the last 2-3 years it has been a project of mine getting Nanopore sequencing and software running on Nvidia Jetson devices (ARM based devices in general). This has been highly successful, and I maintain a GitHub repo with notes and instructions: GitHub - sirselim/jetson_nanopore_sequencing: A place to collate notes and resources of our journey into porting nanopore sequencing over to accessible, portable technology.
If you are interested in using free GPUs in the cloud, we have a guide to do this using Google Colab: My notes on setting up basecalling on Google Colab · GitHub

OK, so that’s an overview of some hopefully useful bits and bobs. I would now like to comment on a few things. Firstly, you really want to hold on to your fast5 files: doing so allows you to return to them again and again as models improve in accuracy and allow the detection of additional base modifications. Real-time basecalling while sequencing is awesome, and something like an RTX 3060 can keep up easily in high accuracy mode (HAC), and even in super high accuracy mode (SUP) with some tweaking. But you will always want to go back and basecall again after the fact. The Jetson Xavier AGX will not keep up with HAC calling in real-time (we’ve tried), but in FAST mode it’s great; you can actually run 2x MinIONs. The Clara AGX could easily keep up, but it’s not widely available yet. While the A10 is an amazing card, I would advise spending that amount of money on other things. Something like a 3080 Ti or a 3090 is more than enough for 95% of people (probably 99.9% of people to be honest).
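
To illustrate the point about going back to archived fast5 files, here is a minimal sketch of how a re-basecalling run could be assembled from Python. The directory names are hypothetical, and the config name assumes an R9.4.1 flow cell with the HAC model, so substitute whatever matches your flow cell, kit and installed Guppy version. The helper only builds the argument list; actually launching it requires guppy_basecaller on the PATH and a CUDA-capable GPU.

```python
import shlex


def build_guppy_command(fast5_dir, out_dir,
                        config="dna_r9.4.1_450bps_hac.cfg",  # assumed: R9.4.1 + HAC
                        device="cuda:0"):
    """Return an argv list for re-basecalling a directory of fast5 files."""
    return [
        "guppy_basecaller",
        "--input_path", str(fast5_dir),
        "--save_path", str(out_dir),
        "--config", config,
        "--device", device,
        "--recursive",  # also pick up fast5 files in subdirectories
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_guppy_command("archive/run1/fast5", "rebasecalled/run1")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Keeping the command construction separate from execution makes it easy to rerun the same archive later with a newer config simply by changing the `config` argument.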

Sorry for the long post, happy to comment more if useful.

-Miles
