CUDA error while loading fatbin [Guppy]

Hi,

I am new to this platform, so any help would be appreciated. I am working on a Jetson AGX Xavier with JetPack 4.4. CUDA 10.2 seems to be installed properly (deviceQuery reports no issues).

I am having trouble running Guppy, a basecalling program from Oxford Nanopore, on the GPU. It works fine on the CPU, but that is far too slow for our needs.

The error I get when I try to direct Guppy to the GPU is the following:
[guppy/info] CUDA device 0 (compute 7.2) initialised, memory limit 33469227008B (31323033600B free)
[guppy/error] Loading fatbin file shared.fatbin failed with: CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:48: CUDA_ERROR_NO_BINARY_FOR_GPU
[guppy/warning] Common::CUDAModule::Load: Failed to load shared(shared) from fatbin
[guppy/error] main: Could not open CUDA kernel file: shared.cu
[guppy/warning] main: An error occurred in the basecaller. Aborting.

The Guppy CUDA kernels are compiled for compute capability 6 and higher. I understand that the Xavier runs Linux kernel 4.9, so I am guessing it is a kernel issue.

Could anyone point me to how to manually set up or modify the kernel in this situation?

Thanks

Hi,

It looks like the error refers to the GPU (CUDA) kernel rather than the Linux kernel.

Based on the following error, the library may not have been built with GPU support (or was built for an incorrect GPU architecture).

... cuda_common.cpp:48:  CUDA_ERROR_NO_BINARY_FOR_GPU

In general, libraries need to be built from source for the Jetson platform.
Otherwise, please first make sure your library was built with support for the sm_72 GPU architecture.
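To illustrate why a build "for compute 6 and higher" can still fail on the Xavier: under CUDA's compatibility rules, a cubin built for sm_XY only runs on GPUs with the same major version X and minor version >= Y, while embedded PTX can be JIT-compiled for any GPU with compute capability >= X.Y. Below is a minimal Python sketch of that rule. The function names are hypothetical (not part of any CUDA API), and architectures are assumed to be written as integers such as 62 for sm_62. If the fatbin contains only 6.x cubins and no PTX, there is nothing a 7.2 device can load, which would be consistent with CUDA_ERROR_NO_BINARY_FOR_GPU.

```python
from typing import Iterable, Tuple

# Architectures are written as integers, e.g. 62 for sm_62 / compute_62.
# Device compute capability is a (major, minor) tuple, e.g. (7, 2) for Xavier.


def cubin_runs_on(cubin_arch: int, device_cc: Tuple[int, int]) -> bool:
    """A cubin built for sm_XY runs only on GPUs with major == X and minor >= Y."""
    major, minor = divmod(cubin_arch, 10)
    dev_major, dev_minor = device_cc
    return major == dev_major and minor <= dev_minor


def ptx_jits_on(ptx_arch: int, device_cc: Tuple[int, int]) -> bool:
    """PTX for compute_XY can be JIT-compiled for any GPU with capability >= X.Y."""
    return divmod(ptx_arch, 10) <= device_cc


def has_binary_for(cubin_archs: Iterable[int], ptx_archs: Iterable[int],
                   device_cc: Tuple[int, int]) -> bool:
    """True if the fatbin holds at least one image the device can load."""
    return (any(cubin_runs_on(a, device_cc) for a in cubin_archs)
            or any(ptx_jits_on(a, device_cc) for a in ptx_archs))


if __name__ == "__main__":
    xavier = (7, 2)
    # Hypothetical Pascal-only build (e.g. targeting TX2, sm_62) with no PTX:
    print(has_binary_for([60, 61, 62], [], xavier))  # False
```

For example, adding a 72 (or 70) cubin, or any compute_6x PTX, to that hypothetical build makes `has_binary_for` return `True`.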

Thanks.

Hi, thanks for the reply. I am quite new to this, so specific directions would be appreciated.

You suggest making sure my library was built with sm_72 support. How do I do that?
System monitoring tools (like jtop) clearly show that the CUDA architecture is 7.2, and the environment paths seem to be set up correctly.
Please let me know if I should post you any additional information.
Thanks

Hi again,
Just an update I got from the Nanopore community: apparently, this software was compiled for ARM64 on Ubuntu 16.04, and that is why it does not work properly. Does this make sense?
Unfortunately, the source code is not available, so is there a way to work around this situation (Docker?)?

Hi,

Sorry for the late update.
How did you install the library?
Did you build it from source, or install a prebuilt package via apt-get or pip?
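If you installed a prebuilt package, one way to check which GPU architectures it actually ships is `cuobjdump` from the CUDA toolkit (`--list-elf` lists embedded cubins, `--list-ptx` lists embedded PTX). Below is a small Python sketch that runs both and collects the sm_XX numbers. It assumes `cuobjdump` is on your PATH (on JetPack it is usually under /usr/local/cuda/bin), that it accepts the file you point it at (the Guppy binary or its `shared.fatbin`), and that the listing names entries like `shared.1.sm_72.cubin`. The helper names are hypothetical.

```python
import re
import subprocess
import sys
from typing import Dict, List

# Matches architecture tags such as "sm_72" in names like "shared.1.sm_72.cubin".
_ARCH_RE = re.compile(r"\bsm_(\d+)")


def parse_archs(listing: str) -> List[int]:
    """Return the sorted, de-duplicated sm_XX numbers found in cuobjdump output."""
    return sorted({int(m.group(1)) for m in _ARCH_RE.finditer(listing)})


def list_fatbin_archs(path: str) -> Dict[str, List[int]]:
    """Run cuobjdump (assumed to be on PATH) and collect cubin and PTX architectures."""
    found: Dict[str, List[int]] = {}
    for kind, flag in (("cubin", "--list-elf"), ("ptx", "--list-ptx")):
        proc = subprocess.run(["cuobjdump", flag, path],
                              capture_output=True, text=True)
        # stdout is parsed regardless of exit status; no matches -> empty list.
        found[kind] = parse_archs(proc.stdout)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    archs = list_fatbin_archs(sys.argv[1])
    print("cubin archs:", archs["cubin"])
    print("PTX archs:  ", archs["ptx"])
```

Run it with the file path as the only argument. If there is no cubin for sm_70 or sm_72 and no PTX for compute capability 7.2 or lower, the Xavier has nothing it can load.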

If you built it from source, please check whether the sm_72 architecture is enabled in CMakeLists.txt.
It should look like this:

If the information above does not help, would you mind sharing the steps to reproduce this issue?

Thanks.

Here is the link (https://drive.google.com/file/d/1Lu84n205E3bdgt4pvNGlWR1xvgS3sWZV/view?usp=sharing) to the Guppy program.

Hi,

Have you fixed this issue?

We tried to open the link but do not have permission to access it.
If the issue persists, would you mind granting us access?

Thanks.

Hi,
ONT fixed the issue in a new version of Guppy (4.0.14).
From what I understand, the issue was that it had been built on Ubuntu 16.04 for the TX2.
The results are impressive on a test run (about 9x faster than the CPU).
I will post some benchmarks when I do a proper sequencing run.

Good to know this!
Thanks for the update.