Run Triton kernels on Jetson AGX Orin

Hi everyone. First of all, sorry for my bad English.

I am trying to install and run Triton on my Jetson AGX Orin, but I ran into these two errors:

  • Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation

And when I tried to install Triton:

  • Could not find a version that satisfies the requirement triton (from versions: none)

Can anyone help me?

Thanks in advance

Hi,

Could you share the error with us?
It looks like the kernel can still be executed and just falls back to another supported operator?

You can find a Triton server for JetPack 5 below:

Thanks.

Hi AastaLLL. Thanks for the answer.

This is the error “/whisper/venv/lib/python3.8/site-packages/whisper/timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation.”
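For context, this warning is raised by a guard in whisper’s timing.py that tries the Triton-accelerated median filter first and falls back to a slower implementation when the kernel launch fails. A minimal sketch of that pattern (not whisper’s verbatim code):

import warnings

def fast_median(x, width):
    # Stand-in for the Triton kernel path; raises when Triton/CUDA is unusable.
    raise RuntimeError("Triton kernel launch failed")

def slow_median(x, width):
    # Pure-Python fallback: median over a sliding window with edge padding.
    half = width // 2
    padded = x[:1] * half + x + x[-1:] * half
    return [sorted(padded[i:i + width])[half] for i in range(len(x))]

def median_filter(x, width):
    try:
        return fast_median(x, width)
    except RuntimeError:
        warnings.warn(
            "Failed to launch Triton kernels, likely due to missing CUDA "
            "toolkit; falling back to a slower median kernel implementation"
        )
        return slow_median(x, width)

print(median_filter([3, 1, 2, 5, 4], 3))  # -> [3, 2, 2, 4, 4]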

Thanks in advance.

Hi,

How did you install the Triton package?
Are you using the package shared above?

Thanks.

Hi AastaLLL, thanks again.

I did not install Triton, because when I tried to install it with “pip install triton” I got:

  • Could not find a version that satisfies the requirement triton (from versions: none)

I also tried all of the commands and methods suggested in the official documentation: https://triton-lang.org/main/getting-started/installation.html

This is the error shown when I run “pip install -e .” using the “From source” method:

error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
    Traceback (most recent call last):
      File "/home/mauro/whisper/venv/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
        main()
      File "/home/mauro/whisper/venv/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/home/mauro/whisper/venv/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 132, in get_requires_for_build_editable
        return hook(config_settings)
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 450, in get_requires_for_build_editable
        return self.get_requires_for_build_wheel(config_settings)
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
        self.run_setup()
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
        super(_BuildMetaLegacyBackend,
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
        exec(code, locals())
      File "<string>", line 237, in <module>
      File "<string>", line 121, in download_and_copy_ptxas
      File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/usr/lib/python3.8/subprocess.py", line 493, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    OSError: [Errno 8] Exec format error: '/home/mauro/whisper2/triton/python/triton/third_party/cuda/bin/ptxas'
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
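For what it’s worth, “Exec format error” usually means the binary was built for a different CPU architecture: the download_and_copy_ptxas step in the traceback fetches a prebuilt ptxas that, as far as I can tell, is x86_64-only, so it cannot execute on the Orin’s aarch64 CPU. A small sketch to confirm by reading the ELF header of the downloaded file (path taken from the traceback above):

# Check whether the ptxas that Triton's setup downloaded matches this CPU.
import platform

ptxas = "/home/mauro/whisper2/triton/python/triton/third_party/cuda/bin/ptxas"
print("this machine:", platform.machine())  # 'aarch64' on AGX Orin

with open(ptxas, "rb") as f:
    header = f.read(20)

# Bytes 18-19 of an ELF header hold e_machine (both targets are little-endian):
# 62 = x86-64, 183 = AArch64.
e_machine = int.from_bytes(header[18:20], "little")
print("ptxas e_machine:", e_machine, "(62 = x86-64, 183 = AArch64)")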

Hi,

Just want to clarify first.

/whisper/venv/lib/python3.8/site-packages/whisper/timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation.

Is the error above shown when you try to install the package?

Thanks.

Hi,

That error appears when I run whisper with the CLI option “--word_timestamps True”.

Thanks

Hi,

So you tried to run whisper without installing Triton first?
Is Triton listed as a dependency of whisper?

We will check the Triton installation issue and update here later.
Thanks.

Hi AastaLLL,

Yes, I tried to run Whisper without installing Triton first, because Triton is pulled in as a dependency when I install Whisper.

Below is an installation log from Google Colab:

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-d8pwz7zk
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-d8pwz7zk
  Resolved https://github.com/openai/whisper.git to commit 248b6cb124225dd263bb9bd32d060b6517e067f8
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from openai-whisper==20230314) (2.0.0)

I think the problem could be the Python version: the Jetson AGX Orin ships Python 3.8 by default, while the Colab log above shows Python 3.10.
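As far as I can tell, pip prints “from versions: none” when no distribution on PyPI matches the tags it accepts for this interpreter and platform, and the triton wheels on PyPI are built for x86_64 only, so on the Orin’s aarch64 CPU nothing matches regardless of the Python version. A quick sketch to list the accepted tags (uses the packaging library, which pip itself vendors):

# Print the wheel tags this interpreter/platform accepts; a PyPI wheel must
# match one of these, or pip reports "from versions: none".
from packaging.tags import sys_tags

for tag in list(sys_tags())[:5]:
    print(tag)  # e.g. cp38-cp38-manylinux_2_17_aarch64 on JetPack 5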

Thank you, dude!

Have a nice day.

Hi,

Just want to confirm again.
Which JetPack version do you use? Is it JetPack 5.1.1?

Thanks.

Hi,

Yes, I am using JetPack 5.1.1.

Hi,

We have confirmed that the Triton server can work normally on Orin+JetPack 5.1.1.
Could you give it a try and see if it helps with the whisper issue?

Install dependencies

$ sudo apt-get update
$ sudo apt-get install -y --no-install-recommends \
            software-properties-common \
            autoconf \
            automake \
            build-essential \
            git \
            bc \
            g++-8 \
            gcc-8 \
            clang-8 \
            lld-8 \
            curl \
            jq \
            libb64-dev \
            libre2-dev \
            libssl-dev \
            libtool \
            libboost-dev \
            rapidjson-dev \
            patchelf \
            pkg-config \
            libopenblas-dev \
            libarchive-dev \
            zlib1g-dev \
            python3 \
            python3-dev \
            python3-pip \
            libb64-0d \
            libre2-5 \
            libssl1.1 \
            zlib1g
$ pip3 install --upgrade wheel setuptools cython
$ pip3 install --upgrade flake8 flatbuffers expecttest xmlrunner hypothesis aiohttp pyyaml scipy ninja typing_extensions protobuf grpcio-tools numpy attrdict pillow

Install PyTorch

$ pip3 install --upgrade https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-2.0.0a0+8aa34602.nv23.03-cp38-cp38-linux_aarch64.whl
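After installing, a quick sanity check that the wheel sees the GPU (a sketch; the exact version string depends on the wheel):

# Verify the NVIDIA-built PyTorch wheel imports and detects the Orin's GPU.
import torch

print(torch.__version__)             # e.g. 2.0.0a0+8aa34602.nv23.03
print(torch.cuda.is_available())     # expect True on JetPack 5.1.1
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. 'Orin'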

Install the Triton inference server

$ wget https://github.com/triton-inference-server/server/releases/download/v2.33.0/tritonserver2.33.0-jetpack5.1.tgz
$ sudo mkdir /opt/tritonserver
$ sudo tar zxvf tritonserver2.33.0-jetpack5.1.tgz -C /opt/tritonserver/

Download model

$ git clone --depth 1 https://github.com/triton-inference-server/server
$ mkdir model_repository ; cp -r server/docs/examples/model_repository/simple model_repository

Test

$ /opt/tritonserver/bin/tritonserver --model-repository=./model_repository --backend-directory=/opt/tritonserver/backends --backend-config=tensorflow,version=2
$ /opt/tritonserver/clients/bin/perf_analyzer -m simple   # run from a second terminal while the server is up
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client: 
    Request count: 21281
    Throughput: 1181.81 infer/sec
    Avg latency: 844 usec (standard deviation 1163 usec)
    p50 latency: 827 usec
    p90 latency: 896 usec
    p95 latency: 933 usec
    p99 latency: 1024 usec
    Avg HTTP time: 836 usec (send/recv 114 usec + response wait 722 usec)
  Server: 
    Inference count: 21283
    Execution count: 21283
    Successful request count: 21283
    Avg request latency: 449 usec (overhead 53 usec + queue 37 usec + compute input 37 usec + compute infer 297 usec + compute output 24 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 1181.81 infer/sec, latency 844 usec
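If you prefer to test from Python rather than perf_analyzer, below is a minimal sketch using the Triton HTTP client. It assumes the tritonclient package is installed (the release tarball ships Python client wheels, or try “pip install tritonclient[http]”) and queries the simple example model, which adds and subtracts two int32 vectors:

# Minimal HTTP client for the "simple" example model served above.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

a = np.arange(16, dtype=np.int32).reshape(1, 16)
b = np.ones((1, 16), dtype=np.int32)

inputs = [
    httpclient.InferInput("INPUT0", list(a.shape), "INT32"),
    httpclient.InferInput("INPUT1", list(b.shape), "INT32"),
]
inputs[0].set_data_from_numpy(a)
inputs[1].set_data_from_numpy(b)

result = client.infer("simple", inputs)
print(result.as_numpy("OUTPUT0"))  # a + b
print(result.as_numpy("OUTPUT1"))  # a - b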

Thanks.

Hi! Thank you so much! I will try this solution after I flash the device again. The device came with at least two issues:
- The Wi-Fi module can’t connect to any network.
- The Ethernet connection drops suddenly.

After flashing the device I will test the Wi-Fi and Ethernet connections and then try your solution.

Thank you so much AastaLLL!

I think there is some confusion in this thread between NVIDIA’s Triton Inference Server and OpenAI’s Triton, which is what the error message maurofirmani originally posted comes from.

These are two completely separate things afaict.

