Run Triton kernels on Jetson AGX Orin

Hi everyone. First of all, sorry for my bad English.

I am trying to install and run Triton on my Jetson AGX Orin, but I ran into these two errors:

  • Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation

And when I tried to install Triton:

  • Could not find a version that satisfies the requirement triton (from versions: none)

Can anyone help me?

Thanks in advance

Hi,

Could you share the error with us?
It looks like the kernel can still be executed and just falls back to another supported operator?

You can find a Triton server for JetPack 5 below:

Thanks.

Hi AastaLLL. Thanks for the answer.

This is the error “/whisper/venv/lib/python3.8/site-packages/whisper/timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation.”
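For context, this warning is raised by a guard in whisper’s timing.py that tries the Triton-accelerated median filter first and falls back to a slower implementation when the kernel launch fails. A minimal sketch of that pattern (not whisper’s verbatim code):

import warnings

def fast_median(x, width):
    # Stand-in for the Triton kernel path; raises when Triton/CUDA is unusable.
    raise RuntimeError("Triton kernel launch failed")

def slow_median(x, width):
    # Pure-Python fallback: median over a sliding window with edge padding.
    half = width // 2
    padded = x[:1] * half + x + x[-1:] * half
    return [sorted(padded[i:i + width])[half] for i in range(len(x))]

def median_filter(x, width):
    try:
        return fast_median(x, width)
    except RuntimeError:
        warnings.warn(
            "Failed to launch Triton kernels, likely due to missing CUDA "
            "toolkit; falling back to a slower median kernel implementation"
        )
        return slow_median(x, width)

print(median_filter([3, 1, 2, 5, 4], 3))  # -> [3, 2, 2, 4, 4]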

Thanks in advance.

Hi,

How did you install the Triton package?
Are you using the package shared above?

Thanks.

Hi AastaLLL, thanks again.

I did not install Triton, because when I tried to install it with “pip install triton” I got:

  • Could not find a version that satisfies the requirement triton (from versions: none)

I also tried all of the commands and methods suggested in the official documentation: https://triton-lang.org/main/getting-started/installation.html

This is the error shown when I run “pip install -e .” using the “From source” method:

error: subprocess-exited-with-error

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> [28 lines of output]
    Traceback (most recent call last):
      File "/home/mauro/whisper/venv/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
        main()
      File "/home/mauro/whisper/venv/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/home/mauro/whisper/venv/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 132, in get_requires_for_build_editable
        return hook(config_settings)
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 450, in get_requires_for_build_editable
        return self.get_requires_for_build_wheel(config_settings)
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
        self.run_setup()
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 487, in run_setup
        super(_BuildMetaLegacyBackend,
      File "/tmp/pip-build-env-ffd_amvg/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in run_setup
        exec(code, locals())
      File "<string>", line 237, in <module>
      File "<string>", line 121, in download_and_copy_ptxas
      File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/usr/lib/python3.8/subprocess.py", line 493, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    OSError: [Errno 8] Exec format error: '/home/mauro/whisper2/triton/python/triton/third_party/cuda/bin/ptxas'
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
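For what it’s worth, “Exec format error” usually means the binary was built for a different CPU architecture: the download_and_copy_ptxas step in the traceback fetches a prebuilt ptxas that, as far as I can tell, is x86_64-only, so it cannot execute on the Orin’s aarch64 CPU. A small sketch to confirm by reading the ELF header of the downloaded file (path taken from the traceback above):

# Check whether the ptxas that Triton's setup downloaded matches this CPU.
import platform

ptxas = "/home/mauro/whisper2/triton/python/triton/third_party/cuda/bin/ptxas"
print("this machine:", platform.machine())  # 'aarch64' on AGX Orin

with open(ptxas, "rb") as f:
    header = f.read(20)

# Bytes 18-19 of an ELF header hold e_machine (both targets are little-endian):
# 62 = x86-64, 183 = AArch64.
e_machine = int.from_bytes(header[18:20], "little")
print("ptxas e_machine:", e_machine, "(62 = x86-64, 183 = AArch64)")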

Hi,

Just want to clarify first.

/whisper/venv/lib/python3.8/site-packages/whisper/timing.py:42: UserWarning: Failed to launch Triton kernels, likely due to missing CUDA toolkit; falling back to a slower median kernel implementation.

Is the error above shown when you try to install the package?

Thanks.

Hi,

That error appears when I run whisper with the CLI option “--word_timestamps True”.

Thanks

Hi,

So you tried to run whisper without installing Triton first?
Is Triton listed as a dependency of whisper?

We will check the Triton installation issue and update here later.
Thanks.

Hi AastaLLL,

Yes, I tried to run Whisper without installing Triton first, because Triton is pulled in as a dependency when I install Whisper.

Below is an installation log from Google Colab:

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-d8pwz7zk
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-d8pwz7zk
  Resolved https://github.com/openai/whisper.git to commit 248b6cb124225dd263bb9bd32d060b6517e067f8
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from openai-whisper==20230314) (2.0.0)

I think the problem could be the Python version: the Jetson AGX Orin ships Python 3.8 by default, while the Colab log above shows Python 3.10.
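As far as I can tell, pip prints “from versions: none” when no distribution on PyPI matches the tags it accepts for this interpreter and platform, and the triton wheels on PyPI are built for x86_64 only, so on the Orin’s aarch64 CPU nothing matches regardless of the Python version. A quick sketch to list the accepted tags (uses the packaging library, which pip itself vendors):

# Print the wheel tags this interpreter/platform accepts; a PyPI wheel must
# match one of these, or pip reports "from versions: none".
from packaging.tags import sys_tags

for tag in list(sys_tags())[:5]:
    print(tag)  # e.g. cp38-cp38-manylinux_2_17_aarch64 on JetPack 5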

Thank you, dude!

Have a nice day.

Hi,

Just want to confirm again.
Which JetPack version do you use? Is it JetPack 5.1.1?

Thanks.

Hi,

Yes, I am using JetPack 5.1.1.

Hi,

We have confirmed that the Triton server can work normally on Orin+JetPack 5.1.1.
Could you give it a try and see if it helps with the whisper issue?

Install dependencies

$ sudo apt-get update
$ sudo apt-get install -y --no-install-recommends \
            software-properties-common \
            autoconf \
            automake \
            build-essential \
            git \
            bc \
            g++-8 \
            gcc-8 \
            clang-8 \
            lld-8 \
            curl \
            jq \
            libb64-dev \
            libre2-dev \
            libssl-dev \
            libtool \
            libboost-dev \
            rapidjson-dev \
            patchelf \
            pkg-config \
            libopenblas-dev \
            libarchive-dev \
            zlib1g-dev \
            python3 \
            python3-dev \
            python3-pip \
            libb64-0d \
            libre2-5 \
            libssl1.1 \
            zlib1g
$ pip3 install --upgrade wheel setuptools cython
$ pip3 install --upgrade flake8 flatbuffers expecttest xmlrunner hypothesis aiohttp pyyaml scipy ninja typing_extensions protobuf grpcio-tools numpy attrdict pillow

Install PyTorch

$ pip3 install --upgrade https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-2.0.0a0+8aa34602.nv23.03-cp38-cp38-linux_aarch64.whl
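After installing, a quick sanity check that the wheel sees the GPU (a sketch; the exact version string depends on the wheel):

# Verify the NVIDIA-built PyTorch wheel imports and detects the Orin's GPU.
import torch

print(torch.__version__)             # e.g. 2.0.0a0+8aa34602.nv23.03
print(torch.cuda.is_available())     # expect True on JetPack 5.1.1
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. 'Orin'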

Install the Triton inference server

$ wget https://github.com/triton-inference-server/server/releases/download/v2.33.0/tritonserver2.33.0-jetpack5.1.tgz
$ sudo mkdir /opt/tritonserver
$ sudo tar zxvf tritonserver2.33.0-jetpack5.1.tgz -C /opt/tritonserver/

Download model

$ git clone --depth 1 https://github.com/triton-inference-server/server
$ mkdir model_repository ; cp -r server/docs/examples/model_repository/simple model_repository

Test

$ /opt/tritonserver/bin/tritonserver --model-repository=./model_repository --backend-directory=/opt/tritonserver/backends --backend-config=tensorflow,version=2
$ /opt/tritonserver/clients/bin/perf_analyzer -m simple   # run from a second terminal while the server is up
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 5000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client: 
    Request count: 21281
    Throughput: 1181.81 infer/sec
    Avg latency: 844 usec (standard deviation 1163 usec)
    p50 latency: 827 usec
    p90 latency: 896 usec
    p95 latency: 933 usec
    p99 latency: 1024 usec
    Avg HTTP time: 836 usec (send/recv 114 usec + response wait 722 usec)
  Server: 
    Inference count: 21283
    Execution count: 21283
    Successful request count: 21283
    Avg request latency: 449 usec (overhead 53 usec + queue 37 usec + compute input 37 usec + compute infer 297 usec + compute output 24 usec)

Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 1181.81 infer/sec, latency 844 usec
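If you prefer to test from Python rather than perf_analyzer, below is a minimal sketch using the Triton HTTP client. It assumes the tritonclient package is installed (the release tarball ships Python client wheels, or try “pip install tritonclient[http]”) and queries the simple example model, which adds and subtracts two int32 vectors:

# Minimal HTTP client for the "simple" example model served above.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

a = np.arange(16, dtype=np.int32).reshape(1, 16)
b = np.ones((1, 16), dtype=np.int32)

inputs = [
    httpclient.InferInput("INPUT0", list(a.shape), "INT32"),
    httpclient.InferInput("INPUT1", list(b.shape), "INT32"),
]
inputs[0].set_data_from_numpy(a)
inputs[1].set_data_from_numpy(b)

result = client.infer("simple", inputs)
print(result.as_numpy("OUTPUT0"))  # a + b
print(result.as_numpy("OUTPUT1"))  # a - b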

Thanks.

Hi! Thank you so much! I will try this solution after I flash the device again. The device came with at least two issues:
- The Wi-Fi module can’t connect to any network.
- The Ethernet connection drops suddenly.

After flashing the device I will test the Wi-Fi and Ethernet connections and then try your solution.

Thank you so much AastaLLL!

I think there is some confusion in this thread between NVIDIA’s Triton Inference Server and OpenAI’s Triton, which is what the error message maurofirmani originally posted comes from.

These are two completely separate things afaict.

