We are pleased to announce the production release of JetPack 7.0. JetPack 7.0 is a major upgrade in the JetPack series, supporting the NVIDIA Jetson AGX Thor Developer Kit and the Thor-based T5000 module. With JetPack 7, Jetson software aligns with the Server Base System Architecture (SBSA), positioning Jetson Thor alongside industry-standard Arm server designs. JetPack 7.0 packages Jetson Linux 38.2 with Linux kernel 6.8 and an Ubuntu 24.04 LTS based root file system.
Support for NVIDIA Jetson AGX Thor Developer Kit and T5000 module.
SBSA architecture based design for Jetson Thor.
Kernel 6.8 and Ubuntu 24.04 based root file system.
Latest AI compute stack: CUDA 13, cuDNN 9.12, and TensorRT 10.13.
Support for AI serving frameworks (vLLM, SGLang) with regular container releases on NGC, and support for frameworks from Jetson AI Lab (MLC, llama.cpp, Ollama, Hugging Face Transformers).
CoE (CSI over Ethernet) support using Holoscan Sensor Bridge. Out-of-the-box experience with the Eagle Camera Sensor Module LI-VB1940.
CSI/GMSL is supported via Argus whereas CoE is supported via SIPL Camera API.
NVIDIA-optimized preemptible real-time kernel.
You can install JetPack 7.0 on your Jetson AGX Thor Developer Kit using any of the methods below:
ISO Image: You can download the JetPack 7.0 ISO image from the JetPack 7.0 Downloads and Notes page and use Balena Etcher to prepare the installer USB drive. Please follow the instructions in the User Guide on flashing the Developer Kit with JetPack 7.0.
SDK Manager: You can do a fresh install of JetPack 7.0 using SDK Manager.
Manual Flashing: If you prefer to install using the command line, you can flash the Jetson device from a Linux host by following the steps here.
Once Jetson Linux is flashed, you can install the compute stack using:
SDK Manager (using a Linux host), or
By running "sudo apt update" followed by "sudo apt install nvidia-jetpack" on your Jetson.
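For reference, spelled out as the two commands to run in a terminal on the Jetson:

# Refresh the package lists, then install the JetPack compute stack meta-package.
sudo apt update
sudo apt install nvidia-jetpack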
Important Notes
The manual flashing instructions have changed slightly because Thor leverages the SBSA architecture. Please follow the manual flashing instructions carefully.
If you are re-installing JetPack 7.0 using the ISO on an already installed system, please carefully follow the instructions in the Getting Started Guide.
Containers
Note: Since Jetson Thor is based on the SBSA stack, the "-igpu" container tag is no longer required when running on Thor.
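For example, the Triton container discussed later in this thread is pulled with its standard tag, with no "-igpu" variant involved (shown purely as an illustration):

# On Thor, pull the regular aarch64/SBSA tag directly; no -igpu suffix is needed.
sudo docker pull nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3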
I am quite curious about the USB thumb drive install. Does this mean the new Thor essentially has a BIOS that can load and flash from bootable flash media? Or is this installation method itself specific to this one medium? It would be quite a game changer if one could flash from media instead of just using JetPack/SDKM.
The doc above says to use Balena Etcher to burn the ISO image to usb stick.
I just used Ubuntu “Startup Disk Creator” (aka /usr/bin/usb-creator-gtk),
and it appears to have worked; it said it did burn the ISO image to the USB stick.
I’ll test it in a couple of days once I’ve received my Thor; but browsing it now it certainly looks fine.
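(Any tool that writes the ISO to the drive byte-for-byte should in principle give the same result; a hypothetical alternative using dd, where /dev/sdX and the ISO filename are placeholders for your actual device and download:)

# Raw-write the ISO to the USB stick. Double-check /dev/sdX first; this overwrites the device.
sudo dd if=jetpack-7.0.iso of=/dev/sdX bs=4M status=progress conv=fsync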
Also, the Jetson_Linux_Release_Notes_r38.2.pdf linked above briefly discusses USB drive flashing; it has a section about the steps needed after the first flash to re-enable the Thor Developer Kit's ability to do subsequent flashes from the USB stick.
AND
If you do what I did and download Jetson_Linux_R38.2.0_aarch64.tbz2 and
Tegra_Linux_Sample-Root-Filesystem_R38.2.0_aarch64.tbz2 and tar x them, then run:
sudo ./apply_binaries.sh --openrm
Jetson Thor supports the OpenRM driver architecture. When using the apply_binaries.sh script, you must add the option --openrm, as follows:
$ sudo ./apply_binaries.sh --openrm
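For context, a sketch of the usual Jetson Linux sequence these steps fit into, assuming the standard Linux_for_Tegra layout that the BSP tarball extracts to (verify against the official flashing instructions):

# Extract the BSP; this creates the Linux_for_Tegra/ directory.
tar xjf Jetson_Linux_R38.2.0_aarch64.tbz2
# Extract the sample root filesystem into the BSP's rootfs/ directory (sudo preserves ownership).
sudo tar xjpf Tegra_Linux_Sample-Root-Filesystem_R38.2.0_aarch64.tbz2 -C Linux_for_Tegra/rootfs/
cd Linux_for_Tegra
# Apply NVIDIA binaries; Thor requires the OpenRM driver option noted above.
sudo ./apply_binaries.sh --openrm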
Does this relate to Thor or Orin? I tried installing nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3 on Thor, but when the server started up automatically I saw these messages (I don’t know if they indicate that the server can’t communicate with Thor’s GPU):
W0909 03:02:41.329271 1 metrics.cc:643] “Unable to get power limit for GPU 0. Status:Success, value:0.000000”
W0909 03:02:41.329302 1 metrics.cc:661] “Unable to get power usage for GPU 0. Status:Success, value:0.000000”
W0909 03:02:41.329307 1 metrics.cc:685] “Unable to get energy consumption for GPU 0. Status:Success, value:0”
W0909 03:02:41.329310 1 metrics.cc:724] “Unable to get memory usage for GPU 0. Memory usage status:Success, value:0. Memory total status:Success, value:0”
Also, there were no models, so I tried to install the model_repository. The instructions were not clear and when I tried running the scripts I started getting errors. The first was:
File “/models/tf2onnx/lib/python3.12/site-packages/tf2onnx/utils.py”, line 46, in <module>
onnx_pb.TensorProto.BOOL: np.bool,
^^^^^^^
File “/models/tf2onnx/lib/python3.12/site-packages/numpy/__init__.py”, line 324, in __getattr__
raise AttributeError(__former_attrs__[attr])
AttributeError: module ‘numpy’ has no attribute ‘bool’.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
I made that change in utils.py but got more errors in different places and finally stopped fixing them.
This container is for Thor. When you say you see errors while starting the server, do you mean while launching the Docker container? We used the following command to start the container without any issues:
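(The exact command from this reply was not captured in the thread; the following is a hypothetical reconstruction based on the details discussed below, and the flags are assumptions rather than the original command:)

# Start the Triton vLLM container on Thor. The GPU access flag depends on your
# container toolkit setup (e.g. --runtime nvidia or --gpus all).
sudo docker run -it --rm \
  --runtime nvidia \
  --network host \
  --ipc=host \
  nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3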
Your command has --ipc=host in there twice (probably not a problem), but the “http://” needed to be deleted because I got an “invalid reference format” error. When I took out the “http://” the container ran and I'm in it, but if I try to start the server
bin/tritonserver
I get the message:
I0909 04:16:02.020257 170 server.cc:309] "No server context available. Exiting immediately."
error: creating server: Invalid argument - --model-repository must be specified
And your command does not have a -v argument, so it is not connected to the filesystem outside the container. I would do that but I was unable to create the model_repository as I mentioned in my previous message.
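(For reference, the “--model-repository must be specified” error above goes away once Triton is pointed at a model repository; the path below is an assumed example and would typically be a host directory mounted into the container with -v:)

# Start Triton against an existing model repository directory.
# /models/model_repository is a placeholder path, not taken from the posts above.
bin/tritonserver --model-repository=/models/model_repository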
Did you leave out some steps you followed to get the tritonserver running?
OK, thanks for your quick response. I will say that I have been enjoying the Thor so far and I got both (1) llama.cpp (compiled myself and running in Nvidia’s PyTorch container) and (2) the nvcr.io Ollama container running (although Ollama seems to have memory limitations and puts larger models on the CPU).
Actually I would like to be using the vLLM server. Can you tell me how I would start it from within the container? Also, where would the models go for vLLM to serve?
Well, I sort of answered my own question – I ran
vllm
and it looks like I can point it at some Hugging Face models. When I just ran vllm with no arguments, the server started up without error and is serving facebook/opt-125m (that must be the default).
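(As an aside, the server started this way exposes vLLM's OpenAI-compatible HTTP API, by default on port 8000; a quick sanity check could look like this, with the prompt text being an arbitrary example:)

# List the models the server is currently serving.
curl http://localhost:8000/v1/models
# Request a short completion from the default model.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "Hello from Jetson Thor:", "max_tokens": 16}'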
Would you be able to tell me how to assign a volume with -v to hold the Hugging Face models and have them work with the vllm inside the container? I assume you’ve done that for your own system.
Thanks again for your help.
Once again, I think I figured it out. The models go in the .cache directory and I can download them in the container with
hf download <model>
So I think I can take it from here. Sorry to bother you but perhaps this info will help somebody else.
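(Putting the last few posts together, a sketch of the whole flow; the host path, mount point, and model name are illustrative assumptions rather than values from the posts above:)

# On the host: start the container with the Hugging Face cache mounted so that
# downloaded weights persist across container runs.
sudo docker run -it --rm \
  --runtime nvidia \
  --network host \
  -v $HOME/hf_cache:/root/.cache/huggingface \
  nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3

# Inside the container: download a model into the cache, then serve it with vLLM.
hf download facebook/opt-125m
vllm serve facebook/opt-125m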
INFO 09-17 12:37:16 [default_loader.py:272] Loading weights took 2.26 seconds
ERROR 09-17 12:37:17 [core.py:586] EngineCore failed to start.
ERROR 09-17 12:37:17 [core.py:586] Traceback (most recent call last):
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 577, in run_engine_core
ERROR 09-17 12:37:17 [core.py:586] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 404, in __init__
ERROR 09-17 12:37:17 [core.py:586] super().__init__(vllm_config, executor_class, log_stats,
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 75, in __init__
ERROR 09-17 12:37:17 [core.py:586] self.model_executor = executor_class(vllm_config)
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py”, line 53, in init
ERROR 09-17 12:37:17 [core.py:586] self._init_executor()
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py”, line 48, in _init_executor
ERROR 09-17 12:37:17 [core.py:586] self.collective_rpc(“load_model”)
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py”, line 57, in collective_rpc
ERROR 09-17 12:37:17 [core.py:586] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py”, line 2736, in run_method
ERROR 09-17 12:37:17 [core.py:586] return func(*args, **kwargs)
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py”, line 185, in load_model
ERROR 09-17 12:37:17 [core.py:586] self.model_runner.load_model()
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 1776, in load_model
ERROR 09-17 12:37:17 [core.py:586] self.model = model_loader.load_model(
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py”, line 42, in load_model
ERROR 09-17 12:37:17 [core.py:586] process_weights_after_loading(model, model_config, target_device)
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py”, line 113, in process_weights_after_loading
ERROR 09-17 12:37:17 [core.py:586] quant_method.process_weights_after_loading(module)
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py”, line 614, in process_weights_after_loading
ERROR 09-17 12:37:17 [core.py:586] layer.scheme.process_weights_after_loading(layer)
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py”, line 197, in process_weights_after_loading
ERROR 09-17 12:37:17 [core.py:586] self.kernel.process_weights_after_loading(layer)
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py”, line 105, in process_weights_after_loading
ERROR 09-17 12:37:17 [core.py:586] self._transform_param(layer, self.w_s_name, transform_w_s)
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py”, line 71, in _transform_param
ERROR 09-17 12:37:17 [core.py:586] new_param = fn(old_param)
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py”, line 89, in transform_w_s
ERROR 09-17 12:37:17 [core.py:586] x.data = x.data.contiguous()
ERROR 09-17 12:37:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^
ERROR 09-17 12:37:17 [core.py:586] torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
ERROR 09-17 12:37:17 [core.py:586] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 09-17 12:37:17 [core.py:586] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 09-17 12:37:17 [core.py:586] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
ERROR 09-17 12:37:17 [core.py:586]
Process EngineCore_0:
Traceback (most recent call last):
File “/usr/lib/python3.12/multiprocessing/process.py”, line 314, in _bootstrap
self.run()
File “/usr/lib/python3.12/multiprocessing/process.py”, line 108, in run
self._target(*self._args, **self._kwargs)
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 590, in run_engine_core
raise e
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 577, in run_engine_core
engine_core = EngineCoreProc(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 404, in __init__
super().__init__(vllm_config, executor_class, log_stats,
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py”, line 75, in __init__
self.model_executor = executor_class(vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py”, line 53, in init
self._init_executor()
File “/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py”, line 48, in _init_executor
self.collective_rpc(“load_model”)
File “/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py”, line 57, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py”, line 2736, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py”, line 185, in load_model
self.model_runner.load_model()
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py”, line 1776, in load_model
self.model = model_loader.load_model(
^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py”, line 42, in load_model
process_weights_after_loading(model, model_config, target_device)
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/utils.py”, line 113, in process_weights_after_loading
quant_method.process_weights_after_loading(module)
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py”, line 614, in process_weights_after_loading
layer.scheme.process_weights_after_loading(layer)
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py”, line 197, in process_weights_after_loading
self.kernel.process_weights_after_loading(layer)
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py”, line 105, in process_weights_after_loading
self._transform_param(layer, self.w_s_name, transform_w_s)
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/MPLinearKernel.py”, line 71, in _transform_param
new_param = fn(old_param)
^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/machete.py”, line 89, in transform_w_s
x.data = x.data.contiguous()
^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
[rank0]:[W917 12:37:17.756065815 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see Distributed communication package - torch.distributed — PyTorch 2.8 documentation (function operator())
Traceback (most recent call last):
File “/usr/local/bin/vllm”, line 8, in <module>
sys.exit(main())
^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py”, line 65, in main
args.dispatch_function(args)
File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py”, line 55, in cmd
uvloop.run(run_server(args))
File “/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py”, line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File “/usr/lib/python3.12/asyncio/runners.py”, line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File “/usr/lib/python3.12/asyncio/runners.py”, line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “uvloop/loop.pyx”, line 1518, in uvloop.loop.Loop.run_until_complete
File “/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py”, line 61, in wrapper
return await main
^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 1431, in run_server
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 1451, in run_server_worker
async with build_async_engine_client(args, client_config) as engine_client:
File “/usr/lib/python3.12/contextlib.py”, line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 158, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File “/usr/lib/python3.12/contextlib.py”, line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py”, line 194, in build_async_engine_client_from_engine_args
async_llm = AsyncLLM.from_vllm_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py”, line 162, in from_vllm_config
return cls(
^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py”, line 124, in __init__
self.engine_core = EngineCoreClient.make_async_mp_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 96, in make_async_mp_client
return AsyncMPClient(*client_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 666, in __init__
super().__init__(
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py”, line 403, in __init__
with launch_core_engines(vllm_config, executor_class,
File “/usr/lib/python3.12/contextlib.py”, line 144, in __exit__
next(self.gen)
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py”, line 434, in launch_core_engines
wait_for_engine_startup(
File “/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py”, line 484, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}