Hello all,
I’m currently evaluating deploying a Transformer-based model (using FlashAttention) for real-time inference on NVIDIA Jetson hardware.
Could someone from the community share:
- Experiences running FlashAttention / HuggingFace Transformers on Jetson Orin?
- Recommendations on model-size or batch-size limits for the Orin NX 16GB vs. the AGX Orin?
- Any caveats with FlashAttention’s Triton/CUDA kernels or memory usage in these embedded environments?
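For context, here's the rough back-of-the-envelope memory math I've been using to sanity-check candidate model sizes against the 16GB Orin NX (assuming fp16/bf16 weights and KV cache; the example shapes are Llama-2-7B-like and purely illustrative) — happy to be corrected if this misses Jetson-specific overheads like the unified memory shared with the OS:

```python
def model_memory_gb(n_params_billion, bytes_per_param=2):
    # Weight memory in GB: fp16/bf16 uses 2 bytes per parameter.
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_el=2):
    # KV cache: K and V tensors per layer, each of shape
    # (batch, seq_len, n_kv_heads, head_dim), in fp16/bf16.
    return 2 * n_layers * batch * seq_len * n_kv_heads * head_dim * bytes_per_el / 1e9

# Hypothetical 7B model with Llama-2-7B-like shapes:
# 32 layers, 32 KV heads, head_dim 128, 4k context, batch 1.
weights_gb = model_memory_gb(7)                # 14.0 GB -- already tight on a 16GB board
cache_gb = kv_cache_gb(32, 32, 128, 4096, 1)   # ~2.15 GB on top of the weights
```

By this math a 7B fp16 model barely fits before activations and the OS's share of unified memory, which is why I'm leaning toward quantized or smaller checkpoints for the NX — but I'd value real-world numbers.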
Thanks in advance!