Best Practices for Running LLMs on DRIVE Orin: NIM Framework vs. Generic Approach

Software Version
DRIVE OS 6.0.6

Target Operating System
Linux
QNX

Hardware Platform
DRIVE AGX Orin Developer Kit (940-63710-0010-100)

SDK Manager Version
2.1.0

Host Machine Version
native Ubuntu Linux 20.04 Host installed with SDK Manager

Issue Description

1. LLM Deployment Approach

I would like to understand the recommended method for running LLMs on DRIVE Orin:

  • Option A: Using the NIM (NVIDIA Inference Microservices) framework
  • Option B: Taking a generic approach to run LLMs directly

2. Host System GPU Requirements

  • Is a GPU required on the host system for development and deployment?
  • Can we run the code directly on the SoC without a host GPU?
  • What is the recommended development workflow (cross-compilation vs native)?

3. AI Agent Framework Selection

  • Which framework is best suited for AI Agent development and deployment on the DRIVE Orin SoC?
  • Are there NVIDIA-recommended frameworks optimized for DRIVE Orin’s architecture?
  • What frameworks provide the best balance of performance, ease of deployment, and resource efficiency?

Dear @lohith.r,
The DRIVE OS LLM SDK is available in DRIVE OS 7.x releases for DRIVE Thor. Please see https://developer.nvidia.com/docs/drive/drive-os/7.0.3/public/drive-os-linux-sdk/embedded-software-components/DRIVE_AGX_SoC/LLM_SDK/llm_sdk.html . It is not available in the DevZone releases targeted at the DRIVE Orin devkit, and NIM is not available on DRIVE. Because the Orin devkit has limited memory, larger models are not expected to run. You can use the TensorRT APIs to deploy any DL model.
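To make the memory point concrete, here is a rough back-of-the-envelope sketch in Python. It only estimates the size of the model weights from the parameter count and the precision. The 16 GiB budget, the 20% overhead factor for activations and KV cache, and the helper names are illustrative assumptions, not DRIVE Orin specifications or part of any NVIDIA API. Check the actual memory available to your application on the target.

```python
# Rough estimate of LLM weight memory at different precisions.
# All names and numbers below are illustrative assumptions.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


def fits(num_params: float, bytes_per_param: float, budget_gib: float,
         overhead_fraction: float = 0.2) -> bool:
    """Return True if weights plus an assumed overhead fit in the budget.

    overhead_fraction is a crude allowance (assumption) for activations,
    KV cache, and runtime buffers; real usage depends on the model,
    sequence length, and batch size.
    """
    needed = weight_memory_gib(num_params, bytes_per_param) * (1 + overhead_fraction)
    return needed <= budget_gib


if __name__ == "__main__":
    budget_gib = 16.0  # hypothetical budget, not a measured devkit value
    for name, params in [("7B", 7e9), ("13B", 13e9)]:
        for precision, nbytes in BYTES_PER_PARAM.items():
            size = weight_memory_gib(params, nbytes)
            verdict = "fits" if fits(params, nbytes, budget_gib) else "does not fit"
            print(f"{name} {precision}: ~{size:.1f} GiB of weights, {verdict} in {budget_gib} GiB")
```

An estimate like this only indicates whether a model is worth trying. Other processes on the SoC share the same memory, so the usable budget is typically smaller than the total installed memory.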

May I know if you have access to NVONLINE?

Dear @SivaRamaKrishnaNV,
Thank you for this information. I do not have access to NVONLINE.
Could you advise how I can get access?

Please reach out to your NVIDIA representative to evaluate your need for NVONLINE access.