Add server to nano_llm

Hello,

I would like to create a container that uses NanoLLM, but as a server for LLM inference.
The idea is to keep the chat context, a bit like in the examples, but then stream the answer to the rest of the application, for example over a WebSocket, roughly sketched below.
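To illustrate, this is the kind of thing I have in mind: a small WebSocket server running inside the nano_llm container that streams generated tokens back to my application. It is only a rough sketch, assuming the NanoLLM Python API (`NanoLLM.from_pretrained` / `model.generate(..., streaming=True)`) and the third-party `websockets` package; the model name, port, and end-of-reply marker are placeholders on my side:

```python
import asyncio
import websockets

from nano_llm import NanoLLM

# load the model once at startup (placeholder model/quantization)
model = NanoLLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    api="mlc",
    quantization="q4f16_ft",
)

async def handle_client(websocket, path=None):
    # treat each incoming message as a prompt and stream tokens back
    async for prompt in websocket:
        reply = model.generate(prompt, streaming=True, max_new_tokens=256)
        for token in reply:          # note: generation blocks the event loop here,
            await websocket.send(token)  # a real server would offload it to a thread
        await websocket.send("<end>")    # simple end-of-reply marker

async def main():
    async with websockets.serve(handle_client, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

if __name__ == "__main__":
    asyncio.run(main())
```

My question is really about how to package a script like this into the container.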

I looked at the examples and the code in the jetson-containers repository, but I couldn't find how to add my own script inside a Docker container.

Is there any tutorial or example to follow to do such a thing?

Thanks in advance

Hi,

Are you looking for functionality like Jetson Platform Services?

Thanks.

Hello,

Thank you for your answer.

I am already using JetPack 6 on an AGX Orin for a robotics application.
I have benchmarked several LLM inference solutions for my use case, which is a good dialog manager with function-calling abilities.

This is why I am looking at NanoLLM running inside a jetson-containers container.
Although I can run it manually with Docker by following the tutorial, I can't communicate with the rest of my robotic application, because I didn't understand how to build the Docker image I need directly and add a WebSocket-based script to handle the text → LLM → "text and function calling" output (roughly sketched below).
That said, I just saw dusty-NV's answer in a GitHub issue about it and will use that workaround as a first step.
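For reference, the kind of "text in → text plus function calls out" loop I want to expose over the WebSocket looks roughly like this. It is based on my reading of the NanoLLM function-calling docs; the exact `bot_function` / `BotFunctions` usage and the `generate()` arguments are assumptions on my side, and the functions and model are placeholders:

```python
from datetime import datetime
from nano_llm import NanoLLM, ChatHistory, BotFunctions, bot_function

@bot_function
def TIME():
    """ Returns the current time. """
    return datetime.now().strftime("%-I:%M %p")

@bot_function
def SAVE(text=''):
    """ Saves text to a log file (placeholder for a real robot action). """
    with open("robot_log.txt", "a") as f:
        f.write(text + "\n")

model = NanoLLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",   # placeholder model
    api="mlc",
    quantization="q4f16_ft",
)

# expose the registered functions to the model through the system prompt
chat_history = ChatHistory(
    model,
    system_prompt="You are a helpful robot assistant. " + BotFunctions.generate_docs(),
)

def ask(prompt):
    """ Run one user turn and return the full bot reply as a string. """
    chat_history.append("user", prompt)
    embedding, _ = chat_history.embed_chat()

    reply = model.generate(
        embedding,
        streaming=True,
        functions=BotFunctions(),        # let the model call TIME() / SAVE()
        kv_cache=chat_history.kv_cache,
        stop_tokens=chat_history.template.stop,
    )

    text = ""
    for token in reply:                  # this is where I would send tokens
        text += token                    # over the WebSocket instead
    chat_history.append("bot", reply)
    return text

if __name__ == "__main__":
    print(ask("What time is it?"))
```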
