RAGJet: Retrieval-Augmented Generation on Jetson AGX Xavier (LLaMA + FastAPI)

As a proof of concept (PoC), I specialized a large language model (LLM), LLaMA, on the cuDNN reference documentation using retrieval-augmented generation (RAG), so the model can give precise, context-aware answers about cuDNN. The model is served through a FastAPI application running on the Jetson AGX Xavier, so it can be queried and integrated like any other web service in a real-world scenario. GitHub repo.
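For readers new to the pattern, here is a minimal sketch of the retrieve-then-prompt step that RAG relies on. It is not the code from the repo: the post does not say how retrieval is done, so this uses a plain bag-of-words cosine similarity as a stand-in for an embedding-based retriever. The names `DOC_CHUNKS`, `retrieve`, and `build_prompt` are hypothetical, and the chunks are short paraphrased placeholders rather than actual excerpts from the cuDNN documentation.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder chunks; a real setup would index the cuDNN docs.
DOC_CHUNKS = [
    "cudnnCreate initializes the cuDNN library and creates a handle.",
    "cudnnDestroy releases the resources used by the cuDNN handle.",
    "cudnnConvolutionForward executes a convolution on the input tensor.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble the prompt that would be sent to the LLM (assumed format)."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(question, chunks, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("How do I run a convolution forward pass?", DOC_CHUNKS))
```

In a deployment like the one described above, a function such as `build_prompt` would presumably be called inside a FastAPI route handler, and its output passed to the LLaMA model to generate the final answer.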
