I am trying to load LLaMA 3 on a Jetson Nano, which has 4GB of memory shared between the CPU and GPU. I am unsure whether it can handle such a large model. I did manage to load the model after adding 8GB of swap, but the response time was extremely slow.
I am considering upgrading the board and would like confirmation on whether the Jetson Nano is truly capable of loading and running a model of this size. Any suggestions or insights would be appreciated.
Hi,
Thank you so much for your reply. I have one last question: does the GPU on the Jetson Nano need to be enabled manually, or is it used automatically? If the GPU was not enabled, could that explain the slow response time while running the LLM? I’d appreciate any clarification.
Usually, the LLM sample/source will run its tasks on the GPU by default.
The long latency is more likely related to memory, since the Nano’s resources are quite limited: its 4GB is shared between the CPU and GPU, so a model of this size spills into swap, and inference then becomes bottlenecked by storage speed.
To confirm this, you can run tegrastats concurrently and check whether the GPU is in use (the GR3D_FREQ utilization should be > 0% while the model is generating).
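For reference, here is a minimal sketch of what that check could look like, assuming tegrastats is on the PATH and prints a `GR3D_FREQ <n>%` field on each line (the usual format on Jetson boards; adjust the regex if your JetPack version differs). Run it with sudo in a second terminal while the LLM is generating:

```python
#!/usr/bin/env python3
# Minimal sketch: poll tegrastats alongside the LLM and report GPU
# utilization. Assumes tegrastats emits a "GR3D_FREQ <n>%" field
# (GPU utilization) on every output line.
import re
import subprocess

GR3D = re.compile(r"GR3D_FREQ (\d+)%")

def monitor():
    # tegrastats usually requires root; run this script with sudo.
    proc = subprocess.Popen(["tegrastats"], stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            m = GR3D.search(line)
            if m:
                util = int(m.group(1))
                status = "GPU in use" if util > 0 else "GPU idle"
                print(f"GPU utilization: {util:3d}%  ({status})")
    except KeyboardInterrupt:
        pass
    finally:
        proc.terminate()

if __name__ == "__main__":
    monitor()
```

If GR3D_FREQ stays near 0% while tokens are being generated, inference is running on the CPU; if it is high but generation is still slow, the bottleneck is almost certainly memory/swap rather than the GPU.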