DGX Spark Power Consumption

This discussion is about the DGX Spark, which has 128 GB of unified memory and runs Ubuntu 24.04 with CUDA 13.0. I have a potential performance issue.

The Docker container I’m using for this project is built from the base image nvcr.io/nvidia/vllm:25.09-py3

Here is the GitHub project: andrewcapatina/research-assistant. This project will summarize a collection of research papers gathered from internet sources on the NVIDIA DGX Spark/Jetson development boards.

If you have the time, please review the GitHub repo and see if you can spot any obvious mistakes.

I’m opening this discussion because I think I could get more performance from the Spark, but I’m not sure which steps I’m missing.

Steps I’ve taken to improve performance:

  1. vLLM usage
  2. Adjusting max_model_len, max_num_batched_tokens
  3. Batched inference
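For readers who want to see what the three steps above look like together, here is a minimal sketch using vLLM's offline `LLM` and `SamplingParams` classes. It is not the project's actual code: the helper names `build_engine_kwargs` and `summarize_batch` are hypothetical, the model name is whatever the caller passes in, and the numeric defaults are placeholders rather than the values used in the repo.

```python
from typing import Any, Dict, List


def build_engine_kwargs(model: str,
                        max_model_len: int = 8192,
                        max_num_batched_tokens: int = 16384) -> Dict[str, Any]:
    """Collect the engine arguments named in step 2.

    The defaults are illustrative placeholders, not tuned recommendations.
    """
    if max_model_len <= 0 or max_num_batched_tokens <= 0:
        raise ValueError("token limits must be positive")
    return {
        "model": model,
        "max_model_len": max_model_len,                    # longest prompt + output per request
        "max_num_batched_tokens": max_num_batched_tokens,  # token budget per scheduler step
    }


def summarize_batch(model: str, prompts: List[str], max_tokens: int = 256) -> List[str]:
    """Run all prompts through vLLM in a single generate() call (step 3)."""
    # Imported lazily so build_engine_kwargs can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model))
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    # Passing the whole list at once lets vLLM batch the requests itself.
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

Calling `summarize_batch` with the 7 paper prompts in one list, rather than looping over them, is what "batched inference" refers to here.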

When I run my app, power draw never exceeds 55 W, far below the 240 W maximum, even though GPU utilization stays above 90% during LLM inference (checked via the DGX Dashboard).
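One way to cross-check the dashboard numbers is to poll `nvidia-smi` directly. The sketch below assumes `nvidia-smi` is on the PATH and that the `power.draw` and `utilization.gpu` query fields are populated on this platform; on some systems a field is reported as `[N/A]`, which the parser turns into `None`. The function names are hypothetical, and the value reported is the GPU's own reading, which may not match a whole-system figure.

```python
import subprocess
from typing import Optional, Tuple

QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def _to_float(field: str) -> Optional[float]:
    """Convert one CSV field to float; non-numeric fields such as '[N/A]' become None."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_sample(line: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse one 'power, utilization' CSV line into (watts, percent)."""
    power, util = line.split(",")[:2]
    return _to_float(power), _to_float(util)


def read_gpu_sample() -> Tuple[Optional[float], Optional[float]]:
    """Take a single reading for the first GPU listed by nvidia-smi."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_sample(out.splitlines()[0])
```

Calling `read_gpu_sample()` in a loop while the app runs gives a second source for the power and utilization figures.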

The purpose of the app is to gather and summarize research papers. For 7 research papers, I’m getting a total latency of 5.08 seconds at about 45 W average power. I believe this should be faster. Below is a snippet of the output, which I believe comes from the vLLM logs.

Adding requests: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 1696.43it/s]
Processed prompts: 100%|████████████████████████████████████████| 7/7 [00:05<00:00,  1.38it/s, est. speed input: 619.00 toks/s, output: 207.58 toks/s]
Latency: 5.08s | Avg Power: 44.91W
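To put the log lines above in perspective, the following sketch derives a few figures from them: prompts per second, total energy, and energy per output token. The token counts are approximations, since they multiply vLLM's estimated speeds by the reported latency, and `derive_metrics` is a hypothetical helper, not part of the project.

```python
from typing import Dict


def derive_metrics(num_prompts: int, latency_s: float, avg_power_w: float,
                   input_tok_per_s: float, output_tok_per_s: float) -> Dict[str, float]:
    """Derive throughput and energy figures from the reported run."""
    energy_j = avg_power_w * latency_s            # joules = watts * seconds
    output_tokens = output_tok_per_s * latency_s  # approximate total generated tokens
    input_tokens = input_tok_per_s * latency_s    # approximate total prompt tokens
    return {
        "prompts_per_s": num_prompts / latency_s,
        "energy_j": energy_j,
        "approx_input_tokens": input_tokens,
        "approx_output_tokens": output_tokens,
        "approx_output_tokens_per_prompt": output_tokens / num_prompts,
        "joules_per_output_token": energy_j / output_tokens,
    }


# Values copied from the log snippet above.
metrics = derive_metrics(7, 5.08, 44.91, 619.00, 207.58)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The computed prompts-per-second value rounds to the 1.38 it/s shown in the progress bar, which suggests the reported latency and the vLLM timing cover roughly the same interval.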

So I want to keep this discussion open-ended and would welcome any feedback that might help me get more utilization from the Spark.

Are there any NVIDIA utilities I could use to safely achieve better numbers, any steps involving the libraries provided with the Docker image, or other techniques?

Thanks

Please reference our topic on DGX Spark power usage: DGX Spark Power Clarification