GPU Usage Stuck at Placeholder in C++ Llama 3.2 App - Need NVML Help!

Description

Hey NVIDIA crew, I’m working on a C++ terminal app for Llama 3.2 (shoutout to your GPU tech in the README!), and I’ve hit a snag. The GPU usage readout is hardcoded as int gpu_usage = 0; (no real measurement, just a placeholder). I’m on the “optimize-algorithm” branch trying to squeeze out more performance, but without actual GPU stats I’m stuck. The README teases “GPU usage: 5%” in an example, but it’s fake. How do I hook up something like NVML to get the real numbers? Appreciate any pointers!

Environment

Here’s my setup:

  • TensorRT Version: N/A (not using it here)
  • GPU Type: NVIDIA GeForce GTX 1660
  • Nvidia Driver Version: 535.104.05
  • CUDA Version: 11.8
  • CUDNN Version: 8.9.0
  • Operating System + Version: Ubuntu 22.04
  • Python Version: N/A (pure C++)
  • TensorFlow Version: N/A
  • PyTorch Version: N/A
  • Baremetal or Container: Baremetal

Relevant Files

Check my repo: https://github.com/bniladridas/cpp_terminal_app (branch: optimize-algorithm)

Steps To Reproduce

  1. Clone it: git clone -b optimize-algorithm https://github.com/bniladridas/cpp_terminal_app.git
  2. Build: mkdir build && cd build && cmake .. && make
  3. Run: ./LlamaTerminalApp
  4. Output shows “GPU usage: 0%” (or 5% in the README example). It’s all placeholder output: no crash, just no real data.

No traceback, just a quiet fail on the GPU front.

Question

I’m thinking NVML (the NVIDIA Management Library) is the right tool here, since you guys own the CUDA stack. How do I plug it into the app to measure actual GPU utilization while Llama 3.2 is running? Code snippets or tips would be clutch. Thanks!
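
To frame what I’m after, here’s an untested sketch of the kind of NVML call I have in mind (assumptions on my side: nvml.h ships with the CUDA toolkit, the app links against libnvidia-ml, and query_gpu_usage is just a hypothetical helper name, not something already in my repo):

    // Untested sketch: query GPU utilization via NVML instead of hardcoding 0.
    #include <cstdio>
    #include <nvml.h>

    // Returns GPU core utilization in percent, or -1 on any NVML error.
    // Hypothetical helper; meant to replace the `int gpu_usage = 0;` placeholder.
    static int query_gpu_usage(unsigned int device_index = 0) {
        nvmlReturn_t rc = nvmlInit();
        if (rc != NVML_SUCCESS) {
            std::fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
            return -1;
        }

        int usage = -1;
        nvmlDevice_t device;
        rc = nvmlDeviceGetHandleByIndex(device_index, &device);
        if (rc == NVML_SUCCESS) {
            nvmlUtilization_t util;  // .gpu and .memory are percentages
            rc = nvmlDeviceGetUtilizationRates(device, &util);
            if (rc == NVML_SUCCESS) {
                usage = static_cast<int>(util.gpu);  // utilization over the last sample period
            } else {
                std::fprintf(stderr, "nvmlDeviceGetUtilizationRates failed: %s\n", nvmlErrorString(rc));
            }
        } else {
            std::fprintf(stderr, "nvmlDeviceGetHandleByIndex failed: %s\n", nvmlErrorString(rc));
        }

        nvmlShutdown();  // in a long-running app, init once at startup and shut down at exit instead
        return usage;
    }

    int main() {
        int gpu_usage = query_gpu_usage();  // replaces the hardcoded placeholder
        std::printf("GPU usage: %d%%\n", gpu_usage);
        return 0;
    }

On the build side, I’m guessing the CMake hookup would be roughly this (also untested; the target name is taken from the binary in step 3, and the CUDA::nvml imported target needs CMake 3.17 or newer):

    # Assumes the CUDA toolkit (which ships nvml.h) is installed.
    find_package(CUDAToolkit REQUIRED)                    # provides the CUDA::nvml imported target
    target_link_libraries(LlamaTerminalApp PRIVATE CUDA::nvml)

If there’s a more idiomatic way to sample utilization during inference, I’m all ears.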

Hi @bniladridas,
Apologies for the delay.
This forum covers issues related to TensorRT, so I suggest you please raise this on the CUDA forum instead.

Thanks