Description
Hey NVIDIA crew, I’m working on a C++ terminal app for Llama 3.2 (shoutout to your GPU tech in the README!), and I’ve hit a snag. The GPU usage is hardcoded in main.cpp as int gpu_usage = 0; so there’s no real measurement, just a placeholder. I’m on the “optimize-algorithm” branch trying to juice up performance, but without actual GPU stats I’m stuck. The README shows “GPU usage: 5%” in an example, but that number is hardcoded too. How do I hook up something like NVML to get the real deal? Appreciate any pointers!
Environment
Here’s my setup:
- TensorRT Version: N/A (not using it here)
- GPU Type: NVIDIA GeForce GTX 1660 (mid-tier, might upgrade)
- Nvidia Driver Version: 535.104.05
- CUDA Version: 11.8
- CUDNN Version: 8.9.0
- Operating System + Version: Ubuntu 22.04
- Python Version: N/A (pure C++)
- TensorFlow Version: N/A
- PyTorch Version: N/A
- Baremetal or Container: Baremetal
Relevant Files
Check my repo:
- main.cpp: where the GPU usage sits at zero.
- README.md: admits it’s a placeholder and gives NVIDIA props.
- Link: https://github.com/bniladridas/cpp_terminal_app/tree/optimize-algorithm
Steps To Reproduce
- Clone it: git clone -b optimize-algorithm https://github.com/bniladridas/cpp_terminal_app.git
- Build: mkdir build && cd build && cmake .. && make
- Run: ./LlamaTerminalApp
- Output shows “GPU usage: 0%” (the README example says 5%): both values are hardcoded.
No crash or traceback, just a quiet failure on the GPU-stats front.
Question
I’m thinking NVML could fix this since you guys rock CUDA. How do I plug it into the app to measure actual GPU usage while Llama 3.2 is running? Code snippets or tips would be clutch. Thanks!
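To make the question concrete, here’s roughly what I’m picturing after skimming the NVML docs. It’s an untested sketch, not code from my repo: the get_gpu_usage() helper name is mine, and it assumes nvml.h from the CUDA toolkit plus linking against the nvidia-ml library.

```cpp
// Untested sketch based on the NVML docs; get_gpu_usage() is my own helper name.
// Assumes nvml.h (ships with the CUDA toolkit) is on the include path and the
// binary is linked with -lnvidia-ml.
#include <nvml.h>
#include <cstdio>

// Returns the current GPU utilization as a percentage, or -1 if NVML fails.
int get_gpu_usage() {
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        std::fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return -1;
    }

    int usage = -1;
    nvmlDevice_t device;
    rc = nvmlDeviceGetHandleByIndex(0, &device);  // device 0 = my single GTX 1660
    if (rc == NVML_SUCCESS) {
        nvmlUtilization_t util;
        rc = nvmlDeviceGetUtilizationRates(device, &util);
        if (rc == NVML_SUCCESS) {
            usage = static_cast<int>(util.gpu);  // util.memory has memory utilization too
        }
    }

    nvmlShutdown();
    return usage;
}

int main() {
    // This is where the hardcoded `int gpu_usage = 0;` would get replaced.
    int gpu_usage = get_gpu_usage();
    std::printf("GPU usage: %d%%\n", gpu_usage);
    return 0;
}
```

If that’s the right direction, I’m guessing the CMake side is just pointing at the CUDA toolkit headers and adding something like target_link_libraries(LlamaTerminalApp nvidia-ml), but I’d appreciate confirmation, plus any advice on how often to poll while the model is generating.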