APIs to access the Dashboard? Automating updates

Hello bright minds!

Let me start by saying I did try to search for this question, as the “README Before Posting” rules said to, but I couldn’t find a direct hit, so I am posting. I’m sure my question will draw a lot of eye rolls, and I apologize. I am not an AI developer, I’m a sys-admin, so compared to you all I am DUMB!

My question is pretty simple. We have about 8 DGX Sparks in our datacenter. They have not been clustered in pairs (yet). I am in charge of the care and feeding of them, whereas our dev team are the ones doing the real work with them (or will be soon).

I am looking for APIs to access the dashboard, and I wanted to know if they exist (and if so, where) or if they are something in the pipeline. I am trying to automate the process of installing updates, since they come out on a daily basis (literally). Currently I always have to connect manually using NVIDIA SYNC first (at least I think so) to access the dashboard, so I don’t know if that part can be automated too. If it can, or if there are plans for APIs for that, I would love to get my hands on them. You may be asking why I’m trying to do all the patching/updates through the dashboard instead of setting up some sort of job within the OS to do it.

Well, we are treating these as “enterprise” machines, if you will. They are locked up in a datacenter, no one has GUI access, and our security team will not allow it to be enabled. The NVIDIA documentation states very strongly to perform updates via the dashboard instead of through the command line via ssh. I’m not sure of the reasoning, but based on the documentation my company has decided that is ‘the way’ we are going to do it. I know NVIDIA gave the CLI commands, but unless they update the documentation to say the CLI is a fully supported and OK method for applying updates, we must use the dashboard for now. So I am looking to automate as much of this as possible.

Right now, every morning I log in manually to each DGX10 via the SYNC tool, access the dashboard, and update them if they have updates available. I want to simplify this process while staying in line with NVIDIA’s “use the dashboard” best practice for updates. APIs are the only thing I could think of, but I can’t find any and I don’t know if they exist. I also know NVIDIA is actively developing the dashboard to improve it and add more features and information every day.

Are there APIs I can leverage to automate this task? Or is there another way you can think of that would work? Again, it has to be through the dashboard. I don’t get to make the rules, I’m just the computer janitor, so to speak.

Thank you kindly in advance for your patience and kindness with someone who is nowhere near the caliber that most of you reading are at. I know I’m likely one of the dumbest people here!

I know that probably most people here have these sitting on their desk vs locked up in a cage like we do. Again I don’t make the rules…

Much appreciated :)

I would use the update process via ssh, personally.

Though applying some updates requires a reboot. If your users have long-running tasks, rebooting without warning could disrupt their work. Usually maintenance windows are required for such things.
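A minimal sketch of how a script could tell whether a reboot is pending after updates, so it can be deferred to a maintenance window. It assumes stock Ubuntu behaviour, where the package tooling creates `/var/run/reboot-required` (and a `.pkgs` file next to it listing the packages that asked for the reboot). Whether DGX OS behaves the same way is an assumption to verify on a real box, and the function names are made up for illustration.

```python
import os

# Assumption: standard Ubuntu flag file, created when an installed
# package requests a reboot. Verify that DGX OS keeps this behaviour.
REBOOT_FLAG = "/var/run/reboot-required"


def reboot_required(flag_path: str = REBOOT_FLAG) -> bool:
    """Return True if the reboot flag file exists."""
    return os.path.exists(flag_path)


def packages_needing_reboot(pkgs_path: str = REBOOT_FLAG + ".pkgs") -> list[str]:
    """Return the package names listed in the .pkgs file, or [] if it is absent."""
    if not os.path.exists(pkgs_path):
        return []
    with open(pkgs_path, encoding="utf-8") as handle:
        # One package name per line; skip blank lines.
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    if reboot_required():
        print("Reboot pending for:", ", ".join(packages_needing_reboot()) or "unknown")
    else:
        print("No reboot pending.")
```

Run on the machine itself (or over ssh), this only reports; it never reboots anything, which leaves the timing decision with whoever owns the maintenance window.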


You can use the normal apt way on Ubuntu (plus fwupdmgr).
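As a rough illustration of the apt plus fwupdmgr route, here is a sketch that only builds and prints the ssh commands for one host instead of running them. The exact sequence of steps is an assumption and should be checked against the order NVIDIA gives in its documentation; the hostname `spark-01` and the function names are hypothetical.

```python
import shlex

# Assumed update sequence (check it against the NVIDIA docs before use).
# All four are standard apt-get / fwupdmgr subcommands on Ubuntu.
UPDATE_STEPS = [
    ["sudo", "apt-get", "update"],
    ["sudo", "apt-get", "-y", "dist-upgrade"],
    ["sudo", "fwupdmgr", "refresh"],
    ["sudo", "fwupdmgr", "get-updates"],
]


def ssh_command(host: str, step: list[str]) -> list[str]:
    """Wrap one remote step in an ssh invocation, quoting it safely."""
    return ["ssh", host, shlex.join(step)]


def update_plan(host: str) -> list[list[str]]:
    """Return the full list of ssh commands for one host, in order."""
    return [ssh_command(host, step) for step in UPDATE_STEPS]


if __name__ == "__main__":
    # Dry run: show what would be executed on a hypothetical host.
    for command in update_plan("spark-01"):
        print(shlex.join(command))
```

Keeping this as a dry-run plan makes it easy to review the commands with leadership or the security team before anything is executed; each entry could later be passed to `subprocess.run` once the procedure is approved.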

I recommend Ansible for automation, as you can also orchestrate any required reboots.
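Ansible would be the natural tool for this. Purely to illustrate the orchestration idea, here is a small Python sketch of a rolling update: one host at a time, rebooting only the hosts that report a pending reboot. The three callables are placeholders that you would back with ssh or your automation tool of choice, and the hostnames are hypothetical.

```python
from typing import Callable, Iterable


def rolling_update(
    hosts: Iterable[str],
    update: Callable[[str], None],
    needs_reboot: Callable[[str], bool],
    reboot: Callable[[str], None],
) -> list[str]:
    """Update hosts one at a time and return the hosts that were rebooted.

    Working serially means at most one machine is down at any moment.
    """
    rebooted = []
    for host in hosts:
        update(host)
        if needs_reboot(host):
            reboot(host)
            rebooted.append(host)
    return rebooted


if __name__ == "__main__":
    # Stand-in callables that only print; replace them with real actions.
    pending = {"spark-02"}  # pretend only this host needs a reboot
    done = rolling_update(
        ["spark-01", "spark-02", "spark-03"],
        update=lambda h: print(f"updating {h}"),
        needs_reboot=lambda h: h in pending,
        reboot=lambda h: print(f"rebooting {h}"),
    )
    print("rebooted:", done)
```

Passing the actions in as functions keeps the ordering logic testable without touching a real machine, and the same shape maps onto an Ansible play that runs serially with a conditional reboot task.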


I would agree with you both. However, how come the documentation “heavily recommends” using the dashboard for updates and not apt via the CLI? My leadership has read that, and now I have no choice but to use the dashboard.

Thanks!


The “use the dashboard” wording is mainly about reducing user error, not about a technical limitation of the CLI.

NVIDIA support has commented here on the forum that the dashboard/TTX update path is the recommended default because not all users are comfortable administering Linux correctly (repos, dependencies, package pinning/holds, kernel/driver alignment, etc.). If someone runs the wrong commands or updates the wrong components in the wrong order, it can leave the system in a broken or partially-upgraded state. The documentation is written to steer the broadest audience toward the most controlled/guardrailed update method and to avoid people clicking an update button (or doing an equivalent action) without understanding what Linux package updates actually do.

If you understand how the Linux update process works and you’re following the correct procedure for your distribution/environment, there’s nothing inherently preventing you from updating via the terminal/apt. In other words: the dashboard is “heavily recommended” for safety and consistency across a diverse user base, not because CLI updates are unsupported when performed properly.


100% agree. NVIDIA even gives the commands and the order to run them in their docs. I guess I would need an official NVIDIA employee to chime in so that I can sway my leadership. I have to patch scores of Linux environments of all kinds as part of normal maintenance, so I am very comfortable doing so. I just can’t undo what leadership has seen in the official documentation and the very strong wording used there. I know what you are thinking, and I agree! Sadly, unless NVIDIA states something like what you wrote above, I’m stuck doing it the “safe” way. Hence I was looking for APIs to automate it. I appreciate all the feedback from all of you, I really do, thank you. Maybe this post will ‘nudge’ NVIDIA to post a blurb I can share with leadership, or to update the documentation to read a little differently.

Since these are locked up in a datacenter, if we do implode them for any reason, it’s a trip to the DC to use the USB key to factory restore (until we get a backup solution tested and working). My job says they are going to ship me one to play with, not really “play” but to figure out things like backups so we can restore to a known good state without having to reconfigure everything WHEN (not if) our devs implode the boxes. I get the awesome job of documenting all the procedures when I get a physical device, lucky me. I don’t get to keep it; it’s going back to corporate and into the DC, where it then becomes one for “our” team. I live on the other side of the country from corporate and the datacenter, so I have to write documentation that “anyone” (I’m sure you know what I mean by that) can follow.

Thanks again. You will see me posting more and more, believe me! Not necessarily on this thread, but in general.

I appreciate the kind responses and respect. Sometimes forums can be nasty, especially when people look down on someone they feel is inferior to them. I’m not saying you guys feel like that, I just know forums sometimes can be. Glad this appears to be a good space where people are treated with dignity regardless of how dumb the question may be.
