DeepOps on DGX-Station

Hi NVIDIA support,

Could you please comment on our use case?

When using DeepOps with a management server and a DGX-Station, how should we share the training data stored on the DGX-Station?

We would like to prevent individual developers from accessing the DGX-Station directly.
I think we should run an NFS server on the DGX-Station and use its RAID array as the NFS storage.

If you have any questions, please let me know.
Best regards.
Kaka

Are you using Kubernetes, or Slurm, or just having users use each DGX Station as a standalone system (not controlled with a job scheduler)?

Also, remember that as a DGX customer, you can always contact NVIDIA Enterprise Support ( https://www.nvidia.com/en-us/support/enterprise/ ) to get more real-time assistance with your DGX product - including questions and issues with DeepOps software. We’re happy to use this forum to communicate and help too, but want to make sure you know there’s a formal support path as well!

Hi Scott-san

Thank you for your response. We will use Kubernetes.
Here is our development environment:

Workstation (K8s master) and DGX-Station (K8s worker)
Note: I would like to prevent individual users from accessing the DGX-Station directly.

Also, thank you for mentioning Enterprise Support. I will look into it.

If you have any questions, please let me know.
Best regards.
Kaka

Hi Scott-san,

Could you share your comments on this?

Best regards.
Kaka

You’ll want to add the server and client DGX Station systems to the [nfs-server] and [nfs-clients] sections of the deepops/config/inventory file. Then modify deepops/config/group_vars/all.yml: set nfs_exports to the directory you’re exporting on the “server” DGX Station, and nfs_mounts to where you want it mounted on the “client” DGX Stations. For example, if you were going to export “/data” on the server and want it visible as “/mnt/data” on the clients, the config would look like:

# ~/deepops/config/group_vars/all.yml
nfs_exports:
  - path: /data
    options: "*(rw,sync,no_root_squash)"

nfs_mounts:
  - mountpoint: /mnt/data
    server: '{{ groups["nfs-server"][0] }}'
    path: /data
    options: async,vers=3
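The `*` in the export options lets any host mount the share. The `options` string appears to follow `/etc/exports` syntax (a client specification followed by options in parentheses), so if you would rather limit the export to your cluster network, a sketch might look like this. The 10.0.10.0/24 subnet is an assumption chosen to match the example server address used later in this thread; substitute your own network.

```yaml
# ~/deepops/config/group_vars/all.yml (sketch)
# Assumption: 10.0.10.0/24 is the subnet your Kubernetes nodes live on.
nfs_exports:
  - path: /data
    # Only hosts in this subnet may mount /data, read-write.
    options: "10.0.10.0/24(rw,sync,no_root_squash)"
```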

To put that into effect, run the ‘nfs.yml’ playbook, which should install the NFS client and server bits on the systems you defined in the inventory.

At that point, you’ll have basic NFS working, with all systems sharing the (for example) “/data” directory from the server DGX Station.
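With the export working at the OS level, Kubernetes can also mount it directly through its built-in `nfs` volume type, with no provisioner involved. This is handy when pods need to read files that already live in `/data`. A minimal sketch follows; the pod name, the `busybox` image, and the 10.0.10.1 server address are placeholders (the address matches the example used later in this thread).

```yaml
# Hypothetical pod that mounts the server DGX Station's /data export read-only.
apiVersion: v1
kind: Pod
metadata:
  name: nfs-direct-check          # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox
      command: ["sh", "-c", "ls -l /data"]   # list the shared training data
      volumeMounts:
        - name: training-data
          mountPath: /data
  volumes:
    - name: training-data
      nfs:
        server: 10.0.10.1         # placeholder: IP of the NFS server DGX Station
        path: /data               # the path exported via nfs_exports
        readOnly: true
```

The node that runs the pod needs the NFS client utilities installed, which the nfs.yml playbook presumably takes care of for the hosts in your inventory.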

To make use of it in Kubernetes, edit deepops/services/nfs-client.yml to point at the NFS server’s address and exported path, for example:

# ~/deepops/services/nfs-client.yml
  nfs:
    server: 10.0.10.1
    path: "/data"

You can then run kubectl create -f services/nfs-client.yml, which lets you create PVCs backed by your NFS export so that it is visible from within pods.
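Once that is deployed, a claim and a pod that mounts it might look like the sketch below. The StorageClass name `managed-nfs-storage` is an assumption; check what `kubectl get storageclass` reports in your cluster. The claim name, pod name, image, and size are placeholders.

```yaml
# Hypothetical PVC served by the NFS client provisioner.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data               # placeholder name
spec:
  storageClassName: managed-nfs-storage   # assumption: verify with kubectl get storageclass
  accessModes:
    - ReadWriteMany                 # NFS lets many pods mount the volume read-write
  resources:
    requests:
      storage: 100Gi                # placeholder size
---
# Hypothetical pod that mounts the claim at /data.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-check                   # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox
      command: ["sh", "-c", "df -h /data && ls -l /data"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: training-data
```

Saved to a file (for example a hypothetical `training-data.yml`), it can be created with `kubectl create -f training-data.yml`. Note that a dynamic NFS provisioner usually backs each claim with its own new subdirectory on the export, so a fresh claim starts empty rather than showing files already in `/data`; for existing data, mounting the export directly with an `nfs` volume may be the simpler route.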