DeepOps on DGX-Station

Kaka_m · January 10, 2020, 8:07am

Hi Nvidia support,

Could you please comment on our use case?

When using DeepOps with a management server and a DGX Station, how should we share the training data stored on the DGX Station?

We would like to avoid having each developer access the DGX Station directly.
I think we should run an NFS server on the DGX Station and use its RAID array as the NFS storage.

If you have any questions, please let me know.
Best regards.
Kaka

ScottEllis · January 11, 2020, 1:25am

Are you using Kubernetes, or Slurm, or just having users use each DGX Station as a standalone system (not controlled with a job scheduler)?

Also, remember that as a DGX customer, you can always contact NVIDIA Enterprise Support (Enterprise Customer Support | NVIDIA) to get more real-time assistance with your DGX product, including questions and issues with DeepOps software. We’re happy to use this forum to communicate and help too, but want to make sure you know there’s a formal support path as well!

Kaka_m · January 13, 2020, 11:38pm

Hi Scott-san

Thank you for your response. We will use Kubernetes.
Here is our development environment:

Workstation (K8s master) connected to DGX Station (K8s worker node)
Note: We would like to avoid having each user access the DGX Station directly.

Also, thank you for pointing out Enterprise Support. I will look into it.

If you have any questions, please let me know.
Best regards.
Kaka

Kaka_m · January 21, 2020, 1:19am

Hi Scott-san,

Could you give us your comments?

Best regards.
Kaka

ScottEllis · January 21, 2020, 8:32pm

You’ll want to add the server and client DGX Station systems to the [nfs-server] and [nfs-clients] sections of the deepops/config/inventory file, and then modify deepops/config/group_vars/all.yml: set nfs_exports to the directory you’re exporting from the “server” DGX Station, and nfs_mounts to where you want it mounted on the “client” DGX Stations. For example, if you want to export “/data” on the server and have it visible as “/mnt/data” on the clients, the config would look like:

# ~/deepops/config/group_vars/all.yml
nfs_exports:
  - path: /data
    options: "*(rw,sync,no_root_squash)"

nfs_mounts:
  - mountpoint: /mnt/data
    server: '{{ groups["nfs-server"][0] }}'
    path: /data
    options: async,vers=3
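The options value on each export follows /etc/exports syntax (a client specification followed by parenthesized export options), so "*" lets any host that can reach the server mount /data. If you would rather limit the export to the cluster network, a variant like the one below should work. The 10.0.10.0/24 subnet is an assumption (chosen to match the 10.0.10.1 server address used further down), and this sketch assumes the DeepOps NFS role writes the options string into /etc/exports as-is.

```yaml
# ~/deepops/config/group_vars/all.yml (sketch)
# Assumption: 10.0.10.0/24 is the cluster subnet; replace with your own.
nfs_exports:
  - path: /data
    # Only hosts in the assumed subnet may mount /data, read-write.
    options: "10.0.10.0/24(rw,sync,no_root_squash)"
# nfs_mounts stays the same as in the example above.
```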

To put that into effect, run the ‘nfs.yml’ playbook, which should install the NFS client and server bits on the systems you defined in the inventory.

At that point, you’ll have basic NFS working, with all systems sharing the (for example) “/data” directory from the server DGX Station.

To make use of it in Kubernetes, edit deepops/services/nfs-client.yml to include the NFS server address and export path, for example:

# ~/deepops/services/nfs-client.yml (excerpt; surrounding fields omitted)
  nfs:
    server: 10.0.10.1
    path: "/data"
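The nfs: stanza above has the same shape as the nfs volume source in a standard Kubernetes PersistentVolume. To show where such a stanza sits in a complete object, here is a minimal standalone PersistentVolume and PersistentVolumeClaim pair for the same export. This is a sketch, not necessarily what the DeepOps file contains; the names nfs-data-pv and nfs-data-pvc and the 100Gi capacity are hypothetical.

```yaml
# Sketch: standalone PV + PVC for the /data export (names are hypothetical).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-data-pv
spec:
  capacity:
    storage: 100Gi            # nominal; NFS does not enforce this size
  accessModes:
    - ReadWriteMany           # NFS lets pods on many nodes mount read-write
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""        # static binding, no dynamic provisioner
  nfs:
    server: 10.0.10.1         # the "server" DGX Station
    path: "/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-data-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # must match the PV for static binding
  volumeName: nfs-data-pv     # bind explicitly to the PV above
  resources:
    requests:
      storage: 100Gi
```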

You can then create the PVCs with kubectl create -f services/nfs-client.yml, which will make your NFS export visible from within pods.
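To confirm the export is actually visible inside a pod, a throwaway pod can mount the claim and list its contents. This sketch assumes a claim named nfs-data-pvc (hypothetical; substitute the claim name your nfs-client.yml defines) and uses the generic busybox image.

```yaml
# Sketch: test pod that mounts an NFS-backed PVC at /mnt/data.
apiVersion: v1
kind: Pod
metadata:
  name: nfs-test
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: busybox
      # List the shared directory once, then exit.
      command: ["sh", "-c", "ls -la /mnt/data"]
      volumeMounts:
        - name: data
          mountPath: /mnt/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-data-pvc   # hypothetical claim name
```

After creating it with kubectl create -f, running kubectl logs nfs-test should print the listing of /data as exported by the server DGX Station.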
