Can Morpheus containers be run directly on the BlueField when BlueField-X mode is configured, thereby exercising a local GPU? Does this mean that the BlueField host processor is used for more than just initial data-path configuration? If so, could you please point me to any relevant documentation and/or examples for such a setup?
I would also be interested in any information on the server's host-processor setup for such a case.
Hey Brandt, thanks for the question. Morpheus is engineered with that use case in mind. For example, the core runtime builds, compiles, and runs natively on ARM. However, some parts of Morpheus still rely on a discrete GPU (e.g., BF-X operating in discrete mode rather than embedded mode). The engineering teams are working on it, and it would be great to hear more about your use case so we can better understand how early adopters may want to use this. Is that something you are comfortable sharing in a public forum like this?
Eventually I would like to have multiple DPUs in a data center directing streams to a Triton-based inference engine running a cybersecurity application.
To start with, however, I am just now getting a BlueField-2 system set up, and I would then like to bring up the Digital Fingerprinting example inside a Docker container, running on canned data. This would be on a single BlueField-2 DPU, either with a local embedded GPU in a converged accelerator or with a discrete A2 or A30 GPU. I'm picturing this as a step analogous to an experiment I ran bringing up the container

sudo docker run --rm -ti --net=host --gpus=all nvcr.io/nvidia/morpheus/morpheus:22.09-runtime bash

on an x86 server with a discrete A2 GPU. Does that sound possible?
Yes, that should be possible. We designed DFP so that it could eventually run on the BF2x in embedded mode (as you describe). The one issue, though, could be model management and model size. A traditional DFP deployment creates thousands to tens of thousands of models. The limited GPU memory on a single BF2x card constrains how many you can hold in memory at once, leading to frequent model context switches (which means I/O). Did you have an idea of how many models you were thinking of running?
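A rough back-of-envelope sketch of that memory constraint is below. All numbers here are illustrative assumptions for the sake of the arithmetic, not measured figures for any BlueField card or DFP model:

```python
# Back-of-envelope estimate of how many per-user DFP models can stay
# resident in GPU memory at once. Every figure below is an assumed,
# illustrative value -- substitute your own measurements.

GPU_MEMORY_GB = 16        # assumed usable GPU memory on the card
RUNTIME_OVERHEAD_GB = 4   # assumed framework / inference-server overhead
MODEL_SIZE_MB = 5         # assumed in-memory size of one per-user model

usable_mb = (GPU_MEMORY_GB - RUNTIME_OVERHEAD_GB) * 1024
resident_models = usable_mb // MODEL_SIZE_MB

total_models = 10_000     # fleet-wide model count, per the thread
print(f"Models resident at once: {resident_models}")
print(f"Fraction resident: {resident_models / total_models:.1%}")
```

With these assumed numbers only about a quarter of the fleet's models would fit at once, so the rest would have to be swapped in and out, which is exactly the context-switch I/O cost described above.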
It is possible to run Morpheus containers directly on the BlueField when BlueField-X mode is configured, as long as the container is set up to use the local GPU resources. This involves configuring the container with the appropriate GPU drivers and libraries, and configuring the host system to allocate the necessary resources to the container.
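As a sketch, the invocation on the BlueField's Arm cores could mirror the x86 experiment quoted earlier in this thread. Note this is an assumption, not a verified recipe: the image tag below is the one from the x86 run, and an aarch64-compatible Morpheus build may require a different tag.

```shell
# Hedged sketch: same pattern as the x86 experiment in this thread.
# --gpus=all exposes the local (embedded or discrete) GPU to the container;
# the image tag shown may need to change for an Arm64 build.
sudo docker run --rm -ti --net=host --gpus=all \
    nvcr.io/nvidia/morpheus/morpheus:22.09-runtime bash
```

This assumes the NVIDIA container toolkit and GPU drivers are already installed on the BlueField host so that `--gpus=all` can pass the device through.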
The BlueField host processor is indeed used for more than just data-path initialization in this scenario: it is also responsible for managing the container and allocating resources to it.
Documentation and examples for this type of setup vary depending on the specific platform and BlueField version being used. Check the vendor's website and documentation for specific instructions and guidelines on running Morpheus containers on BlueField with GPU acceleration.
Thank you for the insight and the great reply. My apologies for the delay in getting back to you; I am just now picking this up again. As we sketch out and possibly demonstrate the capabilities of DPUs at the concept level, and if we envision the DPU orchestrating the GPU, we will, based on your comment, outline plans with enough DPUs allocated that the models running on each fit together within memory. Are there any published references that indicate the model sizes resulting from different scenarios and feature sizes? As I run experiments I will keep track of the models and plan based on that. Thank you again for your help, Brandt
Thank you for your helpful reply regarding BlueField-X mode! I was away from this topic for a while and meant to acknowledge that Bartley's earlier reply is very helpful, as is yours. Thank you for clarifying how the DPU can be responsible for the Morpheus container. I'll also look to the server-specific instructions for details on how the DPU ties in and integrates with the GPU accelerators. Thank you again, Brandt