Seeking Advice on Selecting the Right GPU for Deep Learning: Grace Hopper Superchip vs. H100 Series vs. HGX H100

Hello everyone,
As part of my membership in the Inception Program, I’ve received an email containing the full list of offers, which has prompted me to delve deeper into selecting the right GPU for my deep learning projects. I’m currently exploring options among the Grace Hopper Superchip, the H100 NVL 94 GB, the H100 80 GB, and the HGX H100. Each option presents unique capabilities, but determining the best fit for my specific requirements is a complex task.

I’m also interested in understanding the operational readiness of HGX machines for deployment. It’s crucial for my planning to know if these systems are plug-and-play for deep learning tasks or if they need additional setup and components to be functional.

Given these intricate considerations, I’m seeking expert advice or resources for a more detailed comparison of these GPUs.

Thank you all in advance for your support and insights!
Best regards.

I am not an expert on this, so this is only a basic suggestion.

First, is Confidential Computing part of your use case? If so, the Grace Hopper Superchip should not be considered, because the Grace CPU does not support Arm CCA (see the forum topic “Does the Grace CPU support Arm CCA?”). If not, this category might not be the best place to ask about general accelerated computing.

Beyond that, the remaining products differ in GPU memory capacity, memory bandwidth, peer-to-peer bandwidth, and so on. Could you detail your use case? Training (fine-tuning or from scratch) or inference; LLMs, recommendation systems, or vision tasks? That would greatly help other experts make suggestions for you.

Hi, although confidential computing isn’t directly related to my use case, I had some difficulty posting this topic in a more suitable category. My intention is to use these GPUs primarily for training vision models on satellite data.
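One rough way to narrow down the memory tier you need (80 GB vs. 94 GB vs. a multi-GPU HGX node) is a back-of-the-envelope estimate of the training footprint: weights, gradients, and optimizer states scale with parameter count, plus an activation budget that depends on batch size and input resolution. A minimal sketch follows; the parameter count, activation budget, and FP32/Adam assumptions below are purely illustrative, not measurements of any specific model.

```python
def estimate_training_gib(n_params, bytes_per_param=4, optimizer_states=2,
                          activation_gib=0.0):
    """Rough GPU-memory estimate for FP32 training with an Adam-style optimizer.

    Counts weights + gradients + optimizer states (e.g. Adam keeps two extra
    FP32 tensors per parameter), plus a user-supplied activation budget.
    Real usage varies with framework overhead, mixed precision, and batch size.
    """
    weights = n_params * bytes_per_param
    grads = n_params * bytes_per_param
    opt_states = n_params * bytes_per_param * optimizer_states
    return (weights + grads + opt_states) / 2**30 + activation_gib

# Illustrative example: a 300M-parameter vision model with ~20 GiB of
# activations at a large batch size (hypothetical numbers).
print(round(estimate_training_gib(300e6, activation_gib=20.0), 1))  # → 24.5
```

If an estimate like this lands well under 80 GB with headroom for high-resolution satellite tiles, a single H100 80 GB may suffice; if it pushes past one card, that is when the NVL part or an HGX node with NVLink peer-to-peer bandwidth starts to matter.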