Issues Encountered During AWS Model Training Setup

Hi,

Over the past few weeks, we have been setting everything up to train our model. Last week, when we tried to run the training job on AWS, we hit a vCPU limit error while creating an EC2 instance with enough GPU capacity.

We have spent a week working with the Amazon support team, but we still cannot create the required EC2 instance and run our job. Could anyone help us resolve this issue? Our goal is to run the code on a configuration similar to 8×A100 GPUs.
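
In case it helps narrow things down, here is a minimal sketch of how the relevant quota could be checked with boto3. It rests on several assumptions that are not confirmed above: that the target is a single p4d.24xlarge (8 A100 GPUs, 96 vCPUs), that the limit in question is the "Running On-Demand P instances" quota, and that the region is us-east-1. The helper names are hypothetical.

```python
# Sketch only: compares the account's P-instance vCPU quota with what
# one 8x A100 instance would need. Instance type, quota name, and region
# are assumptions, not details taken from the error we received.

# vCPUs of one p4d.24xlarge (8x A100), assumed to be the target type.
P4D_24XLARGE_VCPUS = 96

# Quota assumed to govern P-family On-Demand instances.
P_QUOTA_NAME = "Running On-Demand P instances"


def vcpu_shortfall(quota_vcpus, instances=1,
                   vcpus_per_instance=P4D_24XLARGE_VCPUS):
    """Return how many more vCPUs the quota needs (0 if it is enough)."""
    needed = instances * vcpus_per_instance
    return max(0, needed - int(quota_vcpus))


def find_p_instance_quota(region):
    """Return the Service Quotas entry for P instances, or None."""
    # Imported lazily so vcpu_shortfall() works without boto3 installed.
    import boto3

    client = boto3.client("service-quotas", region_name=region)
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="ec2"):
        for quota in page["Quotas"]:
            if quota["QuotaName"] == P_QUOTA_NAME:
                return quota
    return None


if __name__ == "__main__":
    quota = find_p_instance_quota("us-east-1")  # region is an assumption
    if quota is None:
        print("Quota not found; the name may differ in this account.")
    else:
        missing = vcpu_shortfall(quota["Value"])
        print(f"Current quota: {quota['Value']:.0f} vCPUs")
        print(f"Additional vCPUs needed for one instance: {missing}")
```

If the shortfall is non-zero, the same client's `request_service_quota_increase` call (with the `QuotaCode` from the entry found above) is one way to file the increase, though the request may still need manual approval from AWS.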