Hello,
I get an error when I try to create compute instances (CIs) inside a previously created GPU instance (GI) on an NVIDIA A30 GPU.
The steps to reproduce the error are as follows:
Setup:
NVIDIA A30, 24 GB
Driver version: 525.85.12
Step 1: Activate MIG
$ nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:8D:00.0
All done.
Step 2: List available GI profiles
$ nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0   MIG 1g.6gb       14       4/4      5.81       No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0   MIG 1g.6gb+me    21       1/1      5.81       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0   MIG 2g.12gb       5       2/2      11.69      No     28     2     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0   MIG 2g.12gb+me    6       1/1      11.69      No     28     2     0   |
|                                                             2     1     1   |
+-----------------------------------------------------------------------------+
|   0   MIG 4g.24gb       0       1/1      23.44      No     56     4     0   |
|                                                             4     1     1   |
+-----------------------------------------------------------------------------+
Step 3: Create a GI covering the full GPU
$ nvidia-smi mig -cgi 0 -C
Successfully created GPU instance ID 0 on GPU 0 using profile MIG 4g.24gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 0 using profile MIG 4g.24gb (ID 3)
Step 4: List available CI profiles
$ nvidia-smi mig -lcip -gi 0
+--------------------------------------------------------------------------------------+
| Compute instance profiles:                                                           |
| GPU     GPU      Name             Profile  Instances    Exclusive      Shared        |
|       Instance                      ID     Free/Total      SM      DEC   ENC   OFA   |
|         ID                                                         CE    JPEG        |
|======================================================================================|
|   0      0       MIG 1c.4g.24gb      0        0/4          14       4     0     1    |
|                                                                     4     1          |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 2c.4g.24gb      1        0/2          28       4     0     1    |
|                                                                     4     1          |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 4g.24gb         3*       0/1          56       4     0     1    |
|                                                                     4     1          |
+--------------------------------------------------------------------------------------+
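To read the Free/Total column from a script instead of by eye, for example before calling `-cci`, the table can be parsed with awk. Below is a minimal sketch. The function name `mig_free_instances` is hypothetical, and the parser assumes that every profile row contains `MIG <name> <profile ID>[*] <free>/<total>`, as in the Step 4 table. The `-lgip` rows in Step 2 follow the same pattern, so it should work on those too.

```bash
#!/usr/bin/env bash
# Read `nvidia-smi mig -lcip` (or `-lgip`) output on stdin and print one line
# per profile: "<name> <profile ID> <free> <total>".
# Assumption: profile rows contain "MIG <name> <id>[*] <free>/<total>", as in
# the tables above; the second line of each entry has no "MIG" and is skipped.
mig_free_instances() {
  awk '{
    for (i = 1; i <= NF - 3; i++) {
      if ($i == "MIG") {
        name = $(i + 1)
        id = $(i + 2)
        sub(/\*$/, "", id)          # strip a trailing "*", e.g. "3*" -> "3"
        split($(i + 3), ft, "/")    # "0/4" -> ft[1] = 0 (free), ft[2] = 4 (total)
        print name, id, ft[1], ft[2]
        break
      }
    }
  }'
}
```

With the Step 4 output piped in (`nvidia-smi mig -lcip -gi 0 | mig_free_instances`), this prints `1c.4g.24gb 0 0 4`, `2c.4g.24gb 1 0 2` and `4g.24gb 3 0 1`. In other words, each CI profile reports zero free instances.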
Step 5: Create four CIs with profile 0
$ nvidia-smi mig -cci 0,0,0,0 -gi 0
Unable to create a compute instance on GPU 0 GPU instance ID 0 using profile 0: Insufficient Resources
Failed to create compute instances: Insufficient Resources
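For anyone who wants to reproduce this, Steps 1 to 5 can be bundled into a single function. This is a minimal sketch that uses only the commands shown above. The function name and the `NVSMI` override are hypothetical additions, not nvidia-smi features. Setting `NVSMI=echo` gives a dry run that only prints the arguments for each step. A real run needs root on a MIG-capable GPU and changes the MIG configuration of GPU 0.

```bash
#!/usr/bin/env bash
# Reproduce Steps 1-5 above on GPU 0.
# NVSMI is a hypothetical override for this sketch: it defaults to nvidia-smi;
# NVSMI=echo only prints the arguments each step would pass.
reproduce_cci_failure() {
  local smi="${NVSMI:-nvidia-smi}"
  "$smi" -i 0 -mig 1 || return 1      # Step 1: enable MIG mode on GPU 0
  "$smi" mig -lgip || return 1        # Step 2: list GPU instance profiles
  "$smi" mig -cgi 0 -C || return 1    # Step 3: create a GI with profile 0 (4g.24gb), with -C
  "$smi" mig -lcip -gi 0 || return 1  # Step 4: list CI profiles available in GI 0
  "$smi" mig -cci 0,0,0,0 -gi 0       # Step 5: request four CIs with profile 0 (the failing step)
}
```

After sourcing the file, `NVSMI=echo reproduce_cci_failure` prints the five argument lists. Calling `reproduce_cci_failure` without the override runs the real commands, and the function's exit status is that of the last `nvidia-smi` call.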
==================
The same error occurs with every GI profile, and for every CI profile I request inside it.
I have not found any restriction on creating CIs on the A30 in the official documentation.
The availability reported in the Instances Free/Total column (e.g. 0/4) looks suspicious. However, it is exactly the same as in the A100 example in the official user guide, and there the creation of CIs works:
NVIDIA Multi-Instance GPU User Guide :: NVIDIA Tesla Documentation
Regards,
Francisco