Error creating CIs with MIG on Nvidia A30

Hello,

I am getting an error when trying to create CIs within a previously created GI on an NVIDIA A30 GPU.

The steps to reproduce the error are as follows:

Setup:

NVIDIA A30, 24 GB.
Driver version: 525.85.12

Step 1: Activate MIG

$ nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:8D:00.0
All done.

Step 2: List available GI profiles

$ nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.6gb        14     4/4        5.81       No     14     1     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.6gb+me     21     1/1        5.81       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.12gb        5     2/2       11.69       No     28     2     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.12gb+me     6     1/1       11.69       No     28     2     0   |
|                                                             2     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.24gb        0     1/1       23.44       No     56     4     0   |
|                                                             4     1     1   |
+-----------------------------------------------------------------------------+

Step 3: Create a GI covering the full GPU

$ nvidia-smi mig -cgi 0 -C
Successfully created GPU instance ID 0 on GPU 0 using profile MIG 4g.24gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 0 using profile MIG 4g.24gb (ID 3)

Step 4: List available CI profiles

$ nvidia-smi mig -lcip -gi 0
+--------------------------------------------------------------------------------------+
| Compute instance profiles:                                                           |
| GPU     GPU       Name             Profile  Instances   Exclusive       Shared       |
|       Instance                       ID     Free/Total     SM       DEC   ENC   OFA  |
|         ID                                                          CE    JPEG       |
|======================================================================================|
|   0      0       MIG 1c.4g.24gb       0      0/4           14        4     0     1   |
|                                                                      4     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 2c.4g.24gb       1      0/2           28        4     0     1   |
|                                                                      4     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 4g.24gb          3*     0/1           56        4     0     1   |
|                                                                      4     1         |
+--------------------------------------------------------------------------------------+

Step 5: Create 4 CIs for profile 0

$ nvidia-smi mig -cci 0,0,0,0 -gi 0
Unable to create a compute instance on GPU 0 GPU instance ID 0 using profile 0: Insufficient Resources
Failed to create compute instances: Insufficient Resources

==================

The same error appears with every GI profile, and with every CI profile I try to create within it.

I have not seen any restriction in the creation of CIs for the A30 in the official documentation.

The availability shown (Instances Free/Total, e.g. 0/4) looks suspicious, but it is exactly the same as in the A100 example in the official user guide, and in that case the creation of CIs is allowed:

NVIDIA Multi-Instance GPU User Guide :: NVIDIA Tesla Documentation

Regards,

Francisco

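Solved. The problem was creating the GI with the -C option. It also creates a default CI that spans the whole GI (the "Successfully created compute instance ID 0 ... using profile MIG 4g.24gb (ID 3)" line in Step 3), so no compute slices are left. That is why Free/Total shows 0/4 and any further CI creation fails with Insufficient Resources. Creating the GI without -C leaves the CI slots free.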
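
For anyone hitting the same error, here is a rough sketch of the sequence that avoids it. It only prints the commands rather than running them, so it can be checked on a machine without a GPU. The function names and their parameters (gpu, gi_profile, ci_profile, count, gi_id) are my own illustration, and GI ID 0 is assumed because that is the ID reported in Step 3; adjust it to whatever -cgi actually reports.

```shell
#!/bin/sh
# Sketch only: prints nvidia-smi commands, does not execute them.
# Function names and parameters are illustrative, not part of nvidia-smi.

# Build a comma-separated list such as "0,0,0,0" (profile repeated count times).
repeat_profile() {
    profile=$1; count=$2
    list=$profile
    i=1
    while [ "$i" -lt "$count" ]; do
        list="$list,$profile"
        i=$((i + 1))
    done
    echo "$list"
}

# Fresh start: create the GI WITHOUT -C, then create the CIs inside it.
# Assumes the new GI gets ID 0, as in the output above.
plan_ci_split() {
    gpu=$1; gi_profile=$2; ci_profile=$3; count=$4
    echo "nvidia-smi mig -i $gpu -cgi $gi_profile"
    echo "nvidia-smi mig -i $gpu -cci $(repeat_profile "$ci_profile" "$count") -gi 0"
    echo "nvidia-smi mig -i $gpu -lci"
}

# GI already created with -C: destroy the default CI (ID 0) first,
# then create the smaller CIs in the same GI.
plan_ci_resplit() {
    gpu=$1; gi_id=$2; ci_profile=$3; count=$4
    echo "nvidia-smi mig -i $gpu -dci -ci 0 -gi $gi_id"
    echo "nvidia-smi mig -i $gpu -cci $(repeat_profile "$ci_profile" "$count") -gi $gi_id"
    echo "nvidia-smi mig -i $gpu -lci"
}

plan_ci_split 0 0 0 4
```

plan_ci_split 0 0 0 4 corresponds to Steps 3 and 5 above with the -C dropped. plan_ci_resplit 0 0 0 4 is the alternative when the GI from Step 3 already exists and only its default CI needs to be removed.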

This is clearly a bug in the User Guide, and it should be fixed.

Regards,

Francisco
