Jetson Thor MIG support: first impressions

I tested out MIG (Multi-Instance GPU) on Jetson Thor with the latest JetPack 7.2. Here are a few observations on what worked and what did not, shared for the benefit of others:

What does MIG do?

  • Thor has 20 SMs (streaming multiprocessors). MIG splits the SMs into isolated slices so that each slice can run a dedicated workload. Since Thor has unified memory, the slices still share the same physical memory.
  • On JetPack 7.2 the slices are fixed: one slice with 8 SMs and another with 12 SMs.
  • The question I wanted to answer: can I serve one inference workload from one slice and a different one from the other? Memory is not the bottleneck on Thor, but the single GPU is, so I wanted to check whether slicing helps with GPU contention. (I know this may differ from the use case other people have in mind.)

How to enable it?

  • The documentation is pretty clear: you cannot enable MIG while the GPU is in use. So the first step is to stop gdm (the desktop UI), since it uses the GPU.

sudo systemctl stop gdm

  • MIG resets on reboot. If you want a permanent split, run the following command. If you are just testing in a single session, this is optional:

sudo nvidia-smi -pm 1 #optional

  • Next, carry out the actual split:

sudo nvidia-smi -i 0 -mig 1

sudo nvidia-smi mig -cgi 83,78 -C # ONLY these profiles work. Took this from release notes

nvidia-smi -L # This should show a split

  • That’s it! Here is sample output from my Thor:

amar@localhost:~$ nvidia-smi -L
GPU 0: NVIDIA Thor (UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600)
MIG 2g.0gb Device 0: (UUID: MIG-c16cc329-4600-51d3-a578-e5c5bf35344e)
MIG 1g.0gb Device 1: (UUID: MIG-9f59caa3-c7a7-5270-b22e-d94cff3d9432)
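
To avoid copying UUIDs by hand, the listing above could be parsed with a small shell helper. This is only a minimal sketch: mig_uuid is a hypothetical name (not part of nvidia-smi), and it assumes nvidia-smi -L keeps the "MIG <profile> Device N: (UUID: MIG-...)" line layout shown in the sample output.

```shell
# mig_uuid PROFILE
# Reads `nvidia-smi -L` output on stdin and prints the UUID of the first
# MIG device whose profile name equals PROFILE (for example 1g.0gb).
# Hypothetical helper; assumes the line layout shown in the sample output.
mig_uuid() {
  awk -v profile="$1" '
    $1 == "MIG" && $2 == profile {
      # The UUID is the first "MIG-<hex and dashes>" token on the line.
      if (match($0, /MIG-[0-9a-f-]+/)) {
        print substr($0, RSTART, RLENGTH)
        exit
      }
    }'
}

# Intended use on the device (not executed here):
#   SMALL=$(nvidia-smi -L | mig_uuid 1g.0gb)
#   BIG=$(nvidia-smi -L | mig_uuid 2g.0gb)
```

The helper prints nothing when no matching profile is listed, so an empty result is worth treating as "no such slice" before launching anything.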

How to test it?

  • Once you have the split, you can use CUDA_VISIBLE_DEVICES to specify which slice you want to target.
  • I used llama.cpp as an inference engine since my cursory research showed it respects the env variable.

CUDA_VISIBLE_DEVICES=<1g-uuid> llama-server -m gemma-12b.gguf --port 8082 &
CUDA_VISIBLE_DEVICES=<2g-uuid> llama-server -m qwen-27b.gguf --port 8081 &
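
The two launches above could also go through a small wrapper, so the slice each process sees is explicit. A minimal sketch, with assumptions: run_on_slice is a hypothetical helper name, SMALL and BIG are placeholder variables holding the two MIG UUIDs, and the 30-second pause is an arbitrary guess, not a measured value.

```shell
# run_on_slice UUID CMD [ARGS...]
# Runs CMD with CUDA_VISIBLE_DEVICES restricted to the given MIG UUID.
# The variable is set only for that one command, not for the calling shell.
# Hypothetical helper, shown for illustration.
run_on_slice() {
  uuid="$1"
  shift
  env CUDA_VISIBLE_DEVICES="$uuid" "$@"
}

# Intended use on the device (not executed here). The order mirrors the
# two commands above: 1g slice first, then the 2g slice.
#   run_on_slice "$SMALL" llama-server -m gemma-12b.gguf --port 8082 &
#   sleep 30   # assumed settle time; adjust to how long the model takes to load
#   run_on_slice "$BIG" llama-server -m qwen-27b.gguf --port 8081 &
```

Using env rather than a bare assignment prefix keeps the behaviour the same whether the command is an external binary or a shell function.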

A few gotchas to keep in mind

  • If you want your instances to persist, make sure you have run sudo nvidia-smi -pm 1.
  • When launching workloads, start the smaller slice’s workload first and then the larger slice’s.
    • When I started the larger slice first, the smaller slice hung at CUDA init indefinitely. The reverse order works fine.
  • nvidia-smi output is not reliable under MIG as of R39.2. The release notes list known issue 6162096: “Output of nvidia-smi is incorrect when using MIG.”
    • It reports the SM split as 12 and 6 instead of 12 and 8.
  • Teardown: ordering matters here too:

# Stop everything running on the slices

pkill -f llama-server # or whatever holds contexts

# Destroy compute instances, then GPU instances (order matters)

sudo nvidia-smi mig -dci

sudo nvidia-smi mig -dgi

# Disable MIG mode → whole 20-SM GPU, takes effect immediately, no reboot

sudo nvidia-smi -i 0 -mig 0

# Bring the desktop back if you want it

sudo systemctl start nvargus-daemon gdm
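
One way to confirm the teardown took effect might be to count the MIG lines left in nvidia-smi -L; after destroying the instances it should be zero. A minimal sketch: count_mig_devices is a hypothetical helper, and it assumes the same listing layout as the sample output earlier in this post.

```shell
# count_mig_devices
# Reads `nvidia-smi -L` output on stdin and prints how many lines describe
# a MIG device (first field is exactly "MIG").
# Hypothetical helper; assumes the listing layout from the sample output.
count_mig_devices() {
  awk '$1 == "MIG" { n++ } END { print n + 0 }'
}

# Intended use on the device (not executed here):
#   if [ "$(nvidia-smi -L | count_mig_devices)" -ne 0 ]; then
#     echo "MIG instances still present" >&2
#   fi
```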

Hi,

Thanks for sharing this observation.
Regarding the things that did not work: is it only the nvidia-smi output? (That is a known issue.)

Thanks.

I played around with it some more and saw a few issues. Hopefully addressing them will help solidify the feature.

Here is some common information to help set the baseline:

  1. My Thor’s config

Item              | Value
Device            | Jetson AGX Thor Developer Kit (P3834-0008)
L4T / JetPack     | R39.2.0 GA (nvidia-l4t-core 39.2.0-20260601141651) = JetPack 7.2
Kernel            | 6.8.12-1021-tegra
Driver / CUDA     | 595.78 / CUDA 13.2 (V13.2.78)
Boot medium       | NVMe (SDK Manager flash)
Display           | gdm + nvargus-daemon stopped (required to create instances); persistence mode ON
MIG profiles used | 83 = 2g.0gb+gfx (12 SM) + 78 = 1g.0gb+me; the pair the R39.2 release notes recommend (issues 6237160/6238288)

  2. I wrote a small program to test without any additional frameworks. The bugs were reproduced with it to rule out framework overhead.

// devq2.cu: walks through CUDA init one call at a time so a hang can be pinned to a step.
#include <cstdio>
#include <cuda_runtime.h>

// Print the step name before the call and the CUDA status after it.
// Flushing before the call keeps the step name visible if the call never returns.
#define STEP(name, call) do { printf("STEP %-28s ", name); fflush(stdout); \
  cudaError_t e = (call); printf("-> %s\n", cudaGetErrorString(e)); fflush(stdout); } while (0)

int main() {
  int n = 0;
  cudaDeviceProp p = {};  // zero-initialized so the printf below is safe if the query fails
  STEP("cudaGetDeviceCount",      cudaGetDeviceCount(&n));
  STEP("cudaGetDeviceProperties", cudaGetDeviceProperties(&p, 0));
  printf("   name=%s SMs=%d\n", p.name, p.multiProcessorCount);
  STEP("cudaSetDevice",           cudaSetDevice(0));   // <-- hangs here in bug states
  void* d = 0;
  STEP("cudaMalloc 1MB",          cudaMalloc(&d, 1 << 20));
  printf("ALL DONE\n");
  return 0;
}
// build: nvcc -arch=sm_110 devq2.cu -o devq2

  3. Common setup used:

sudo systemctl stop gdm nvargus-daemon # GPU must be client-free to partition
sudo nvidia-smi -pm 1 # else MIG mode resets on reboot
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -cgi 83,78 -C # recommended pair: 2g.0gb+gfx + 1g.0gb+me
nvidia-smi -L # note the two MIG UUIDs; SMALL = the 1g device


Now, here are the bugs:

Bug #1: CUDA context creation on the small slice hangs while the big slice has an active context (reverse order works)
Steps to reproduce:

  1. Fresh instances as above (verify small slice works first: CUDA_VISIBLE_DEVICES=<SMALL> ./devq2 → ALL DONE).
  2. Start any long-lived CUDA process on the big slice (e.g. CUDA_VISIBLE_DEVICES=<BIG> llama-server ..., or any app that holds a context). Wait until it is fully initialized and serving.
  3. Now run CUDA_VISIBLE_DEVICES=<SMALL> timeout 30 ./devq2.

Expected output: ALL DONE

Actual output: cudaGetDeviceCount and cudaGetDeviceProperties succeed (reports NVIDIA Thor MIG 1g.0gb, SMs=8), then cudaSetDevice(0) never returns (process must be SIGKILLed; timeout exit 124).

Note: Running the steps in the reverse order (small slice first, then big slice) works.
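
For scripted repro runs, the hang could be told apart from an ordinary failure by looking at the exit status of timeout. A minimal sketch, with assumptions: probe is a hypothetical helper, it relies on GNU coreutils timeout, and the 5-second grace period before SIGKILL is an arbitrary choice.

```shell
# probe SECONDS CMD [ARGS...]
# Runs CMD under timeout(1) and prints PASS, HANG, or FAIL(<status>).
# GNU timeout exits with 124 when the time limit is hit. -k 5 sends SIGKILL
# 5 seconds after the initial SIGTERM; a command killed that way reports
# 137 (128+9), so both statuses are treated as a hang here.
# Hypothetical helper; output of CMD is discarded.
probe() {
  secs="$1"
  shift
  timeout -k 5 "$secs" "$@" >/dev/null 2>&1
  status=$?
  case "$status" in
    0)       echo PASS ;;
    124|137) echo HANG ;;
    *)       echo "FAIL($status)" ;;
  esac
}

# Intended use on the device (not executed here):
#   CUDA_VISIBLE_DEVICES="$SMALL" probe 30 ./devq2
```

Note that a process that is killed with SIGKILL for any other reason would also show up as HANG in this sketch.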


Bug #2: After a hung client is SIGKILLed, the small slice stays broken until the GPU instance is destroyed/recreated
Steps to reproduce:

  1. Trigger Bug 1 (client hung in cudaSetDevice on the small slice).
  2. kill -9 the hung process. Stop the big-slice workload too — GPU now fully idle.
  3. Run CUDA_VISIBLE_DEVICES=<SMALL> timeout 30 ./devq2 again on the now-idle GPU

Expected output: ALL DONE (slice was proven good before step 1, and the GPU is idle).
Actual: still hangs at cudaSetDevice. The slice remains unusable until: sudo nvidia-smi mig -dci && sudo nvidia-smi mig -dgi && sudo nvidia-smi mig -cgi 83,78 -C after which devq2 passes again.


Bug #3: Profile 78 created alone (without 83) hangs CUDA init
Steps:

  1. sudo nvidia-smi mig -dci; sudo nvidia-smi mig -dgi
  2. sudo nvidia-smi mig -cgi 78 -C (the 1g.0gb+me instance is the ONLY instance)
  3. CUDA_VISIBLE_DEVICES=<that UUID> timeout 30 ./devq2

Expected: ALL DONE, or instance creation refused if unsupported alone.
Actual: hangs (release notes say 78+83 are the recommended profiles, but nothing prevents creating 78 alone, and the failure mode is a silent hang rather than an error).


Bug #4: MIG mode cannot be re-enabled after disabling; -mig 0 silently leaves the GPU in a requires-reset state (reboot is the only recovery)

Steps:

  1. Working MIG state (83+78, both slices serving). Stop all GPU clients.
  2. sudo nvidia-smi mig -dci; sudo nvidia-smi mig -dgi
  3. sudo nvidia-smi -i 0 -mig 0 → prints Disabled MIG Mode for GPU ... / All done. with no warning. Full-GPU CUDA works immediately and at full speed (we benchmarked two LLMs on it).
  4. Later: sudo nvidia-smi -i 0 -mig 1

Expected: MIG re-enabled (it worked the same way earlier in the boot), or step 3 warning that a reset is now pending.

Actual: Unable to enable MIG Mode for GPU ...: GPU requires reset. From then on mig.mode.current reports [GPU requires reset]. sudo nvidia-smi --gpu-reset -i 0 → Not Supported (iGPU).

Only a full reboot recovers. Consequence: any orchestration that toggles MIG off (e.g. to run a whole-GPU job) cannot get its partitions back without rebooting the robot; meanwhile CUDA apps relaunched expecting MIG silently fall back to CPU (no MIG devices enumerate).
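
Since the fallback is silent, a launcher could check mig.mode.current before starting anything that expects MIG slices. A minimal sketch, with assumptions: classify_mig_mode is a hypothetical helper, and it assumes the query prints Enabled or Disabled in healthy states and some other string (such as the bracketed one quoted above) otherwise.

```shell
# classify_mig_mode
# Reads one line (the value of mig.mode.current) on stdin and prints
# "enabled", "disabled", or "unusable: <raw value>" for anything else.
# Hypothetical helper; the healthy values are an assumption.
classify_mig_mode() {
  read -r mode
  case "$mode" in
    Enabled)  echo enabled ;;
    Disabled) echo disabled ;;
    *)        echo "unusable: $mode" ;;
  esac
}

# Intended use on the device (not executed here):
#   state=$(nvidia-smi --query-gpu=mig.mode.current --format=csv,noheader | classify_mig_mode)
#   [ "$state" = "enabled" ] || { echo "MIG not available: $state" >&2; exit 1; }
```

Failing loudly at this point is a design choice: it trades a refused launch for not discovering later that the app has been running on CPU.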

Just put together all my observations with steps to repro. Hope this helps!

Hi,

We tried to reproduce this issue, but the test works fine on our side.

Partitioning with profiles 78 and 83:

$ nvidia-smi -L  
GPU 0: NVIDIA Thor (UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600)
  MIG 2g.0gb      Device  0: (UUID: MIG-c465d416-67ab-59e7-b780-1a8f1a762326)
  MIG 1g.0gb      Device  1: (UUID: MIG-b59006b0-f2c1-5668-b295-053748350196)

Then we first ran matrixMul on MIG 2g.0gb:

$ CUDA_VISIBLE_DEVICES=MIG-c465d416-67ab-59e7-b780-1a8f1a762326 ./matrixMul -wA=512 -wB=8192 -hA=8192 -hB=512

And repeatedly ran your sample on MIG 1g.0gb (since it takes some time for the matrixMul to finish)

$ CUDA_VISIBLE_DEVICES=MIG-b59006b0-f2c1-5668-b295-05374835019 ./test 
STEP cudaGetDeviceCount           -> no error
STEP cudaGetDeviceProperties      -> no error
   name=NVIDIA Thor MIG 1g.0gb SMs=8
STEP cudaSetDevice                -> no error
STEP cudaMalloc 1MB               -> no error
ALL DONE

No error is observed. Is there anything missing in our setup?

Thanks.

The only thing that comes to mind: does matrixMul exit, or does it keep running continuously and holding on to the GPU cores?

I tried the steps I provided again and it still happens. The key difference is that slice 1 is occupied by llama-server, which keeps serving inference and never exits.

I can try with other apps when I find some time.

The screenshot below shows me waiting more than 15 minutes for cudaSetDevice to return (it still did not) with the big-slice workload started first.

Hi,

Could you try to add the following configuration when launching llama-server?

$ GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-server ...

This test helps to check if this issue is memory-related.
Thanks.