Jetson Thor MIG support: first impression

amarbalutkar · June 10, 2026, 3:19pm

I tested out MIG (Multi-Instance GPU) on Jetson Thor with the latest JetPack 7.2. Here are a few observations on what worked and what did not, shared for the benefit of others:

What does MIG do?

Thor has 20 SMs (streaming multiprocessors). MIG splits the SMs into isolated slices so that you can run a dedicated workload on each slice. Since Thor has unified memory, the slices still share the same memory.
On JetPack 7.2 the slices are fixed: one slice with 8 SMs and another with 12 SMs.
What I wanted to test: can I serve one inference model from one slice and a different model from the other? I wanted to check whether slicing helps with GPU contention, since memory itself isn't a problem on Thor but the single GPU is. (I know this may differ from the use cases other people have in mind.)

How to enable it?

The documentation is pretty clear: you cannot enable MIG while the GPU is in use. So the first step is to stop gdm (the desktop UI), since it is using the GPU.

sudo systemctl stop gdm

MIG resets on a reboot. If you want a permanent split, run the following command; if you are just testing in a single session, it is optional.

sudo nvidia-smi -pm 1 #optional

Next, we actually carry out the split:

sudo nvidia-smi -i 0 -mig 1

sudo nvidia-smi mig -cgi 83,78 -C # ONLY these profiles work. Took this from release notes

nvidia-smi -L # This should show a split

That’s it! Here is sample output from my Thor:

amar@localhost:~$ nvidia-smi -L
GPU 0: NVIDIA Thor (UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600)
MIG 2g.0gb Device 0: (UUID: MIG-c16cc329-4600-51d3-a578-e5c5bf35344e)
MIG 1g.0gb Device 1: (UUID: MIG-9f59caa3-c7a7-5270-b22e-d94cff3d9432)
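
To avoid copying UUIDs by hand, the listing above can be parsed in a script. This is a minimal sketch, assuming the `nvidia-smi -L` line format shown above stays the same; `mig_uuid_for` is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# mig_uuid_for PROFILE
# Reads `nvidia-smi -L` style text on stdin and prints the UUID of the
# first MIG device whose profile name (second field) equals PROFILE.
# Hypothetical helper; assumes lines look like:
#   MIG 1g.0gb Device 1: (UUID: MIG-xxxxxxxx-...)
mig_uuid_for() {
  awk -v prof="$1" '
    $1 == "MIG" && $2 == prof {
      # Pull out the MIG-... token, stopping before the closing parenthesis.
      if (match($0, /MIG-[0-9a-f-]+/)) {
        print substr($0, RSTART, RLENGTH)
        exit
      }
    }'
}

# Example use on the device (not executed here):
#   SMALL=$(nvidia-smi -L | mig_uuid_for 1g.0gb)
#   BIG=$(nvidia-smi -L | mig_uuid_for 2g.0gb)
```

Matching on the profile name rather than the device index keeps the script independent of the order in which the instances are listed.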

How to test it?

Once you have the split, you can use CUDA_VISIBLE_DEVICES to specify which slice you want to target.
I used llama.cpp as the inference engine, since my cursory research showed it respects this environment variable.

CUDA_VISIBLE_DEVICES=<1g-uuid> llama-server -m gemma-12b.gguf --port 8082 &
CUDA_VISIBLE_DEVICES=<2g-uuid> llama-server -m qwen-27b.gguf --port 8081 &
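
Launch order turned out to matter in my tests (see the gotchas in this post), so a script can enforce "small slice first, then big". This is a rough sketch, not a tested recipe. It assumes bash, that `curl` is installed, and that llama-server answers on a `/health` endpoint once the model is loaded. `SMALL_UUID`, `BIG_UUID`, `wait_until` and `launch_small_then_big` are hypothetical names introduced here.

```shell
# wait_until TIMEOUT_SECONDS CMD...
# Retry CMD about once per second until it succeeds.
# Returns 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
  local deadline=$(( $(date +%s) + $1 )); shift
  until "$@" >/dev/null 2>&1; do
    [ "$(date +%s)" -ge "$deadline" ] && return 1
    sleep 1
  done
}

# Start the small-slice server, wait until it reports healthy, and only
# then start the big-slice server. Model files and ports are the ones
# used in this post; the /health URL is an assumption.
launch_small_then_big() {
  CUDA_VISIBLE_DEVICES="$SMALL_UUID" llama-server -m gemma-12b.gguf --port 8082 &
  wait_until 300 curl -sf http://127.0.0.1:8082/health || return 1
  CUDA_VISIBLE_DEVICES="$BIG_UUID" llama-server -m qwen-27b.gguf --port 8081 &
}
```

Waiting for the first server to finish initializing, rather than sleeping for a fixed time, avoids racing the two CUDA initializations against each other.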

A few gotchas to keep in mind:

- If you want your instances to persist, make sure you have run sudo nvidia-smi -pm 1.
- When launching workloads, launch the smaller slice’s workload first and the larger slice’s second.
  - If I start the larger slice first, the smaller slice hangs at CUDA init indefinitely. The reverse order works fine.
- nvidia-smi output is incorrect under MIG as of R39.2. The release notes list known issue 6162096: “Output of nvidia-smi is incorrect when using MIG.”
  - It reports the SM split as 12 and 6, rather than the 12 and 8 described above.
- Teardown order matters:

# Stop everything running on the slices

pkill -f llama-server # or whatever holds contexts

# Destroy compute instances, then GPU instances (order matters)

sudo nvidia-smi mig -dci

sudo nvidia-smi mig -dgi

# Disable MIG mode → whole 20-SM GPU, takes effect immediately, no reboot

sudo nvidia-smi -i 0 -mig 0

# Bring the desktop back if you want it

sudo systemctl start nvargus-daemon gdm

AastaLLL · June 11, 2026, 6:56am

Hi,

Thanks for sharing this observation.
Regarding what did not work: is it only the nvidia-smi output? (That is a known issue.)

Thanks.

amarbalutkar · June 11, 2026, 4:31pm

Played around with it more and saw a few issues. Hopefully addressing them will help solidify the feature.

Here is some common information to help set the baseline:

My Thor’s config

| Item | Value |
| --- | --- |
| Device | Jetson AGX Thor Developer Kit (P3834-0008) |
| L4T / JetPack | R39.2.0 GA (`nvidia-l4t-core 39.2.0-20260601141651`) = JetPack 7.2 |
| Kernel | 6.8.12-1021-tegra |
| Driver / CUDA | 595.78 / CUDA 13.2 (V13.2.78) |
| Boot medium | NVMe (SDK Manager flash) |
| Display | gdm + nvargus-daemon stopped (required to create instances); persistence mode ON |
| MIG profiles used | 83 = `2g.0gb+gfx` (12 SM) + 78 = `1g.0gb+me`; the pair the R39.2 release notes recommend (issues 6237160/6238288) |

I wrote a small program to test without any additional frameworks. All the bugs below were reproduced with it, to rule out framework overhead.

#include <cstdio>
#include <cuda_runtime.h>
// Print each step's name and flush before making the call, so that a hang
// shows exactly which call never returned.
#define STEP(name, call) do{ printf("STEP %-28s ", name); fflush(stdout); \
  cudaError_t e=(call); printf("-> %s\n", cudaGetErrorString(e)); fflush(stdout);}while(0)
int main(){
  int n=0; cudaDeviceProp p{};   // zero-initialized so the printf below is safe if the query fails
  STEP("cudaGetDeviceCount",      cudaGetDeviceCount(&n));
  STEP("cudaGetDeviceProperties", cudaGetDeviceProperties(&p,0));
  printf("   name=%s SMs=%d\n", p.name, p.multiProcessorCount);
  STEP("cudaSetDevice",           cudaSetDevice(0));   // <-- hangs here in bug states
  void* d=nullptr; STEP("cudaMalloc 1MB", cudaMalloc(&d,1<<20));
  cudaFree(d);                   // release the test allocation (no-op if cudaMalloc failed)
  printf("ALL DONE\n"); return 0;
}
// build: nvcc -arch=sm_110 devq2.cu -o devq2

Common setup used:

sudo systemctl stop gdm nvargus-daemon # GPU must be client-free to partition
sudo nvidia-smi -pm 1 # else MIG mode resets on reboot
sudo nvidia-smi -i 0 -mig 1
sudo nvidia-smi mig -cgi 83,78 -C # recommended pair: 2g.0gb+gfx + 1g.0gb+me
nvidia-smi -L # note the two MIG UUIDs; SMALL = the 1g device

Now, here are the bugs:

Bug #1: CUDA context creation on the small slice hangs while the big slice has an active context (reverse order works)
Steps to reproduce:

1. Create fresh instances as above (verify the small slice works first: CUDA_VISIBLE_DEVICES=<SMALL> ./devq2 → ALL DONE).
2. Start any long-lived CUDA process on the big slice (e.g. CUDA_VISIBLE_DEVICES=<BIG> llama-server ..., or any app that holds a context). Wait until it is fully initialized and serving.
3. Now run CUDA_VISIBLE_DEVICES=<SMALL> timeout 30 ./devq2.

Expected output: ALL DONE

Actual output: cudaGetDeviceCount and cudaGetDeviceProperties succeed (reports NVIDIA Thor MIG 1g.0gb, SMs=8), then cudaSetDevice(0) never returns (process must be SIGKILLed; timeout exit 124).

Note: Running the steps in the reverse order (small slice first, then big slice) works.
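
For anyone scripting this repro, the hang can be told apart from an ordinary failure by the exit status of `timeout`. This is a small sketch, assuming GNU coreutils `timeout` (exit status 124 on timeout, 137 if the follow-up SIGKILL from `-k` was needed); `probe` is a hypothetical helper name.

```shell
# probe SECONDS CMD...
# Runs CMD under a time limit and prints one of: ok, hang, fail:<status>.
# `-k 5` sends SIGKILL 5 seconds after the initial SIGTERM, for clients
# that do not exit on SIGTERM.
probe() {
  local secs=$1 status=0; shift
  timeout -k 5 "$secs" "$@" >/dev/null 2>&1 || status=$?
  case $status in
    0)       echo ok ;;
    124|137) echo hang ;;           # timed out, or had to be SIGKILLed
    *)       echo "fail:$status" ;; # CMD exited by itself with an error
  esac
}

# Example use on the device (not executed here; SMALL is the 1g UUID):
#   probe 30 env CUDA_VISIBLE_DEVICES="$SMALL" ./devq2
```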

Bug #2: After a hung client is SIGKILLed, the small slice stays broken until the GPU instance is destroyed/recreated
Steps to reproduce:

1. Trigger Bug 1 (client hung in cudaSetDevice on the small slice).
2. kill -9 the hung process. Stop the big-slice workload too, so the GPU is now fully idle.
3. Run CUDA_VISIBLE_DEVICES=<SMALL> timeout 30 ./devq2 again on the now-idle GPU.

Expected output: ALL DONE (slice was proven good before step 1, and the GPU is idle).
Actual output: still hangs at cudaSetDevice. The slice remains unusable until the instances are destroyed and recreated with sudo nvidia-smi mig -dci && sudo nvidia-smi mig -dgi && sudo nvidia-smi mig -cgi 83,78 -C, after which devq2 passes again.

Bug #3: Profile 78 created alone (without 83) hangs CUDA init
Steps to reproduce:

1. sudo nvidia-smi mig -dci; sudo nvidia-smi mig -dgi
2. sudo nvidia-smi mig -cgi 78 -C (the 1g.0gb+me instance is the ONLY instance)
3. CUDA_VISIBLE_DEVICES=<that UUID> timeout 30 ./devq2

Expected: ALL DONE, or instance creation refused if unsupported alone.
Actual: hangs (release notes say 78+83 are the recommended profiles, but nothing prevents creating 78 alone, and the failure mode is a silent hang rather than an error).

Bug #4: MIG mode cannot be re-enabled after disabling; -mig 0 silently leaves the GPU in a requires-reset state (reboot is the only recovery)

Steps to reproduce:

1. Start from a working MIG state (83+78, both slices serving). Stop all GPU clients.
2. sudo nvidia-smi mig -dci; sudo nvidia-smi mig -dgi
3. sudo nvidia-smi -i 0 -mig 0 → prints "Disabled MIG Mode for GPU ..." and "All done." with no warning. Full-GPU CUDA works immediately and at full speed (we benchmarked two LLMs on it).
4. Later: sudo nvidia-smi -i 0 -mig 1

Expected: MIG is re-enabled (it worked the same way earlier in the boot), or step 3 warns that a reset is now pending.

Actual: Unable to enable MIG Mode for GPU ...: GPU requires reset. From then on mig.mode.current reports [GPU requires reset]. sudo nvidia-smi --gpu-reset -i 0 → Not Supported (iGPU).

Only a full reboot recovers. Consequence: any orchestration that toggles MIG off (e.g. to run a whole-GPU job) cannot get its partitions back without rebooting the robot; meanwhile CUDA apps relaunched expecting MIG silently fall back to CPU (no MIG devices enumerate).
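
Until this is fixed, an orchestration script can at least refuse to launch workloads when MIG is not actually available, instead of letting them fall back silently. This is a sketch under assumptions: it relies on the `mig.mode.current` query field mentioned above and on the strings I saw it report. `classify_mig_mode` is a hypothetical helper name. Note that in my runs the "requires reset" state only showed up after the failed re-enable attempt.

```shell
# classify_mig_mode MODE
# Maps the text reported for mig.mode.current to a coarse action.
classify_mig_mode() {
  case "$1" in
    Enabled)            echo ready ;;
    Disabled)           echo enable-mig ;;     # try: sudo nvidia-smi -i 0 -mig 1
    *"requires reset"*) echo reboot-needed ;;  # only a reboot recovered this for me
    *)                  echo unknown ;;
  esac
}

# Example use on the device (not executed here):
#   mode=$(nvidia-smi -i 0 --query-gpu=mig.mode.current --format=csv,noheader)
#   [ "$(classify_mig_mode "$mode")" = ready ] || { echo "MIG not usable: $mode" >&2; exit 1; }
```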

amarbalutkar · June 11, 2026, 4:32pm

Just put together all my observations with steps to repro. Hope this helps!

AastaLLL · June 16, 2026, 7:00am

Hi,

We tried to reproduce this issue, but the test works fine on our side.

Partitioned with profiles 78 and 83:

$ nvidia-smi -L  
GPU 0: NVIDIA Thor (UUID: GPU-a7c66ad2-6dbb-0ab8-c1a2-37ba6dba3600)
  MIG 2g.0gb      Device  0: (UUID: MIG-c465d416-67ab-59e7-b780-1a8f1a762326)
  MIG 1g.0gb      Device  1: (UUID: MIG-b59006b0-f2c1-5668-b295-053748350196)

Then we first ran matrixMul on MIG 2g.0gb:

$ CUDA_VISIBLE_DEVICES=MIG-c465d416-67ab-59e7-b780-1a8f1a762326 ./matrixMul -wA=512 -wB=8192 -hA=8192 -hB=512

And repeatedly ran your sample on MIG 1g.0gb (since it takes some time for the matrixMul to finish)

$ CUDA_VISIBLE_DEVICES=MIG-b59006b0-f2c1-5668-b295-053748350196 ./test
STEP cudaGetDeviceCount           -> no error
STEP cudaGetDeviceProperties      -> no error
   name=NVIDIA Thor MIG 1g.0gb SMs=8
STEP cudaSetDevice                -> no error
STEP cudaMalloc 1MB               -> no error
ALL DONE

No error is observed. Is there anything missing in our setting?

Thanks.

amarbalutkar · June 16, 2026, 2:37pm

The only thing that comes to mind: is matrixMul exiting, or is it running continuous operations and still holding onto the GPU cores?

I tried the steps I provided again and it still happens. The key difference is that slice 1 is occupied by llama-server, which is continuously serving inference and does not exit.

I can try with other apps when I find some time.

The screenshot below shows me waiting more than 15 minutes for cudaSetDevice to return (it still didn't) with the big slice started first.

AastaLLL · June 22, 2026, 6:24am

Hi,

Could you try to add the following configuration when launching llama-server?

$ GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-server ...

This test helps to check if this issue is memory-related.
Thanks.
