Hi
I am experimenting with Multi Instance GPU (MIG) mode.
MIG allows a GPU to be partitioned into multiple separate GPU instances.
I have experimented with PyTorch and TensorRT models,
and these models work fine in both MIG and non-MIG mode.
I am facing an issue when I try to use ONNX models with ONNXRuntime in MIG mode.
Exactly the same model and codebase, which work fine in non-MIG mode,
fail when I run them in MIG mode.
Relevant Area: Multi Instance GPU, ONNX, ONNXRuntime
Description
Model: BERT base model from the onnxruntime Git repository.
Test script: run_benchmark.sh (main branch of microsoft/onnxruntime on GitHub)
The BERT base ONNX model runs without any issues on a non-MIG GPU.
The same codebase and model on a MIG GPU run into the following issue:
packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 456, in bind_input
self._iobinding.bind_input(
RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
No any result avaiable.
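To make the device fields in this message easier to read, here is a small sketch that decodes them. It assumes the numbering of ONNX Runtime's OrtDevice constants as I understand them (0 = CPU, 1 = GPU, 2 = FPGA). The helper name decode_devices is made up for illustration and is not part of onnxruntime.

```python
import re

# Assumed mapping, based on ONNX Runtime's OrtDevice constants (CPU=0, GPU=1, FPGA=2).
DEVICE_TYPES = {0: "CPU", 1: "GPU", 2: "FPGA"}

# Matches fragments such as "Device:[DeviceType:1 MemoryType:0 DeviceId:0]".
_DEVICE_RE = re.compile(
    r"Device:\[DeviceType:(\d+) MemoryType:(\d+) DeviceId:(\d+)\]"
)


def decode_devices(message):
    """Return (type_name, memory_type, device_id) per device, in message order.

    Hypothetical helper for reading the error text; not an onnxruntime API.
    """
    return [
        (DEVICE_TYPES.get(int(dev_type), "unknown"), int(mem_type), int(dev_id))
        for dev_type, mem_type, dev_id in _DEVICE_RE.findall(message)
    ]


msg = (
    "Error when binding input: There's no data transfer registered for "
    "copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] "
    "to Device:[DeviceType:0 MemoryType:0 DeviceId:0]"
)

source, target = decode_devices(msg)
print(f"copy requested from {source[0]} (id {source[2]}) "
      f"to {target[0]} (id {target[2]})")
```

If that mapping is right, the message describes a copy from a GPU tensor to a CPU-side input, which might mean the session is not running on the CUDA execution provider under MIG.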
Environment:
NVIDIA GPU: A100 40 GB
NVIDIA Software Version :
NVIDIA-SMI 470.82.01
Driver Version: 470.82.01
CUDA Version: 11.4
OS : Linux 18.04.1-Ubuntu SMP x86_64 GNU/Linux
Other Details :
torch:1.7.0+cu110
onnx:1.10.2
onnxruntime:1.10.0
transformers:3.0.2
numpy:1.19.5
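One check that may help narrow this down is whether the CUDA execution provider is actually active in the session when running on the MIG instance. This is only a diagnostic sketch, not a fix. MODEL_PATH is a placeholder for the exported BERT model, and cuda_active is a made-up helper name.

```python
import os

# Placeholder path (assumption): point this at the exported BERT ONNX model.
MODEL_PATH = "model.onnx"


def cuda_active(providers):
    """Return True if the CUDA execution provider is in the given provider list."""
    return "CUDAExecutionProvider" in providers


try:
    import onnxruntime as ort
except ImportError:
    ort = None  # onnxruntime is not installed in this environment

if ort is not None:
    # Providers compiled into this onnxruntime build.
    print("available providers:", ort.get_available_providers())

    if os.path.exists(MODEL_PATH):
        # Request CUDA first, with CPU as the fallback.
        session = ort.InferenceSession(
            MODEL_PATH,
            providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
        )
        # Providers the session actually ended up with.
        active = session.get_providers()
        print("session providers:", active)
        print("CUDA active:", cuda_active(active))
```

Running this once in non-MIG mode and once in MIG mode would show whether the two sessions end up with different providers.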