Torch2trt on AGX Orin flashed as Nano

Hello everyone!
I’m facing an issue.
I’m working on a Jetson AGX Orin Developer Kit.
I want to optimize a model created with PyTorch using the torch2trt and trt_pose modules. With the board configured as an AGX Orin 32GB, my code works fine.
So I re-flashed the board emulating an Orin Nano 8GB. When I re-ran my code I got the error line below looping repeatedly, and then the machine froze:

[TRT] [E] 3: [builderConfig.cpp::canRunOnDLA::493] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::canRunOnDLA::493, condition: dlaEngineCount > 0

I don’t know if you can help me, but I would at least like to understand what the problem is…
Thank you very much!

Hi,

Which JetPack version do you use?
Since the Orin Nano 4GB/8GB variants were added in TensorRT 8.5, please try it with the latest JetPack 5.1.
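
The error in your log comes from a DLA capability check: an Orin Nano configuration exposes no DLA cores, so dlaEngineCount is 0. As a quick sanity check (just a sketch using the standard TensorRT Python API, not a fix), you can confirm how many DLA cores TensorRT sees:

import tensorrt as trt  # TensorRT Python bindings shipped with JetPack

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
# Expected to print 0 on an Orin Nano profile, which is what the
# canRunOnDLA check in the log is complaining about.
print("DLA cores visible to TensorRT:", builder.num_DLA_cores)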

Thanks.

Thanks for the reply!
My setup is as follows:

Package: nvidia-jetpack
Version: 5.1-b147
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1-b147), nvidia-jetpack-dev (= 5.1-b147)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.1-b147_arm64.deb
Size: 29306
SHA256: 750acd147aa354a2dff225245149c8ac6a3802234157f2185c5d1b6fa9b9d2d9
SHA1: 8363c940eadd7300de57a70e2cd99dd321781b1c
MD5sum: 3da9b145351144eb1588e07f04e1e3d3
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Hi @AastaLLL !
I re-flashed the machine, again as an emulated Orin Nano 8GB, and after re-installing all the dependencies that issue no longer occurred…
But since I did nothing different from before, I don’t know whether there was a problem with the hardware. The only thing is that to install PyTorch I downloaded the wheel from “PyTorch for JetPack” here
And I had to install torchvision==0.10.0 because of some dependency errors. Do you think I did everything right?

By the way, when re-running my code to optimize the NN model, the machine seems frozen (it has been about 30 minutes now). The same thing also happens on the Jetson Nano Developer Kit, and the way I fixed it there was:

sudo swapoff -a
sudo swapon -a
sudo sysctl vm.swappiness=100

So I did the same on the emulated Orin Nano 8GB, but as I said, it freezes…
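
(As a side note, a trivial sketch of how the swap settings could be double-checked before launching the optimization, just by reading procfs:)

# verify that the swappiness change and the swap devices are in place
with open("/proc/sys/vm/swappiness") as f:
    print("vm.swappiness:", f.read().strip())
with open("/proc/swaps") as f:
    print(f.read())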

Hi,

We don’t have PyTorch for JetPack 5.1 yet, but it will be available soon.
Some users have tried the prebuilt wheel for JetPack 5.0.2 and report that it works.
You can use it as a temporary workaround.

For the freeze, do you have any idea how much memory it takes when running on the Nano devkit?
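
If the desktop becomes unresponsive, one option (just a rough sketch, not an official tool) is to start a small logger from an SSH session or a second terminal before launching the conversion, so the peak usage survives a reboot:

# memlog.py - sample /proc/meminfo periodically and append the fields we
# care about to a file that can be inspected after a reboot
import time

with open("memlog.txt", "a") as log:
    while True:
        with open("/proc/meminfo") as f:
            lines = f.read().splitlines()
        wanted = [line for line in lines if line.startswith(("MemAvailable", "SwapFree"))]
        log.write(time.strftime("%H:%M:%S ") + " | ".join(wanted) + "\n")
        log.flush()
        time.sleep(5)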

Thanks.

Hi!
Thanks for the support as always!
Regarding the JetPack version, do you suggest re-flashing the AGX as an Orin Nano 8GB again, or is that not needed?
And should I then install the JetPack components with version 5.0.2?
What I’m asking is: what are the correct steps starting from my current machine setup?

As for the freeze, I’m not able to tell how much memory it takes when running on the Nano devkit… When executing the optimization the Nano also freezes, for roughly 10-20 minutes, and I’m not able to check the memory usage.
As an update, I left my AGX configured as an Orin Nano 8GB turned on all night and it’s still frozen; it has been in that state for 15 hours now. Do you suggest forcing a shutdown and retrying with JetPack 5.0.2?

Just to give more detail (maybe it can help): while using the torch2trt module, the function that is freezing the machine is the one below. After calling:

torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)

the freeze is caused by the function below, in particular at the instruction in the # BUILD ENGINE section (“engine = builder.build_engine(network, config)”), as shown in the listing that follows (a lighter variant of my call is sketched right after it):

def torch2trt(module,
              inputs,
              input_names=None,
              output_names=None,
              log_level=trt.Logger.ERROR,
              fp16_mode=False,
              max_workspace_size=1<<25,
              strict_type_constraints=False,
              keep_network=True,
              int8_mode=False,
              int8_calib_dataset=None,
              int8_calib_algorithm=DEFAULT_CALIBRATION_ALGORITHM,
              use_onnx=False,
              default_device_type=trt.DeviceType.GPU,
              dla_core=0,
              gpu_fallback=True,
              device_types={},
              min_shapes=None,
              max_shapes=None,
              opt_shapes=None,
              onnx_opset=None,
              max_batch_size=None,
              **kwargs):

    # capture arguments to provide to context
    kwargs.update(locals())
    kwargs.pop('kwargs')
        
    # handle inputs as dataset of list of tensors
    if issubclass(inputs.__class__, Dataset):
        dataset = inputs
        if len(dataset) == 0:
            raise ValueError('Dataset must have at least one element to use for inference.')
        inputs = dataset[0]
    else:
        dataset = ListDataset()
        dataset.insert(inputs)
        inputs = dataset[0]

    outputs = module(*inputs)
    input_flattener = Flattener.from_value(inputs)
    output_flattener = Flattener.from_value(outputs)

    # infer default parameters from dataset

    if min_shapes == None:
        min_shapes_flat = [tuple(t) for t in dataset.min_shapes(flat=True)]
    else:
        min_shapes_flat = input_flattener.flatten(min_shapes)

    if max_shapes == None:
        max_shapes_flat = [tuple(t) for t in dataset.max_shapes(flat=True)]
    else:
        max_shapes_flat = input_flattener.flatten(max_shapes)
    
    if opt_shapes == None:
        opt_shapes_flat = [tuple(t) for t in dataset.median_numel_shapes(flat=True)]
    else:
        opt_shapes_flat = input_flattener.flatten(opt_shapes)

    # handle legacy max_batch_size
    if max_batch_size is not None:
        min_shapes_flat = [(1,) + s[1:] for s in min_shapes_flat]
        max_shapes_flat = [(max_batch_size,) + s[1:] for s in max_shapes_flat]

    dynamic_axes_flat = infer_dynamic_axes(min_shapes_flat, max_shapes_flat)
    
    if default_device_type == trt.DeviceType.DLA:
        for value in dynamic_axes_flat:
            if len(value) > 0:
                raise ValueError('Dataset cannot have multiple shapes when using DLA')

    logger = trt.Logger(log_level)
    builder = trt.Builder(logger)
    config = builder.create_builder_config()

    if input_names is None:
        input_names = default_input_names(input_flattener.size)
    if output_names is None:
        output_names = default_output_names(output_flattener.size)

    if use_onnx:
        import onnx_graphsurgeon as gs
        import onnx
        
        module_flat = Flatten(module, input_flattener, output_flattener)
        inputs_flat = input_flattener.flatten(inputs)

        f = io.BytesIO()
        torch.onnx.export(
            module_flat, 
            inputs_flat, 
            f, 
            input_names=input_names, 
            output_names=output_names,
            dynamic_axes={
                name: {int(axis): 'axis_%d' % axis for axis in dynamic_axes_flat[index]}
                for index, name in enumerate(input_names)
            },
            opset_version=onnx_opset
        )
        f.seek(0)
        
        onnx_graph = gs.import_onnx(onnx.load(f))
        onnx_graph.fold_constants().cleanup()


        f = io.BytesIO()
        onnx.save(gs.export_onnx(onnx_graph), f)
        f.seek(0)

        onnx_bytes = f.read()
        network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, logger)
        parser.parse(onnx_bytes)

    else:
        network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        with ConversionContext(network, torch2trt_kwargs=kwargs, builder_config=config, logger=logger) as ctx:
            
            inputs_flat = input_flattener.flatten(inputs)

            ctx.add_inputs(inputs_flat, input_names, dynamic_axes=dynamic_axes_flat)

            outputs = module(*inputs)

            outputs_flat = output_flattener.flatten(outputs)
            ctx.mark_outputs(outputs_flat, output_names)

    # set max workspace size
    config.max_workspace_size = max_workspace_size

    if fp16_mode:
        config.set_flag(trt.BuilderFlag.FP16)

    config.default_device_type = default_device_type
    if gpu_fallback:
        config.set_flag(trt.BuilderFlag.GPU_FALLBACK)
    config.DLA_core = dla_core
    
    if strict_type_constraints:
        config.set_flag(trt.BuilderFlag.STRICT_TYPES)

    if int8_mode:

        # default to use input tensors for calibration
        if int8_calib_dataset is None:
            int8_calib_dataset = dataset

        config.set_flag(trt.BuilderFlag.INT8)

        #Making sure not to run calibration with QAT mode on
        if not 'qat_mode' in kwargs:
            calibrator = DatasetCalibrator(
                int8_calib_dataset, algorithm=int8_calib_algorithm
            )
            config.int8_calibrator = calibrator

    # OPTIMIZATION PROFILE
    profile = builder.create_optimization_profile()
    for index, name in enumerate(input_names):
        profile.set_shape(
            name,
            min_shapes_flat[index],
            opt_shapes_flat[index],
            max_shapes_flat[index]
        )
    config.add_optimization_profile(profile)

    if int8_mode:
        config.set_calibration_profile(profile)

    # BUILD ENGINE

    engine = builder.build_engine(network, config)

    module_trt = TRTModule(engine, input_names, output_names, input_flattener=input_flattener, output_flattener=output_flattener)

    if keep_network:
        module_trt.network = network

    return module_trt
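
As mentioned above, a lighter variant of my call that I am considering (just a sketch, I have not validated it on this configuration and the workspace value is a guess) would shrink the workspace and pin everything to the GPU:

import tensorrt as trt
import torch2trt

# Hypothetical variant of my original call: smaller workspace and explicit
# GPU-only placement, to rule out DLA queries and reduce the peak memory
# needed while building the engine. "model" and "data" are the same objects
# used in the original call.
model_trt = torch2trt.torch2trt(
    model, [data],
    fp16_mode=True,
    max_workspace_size=1 << 24,               # 16 MB instead of 32 MB (guess)
    default_device_type=trt.DeviceType.GPU,   # default value, made explicit
    gpu_fallback=True,
)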

Thank you very much!!

Hi,

Since the Orin Nano variants were added in TensorRT 8.5, it’s recommended to stay on JetPack 5.1.

Could you share how you installed the PyTorch package?
If you are not following the doc below, please give it a try.
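
As a quick sanity check (just a sketch), you can also print which packages are actually picked up at runtime:

# environment sanity check: print the versions that are actually imported
import torch
import torchvision
import tensorrt

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision:", torchvision.__version__)
print("TensorRT:", tensorrt.__version__)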

If no luck with the above installation (JP5.1+our prebuilt PyTorch), could you share your model and the script for us to test?

For the freeze issue, could you check whether the app can be terminated with Ctrl+C?
Or force-killed via its PID?

Thanks.

Hello again!

Starting from the second question: as for the freeze, I cannot stop the process with Ctrl+C or by killing it via its PID, since no operation can be performed at all: keyboard and mouse input are no longer accepted. The hardware is completely frozen. I’m sorry…

As for the first question, I used exactly that documentation to install PyTorch, and the PyTorch wheel is the one mentioned in my first comment. I will attach the files for the model optimization, but the model itself exceeds the file size limit (186 MB) even when zipped… How can I share it with you?
nnHardwareOptimization.py (3.4 KB)
object_pose.json (4.2 KB)

Hi,

Is it possible to SSH into the system and terminate the app?

We just released the PyTorch wheel built with JetPack 5.1 last weekend.
Would you mind testing whether the same issue still occurs?

Thanks.

Hi!
As soon as I collect that data and run those tests, I will get back to you!
Thank you very much!

Is this still an issue that needs support? Is there any result you can share? Thanks

Hello!
I haven’t tried it yet; I have prioritized other work. If needed, you can close the topic, and as soon as I run further tests and any issue occurs, I will post again.
Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.