Error exporting with post-processing quantization

Hi,

I’m trying to export a RetinaNet model trained with TLT 2.0 GA for deployment on a Jetson AGX Xavier. The model was trained with QAT enabled.

I’m able to export it successfully using the following command:

tlt-export retinanet -e configs/train_spec/retinanet.txt -k $KEY -m experiments/retinanet/weights/retinanet_resnet_epoch_200.tlt --data_type int8 -s --cal_image_dir data/val/images -o experiments/retinanet/model.etlt --cal_cache_file experiments/retinanet/cal.bin

However, when I add the --force_ptq flag, as in the following command:

tlt-export retinanet -e configs/train_spec/retinanet.txt -k $KEY -m experiments/retinanet/weights/retinanet_resnet_epoch_200.tlt --data_type int8 -s --cal_image_dir data/val/images -o experiments/retinanet/model.etlt --cal_cache_file experiments/retinanet/cal.bin --force_ptq

I’m getting the following error:

Traceback (most recent call last):
  File "/usr/local/bin/tlt-export", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 185, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/app.py", line 263, in run_export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 478, in export
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 206, in get_calibrator
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/export/base_exporter.py", line 309, in generate_tensor_file
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/715c8bafe7816f3bb6f309cd506049bb/execroot/ai_infra/bazel-out/k8-py3-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/export/data.py", line 54, in __init__
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 148, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 98, in h5py.h5f.create
ValueError: Invalid file name (invalid file name)

Am I missing something? Or is this unexpected behaviour?

Thanks!

I ran a quick test with a model trained with detectnet_v2 (resnet10), but I could not reproduce the issue.

Could you please double check?
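
One thing that may be worth checking while you retest: the traceback ends with h5py failing to create a file from inside generate_tensor_file, which suggests (this is only a guess from the stack trace, not something confirmed) that the exporter received an empty or invalid path for the calibration tensor file. The failing command passes --cal_image_dir and --cal_cache_file but not --cal_data_file, so it might help to pass --cal_data_file explicitly with a writable output path. The Python sketch below is a hypothetical helper, not part of TLT, that checks whether a file could be created at a given path. The name check_output_file_path and the example value of cal_data_file are assumptions made for illustration.

```python
import os


def check_output_file_path(path):
    """Return a list of problems that would prevent creating a file at `path`.

    Hypothetical helper for illustration only; it is not part of TLT or h5py.
    An empty list means no obvious problem was found.
    """
    problems = []
    # An empty or missing path cannot be used as a file name at all.
    if not path:
        problems.append("path is empty or None")
        return problems
    # The file itself need not exist yet, but its parent directory must.
    parent = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(parent):
        problems.append("parent directory does not exist: " + parent)
    elif not os.access(parent, os.W_OK):
        problems.append("parent directory is not writable: " + parent)
    return problems


if __name__ == "__main__":
    # Assumed example value: an unset tensor file path.
    cal_data_file = ""
    print(check_output_file_path(cal_data_file))
```

If the helper reports a problem for the path you intend to use, fixing that path before rerunning tlt-export with --force_ptq would rule out one possible cause.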