TAO converter fails with errors

Please provide the following information when requesting support.

• Hardware (RTX5000)
• Network Type (Classification)
• TLT Version (I'm not sure, but TF is 15.5)
• Training spec file (if you have one, please share it here)

model_config {
  arch: "resnet",
  n_layers: 18
  # Setting these parameters to true to match the template downloaded from NGC.
  use_batch_norm: true
  all_projections: true
  freeze_blocks: 0
  freeze_blocks: 1
  input_image_size: "3,3072,2048"
}
train_config {
  train_dataset_path: "/workspace/tao-experiments/data/acSplit/train"
  val_dataset_path: "/workspace/tao-experiments/data/acSplit/val"
  pretrained_model_path: "/workspace/tao-experiments/classification/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 16
  n_epochs: 80
  n_workers: 16
  preprocess_mode: "caffe"
  enable_random_crop: True
  enable_center_crop: True
  label_smoothing: 0.0
  mixup_alpha: 0.1
  # regularizer
  reg_config {
    type: "L2"
    scope: "Conv2D,Dense"
    weight_decay: 0.00005
  }

  # learning_rate
  lr_config {
    step {
      learning_rate: 0.006
      step_size: 10
      gamma: 0.1
    }
  }
}
eval_config {
  eval_dataset_path: "/workspace/tao-experiments/data/acSplit/test"
  model_path: "/workspace/tao-experiments/classification/output/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

• How to reproduce the issue?

!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -c $USER_EXPERIMENT_DIR/export18/final_model_int8_cache.bin \
               -o predictions/Softmax \
               -d 3,3072,2048 \
               -i nchw \
               -m 64 -t int8 \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model.trt \
               -b 64

This causes a problem. Could someone respond to this?

Please try to set a lower -m.

It stops again with a new error. The command is:

!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -c $USER_EXPERIMENT_DIR/export18/final_model_int8_cache.bin \
               -o predictions/Softmax \
               -d 3,3072,2048 \
               -i nchw \
               -m 8 -t int8 \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model.trt \
               -b 64

Please try to add "-w 1000000000".

[0x562861cde620]:151 :ScratchObject in storeCachedObject: at optimizer/gpu/cudnn/convolutionBuilder.cpp: 168 idx: 30 time: 8.47e-07
-------------- The current device memory allocations dump as below --------------
[0]:34359738368 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 14 time: 8.8563e-05
[0x302000000]:16777216 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 9 time: 0.000347891
[0x7fced4000000]:4831839232 :DeviceActivationSize in reserveNetworkTensorMemory: at optimizer/common/tactic/optimizer.cpp: 4603 idx: 8 time: 0.007254
[WARNING] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[WARNING] Skipping tactic 4 due to insuficient memory on requested size of 34359738368 detected for tactic -4420849921117327522.
Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[ERROR] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node conv1/convolution.)
[ERROR] Unable to create engine
2022-09-29 19:10:38,263 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The command is:

!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -c $USER_EXPERIMENT_DIR/export18/final_model_int8_cache.bin \
               -o predictions/Softmax \
               -d 3,3072,2048 \
               -i nchw \
               -m 8 -t int8 \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model.trt \
               -b 64 -w 1000000000

Please try to set a lower -m.

Hi,
I tried setting -m to 1, and the command is:

!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -c $USER_EXPERIMENT_DIR/export18/final_model_int8_cache.bin \
               -o predictions/Softmax \
               -d 3,3072,2048 \
               -i nchw \
               -m 1 -t int8 \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model.trt \
               -b 64 -w 1000000000

And the output shows where it stops:

[0x55b83cb5d2e0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 15 time: 1.15e-07
[0x55b83f1783b0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 22 time: 1.64e-07
[0x55b8417a7380]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 25 time: 8.4e-08
[0x55b83be4eeb0]:4 :: weight scales in internalAllocate: at runtime/common/weightsPtr.cpp: 100 idx: 6 time: 7.8e-08
-------------- The current device memory allocations dump as below --------------
[0]:34359738368 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 14 time: 9.3234e-05
[0x302000000]:16777216 :HybridGlobWriter in reserveRegion: at optimizer/common/globWriter.cpp: 246 idx: 9 time: 0.000369565
[0x7fdf28000000]:4831839232 :DeviceActivationSize in reserveNetworkTensorMemory: at optimizer/common/tactic/optimizer.cpp: 4603 idx: 8 time: 0.00733354
[WARNING] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[WARNING] Skipping tactic 4 due to insuficient memory on requested size of 34359738368 detected for tactic -4420849921117327522.
Try decreasing the workspace size with IBuilderConfig::setMaxWorkspaceSize().
[ERROR] 10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node conv1/convolution.)
[ERROR] Unable to create engine
2022-09-30 09:49:48,633 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Can you set -b to 1 as well?
Also, how about fp16 mode?
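
A hedged sketch of both variants, reusing the paths and $KEY from the commands above (neither has been verified on this setup, and the fp16 output filename is a placeholder of mine; the calibration cache -c and calibration batch size -b only matter for int8):

# int8 again, with calibration batch size 1
!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -c $USER_EXPERIMENT_DIR/export18/final_model_int8_cache.bin \
               -o predictions/Softmax \
               -d 3,3072,2048 \
               -i nchw \
               -m 1 -t int8 \
               -b 1 -w 1000000000 \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model.trt

# fp16 variant: no calibration cache or calibration batch size needed
!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -o predictions/Softmax \
               -d 3,3072,2048 \
               -i nchw \
               -m 1 -t fp16 \
               -w 1000000000 \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model_fp16.trt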

BTW, the RTX 5000 has 16 GB of GPU memory, so it cannot satisfy the GPU memory request (34359738368 bytes) mentioned in the log.
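
For context, 34359738368 bytes is exactly 32 GiB, i.e. twice the card's total memory. A quick way to confirm what the GPU actually has is a standard nvidia-smi query (run wherever the GPU is visible, e.g. on the host):

# 34359738368 / 2**30 = 32 GiB requested vs. 16 GB total on the RTX 5000
!nvidia-smi --query-gpu=name,memory.total --format=csv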

root@df5f93481c11:/workspace# converter -h
usage: converter [-h] [-e ENGINE_FILE_PATH]
        [-k ENCODE_KEY] [-c CACHE_FILE]
        [-o OUTPUTS] [-d INPUT_DIMENSIONS]
        [-b BATCH_SIZE] [-m MAX_BATCH_SIZE]
        [-w MAX_WORKSPACE_SIZE] [-t DATA_TYPE]
        [-i INPUT_ORDER] [-s] [-u DLA_CORE]
        input_file

Generate TensorRT engine from exported model

positional arguments:
  input_file            Input file (.etlt exported model).

required flag arguments:
  -d            comma separated list of input dimensions(not required for TLT 3.0 new models).
  -k            model encoding key.

optional flag arguments:
  -b            calibration batch size (default 8).
  -c            calibration cache file (default cal.bin).
  -e            file the engine is saved to (default saved.engine).
  -i            input dimension ordering -- nchw, nhwc, nc (default nchw).
  -m            maximum TensorRT engine batch size (default 16). If meet with out-of-memory issue, please decrease the batch size accordingly.
  -o            comma separated list of output node names (default none).
  -p            comma separated list of optimization profile shapes in the format <input_name>,<min_shape>,<opt_shape>,<max_shape>, where each shape has `x` as delimiter, e.g., NxC, NxCxHxW, NxCxDxHxW, etc. Can be specified multiple times if there are multiple input tensors for the model. This argument is only useful in dynamic shape case.
  -s            TensorRT strict_type_constraints flag for INT8 mode(default false).
  -t            TensorRT data type -- fp32, fp16, int8 (default fp32).
  -u            Use DLA core N for layers that support DLA(default = -1, which means no DLA core will be utilized for inference. Note that it'll always allow GPU fallback).
  -w            maximum workspace size of TensorRT engine (default 1<<30). If meet with out-of-memory issue, please increase the workspace size accordingly.
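
Tying the help above back to the commands in this thread: only the input file, -k, and -d are strictly required for this model, with -o and -e set in practice; everything else falls back to the listed defaults (fp32, -m 16, -w 1<<30). A minimal hedged sketch along those lines (the fp32 output filename is a placeholder of mine, and on a 16 GB card the defaults may still hit the same memory limit, so -m, -w, and -t would likely still need adjusting as discussed above):

!tao converter $USER_EXPERIMENT_DIR/export18/resnet18_final_batch1.etlt \
               -k $KEY \
               -d 3,3072,2048 \
               -o predictions/Softmax \
               -e $USER_EXPERIMENT_DIR/export18/final_trt_model_fp32.trt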

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.