Issue while running train_id.sh in Person Re-Identification code

Pritam · September 19, 2023, 6:55am

Hi Team,

I want to test Person Re-Identification, and I am using the code linked below.

I did the complete setup as described in the README.

I am facing an issue while running bash train_id.sh.

I am using a dGPU (2080 Ti) machine.

Below is the error.

root@smarg:~/data/Pritam/PersonREID/reid# bash train_id.sh
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2023-09-19 06:01:56.078355: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-09-19 06:01:56.103458: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2023-09-19 06:01:56.104983: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x326c010 executing computations on platform Host. Devices:
2023-09-19 06:01:56.105003: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py:3632: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
n_labels:  702
False
excluding block block1
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing new variables..
[<tf.Variable 'resnet_v1_50/conv1/weights:0' shape=(7, 7, 3, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_1/bottleneck_v1/shortcut/weights:0' shape=(1, 1, 64, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights:0' shape=(1, 1, 64, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_1/bottleneck_v1/conv2/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_1/bottleneck_v1/conv3/weights:0' shape=(1, 1, 64, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_2/bottleneck_v1/conv1/weights:0' shape=(1, 1, 256, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_2/bottleneck_v1/conv2/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_2/bottleneck_v1/conv3/weights:0' shape=(1, 1, 64, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_3/bottleneck_v1/conv1/weights:0' shape=(1, 1, 256, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_3/bottleneck_v1/conv2/weights:0' shape=(3, 3, 64, 64) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block1/unit_3/bottleneck_v1/conv3/weights:0' shape=(1, 1, 64, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_1/bottleneck_v1/shortcut/weights:0' shape=(1, 1, 256, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_1/bottleneck_v1/conv1/weights:0' shape=(1, 1, 256, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_1/bottleneck_v1/conv2/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_1/bottleneck_v1/conv3/weights:0' shape=(1, 1, 128, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_2/bottleneck_v1/conv1/weights:0' shape=(1, 1, 512, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_2/bottleneck_v1/conv2/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_2/bottleneck_v1/conv3/weights:0' shape=(1, 1, 128, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_3/bottleneck_v1/conv1/weights:0' shape=(1, 1, 512, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_3/bottleneck_v1/conv2/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_3/bottleneck_v1/conv3/weights:0' shape=(1, 1, 128, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_4/bottleneck_v1/conv1/weights:0' shape=(1, 1, 512, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_4/bottleneck_v1/conv2/weights:0' shape=(3, 3, 128, 128) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block2/unit_4/bottleneck_v1/conv3/weights:0' shape=(1, 1, 128, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_1/bottleneck_v1/shortcut/weights:0' shape=(1, 1, 512, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_1/bottleneck_v1/conv1/weights:0' shape=(1, 1, 512, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_1/bottleneck_v1/conv2/weights:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_1/bottleneck_v1/conv3/weights:0' shape=(1, 1, 256, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_2/bottleneck_v1/conv1/weights:0' shape=(1, 1, 1024, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_2/bottleneck_v1/conv2/weights:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_2/bottleneck_v1/conv3/weights:0' shape=(1, 1, 256, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_3/bottleneck_v1/conv1/weights:0' shape=(1, 1, 1024, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_3/bottleneck_v1/conv2/weights:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_3/bottleneck_v1/conv3/weights:0' shape=(1, 1, 256, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_4/bottleneck_v1/conv1/weights:0' shape=(1, 1, 1024, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_4/bottleneck_v1/conv2/weights:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_4/bottleneck_v1/conv3/weights:0' shape=(1, 1, 256, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_5/bottleneck_v1/conv1/weights:0' shape=(1, 1, 1024, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_5/bottleneck_v1/conv2/weights:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_5/bottleneck_v1/conv3/weights:0' shape=(1, 1, 256, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_6/bottleneck_v1/conv1/weights:0' shape=(1, 1, 1024, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_6/bottleneck_v1/conv2/weights:0' shape=(3, 3, 256, 256) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block3/unit_6/bottleneck_v1/conv3/weights:0' shape=(1, 1, 256, 1024) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_1/bottleneck_v1/shortcut/weights:0' shape=(1, 1, 1024, 2048) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_1/bottleneck_v1/conv1/weights:0' shape=(1, 1, 1024, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_1/bottleneck_v1/conv2/weights:0' shape=(3, 3, 512, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_1/bottleneck_v1/conv3/weights:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_2/bottleneck_v1/conv1/weights:0' shape=(1, 1, 2048, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_2/bottleneck_v1/conv2/weights:0' shape=(3, 3, 512, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_2/bottleneck_v1/conv3/weights:0' shape=(1, 1, 512, 2048) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_3/bottleneck_v1/conv1/weights:0' shape=(1, 1, 2048, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_3/bottleneck_v1/conv2/weights:0' shape=(3, 3, 512, 512) dtype=float32_ref>,
 <tf.Variable 'resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/weights:0' shape=(1, 1, 512, 2048) dtype=float32_ref>]
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Starting training....
2023-09-19 06:02:02.065943: W tensorflow/core/framework/allocator.cc:124] Allocation of 67108864 exceeds 10% of system memory.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 96128 values, but the requested shape requires a multiple of 702
	 [[{{node Reshape}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_id.py", line 55, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_id.py", line 51, in main
    num_epochs=FLAGS.num_epochs)
  File "/root/data/Pritam/PersonREID/reid/model_id.py", line 139, in train
    {self.is_train: True})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 96128 values, but the requested shape requires a multiple of 702
	 [[node Reshape (defined at /root/data/Pritam/PersonREID/reid/model_id.py:29) ]]

Caused by op 'Reshape', defined at:
  File "main_id.py", line 55, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "main_id.py", line 41, in main
    write_images=FLAGS.write_images)
  File "/root/data/Pritam/PersonREID/reid/model_id.py", line 29, in __init__
    self.labels = tf.reshape(self.labels, [-1, self.n_labels])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 7179, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 96128 values, but the requested shape requires a multiple of 702
	 [[node Reshape (defined at /root/data/Pritam/PersonREID/reid/model_id.py:29) ]]

Please help me resolve this issue.

Thanks.
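The numbers in the error message can be checked directly. 96128 is not a multiple of 702, but it factors as 2^7 × 751 = 128 × 751. If the label tensor holds one-hot rows of shape [batch, classes] (an assumption; the batch size is not printed in the log), the most natural reading is 128 rows of 751 classes. 751 is the number of training identities in Market-1501, while 702 is the number of training identities in DukeMTMC-reID, so one plausible cause is that n_labels was set for a different dataset than the one being loaded. The sketch below only reproduces the arithmetic; it does not inspect the repository's code.

```python
# Values from the log: the label tensor size in the error, and "n_labels:  702".
TOTAL_VALUES = 96128
N_LABELS = 702


def check_reshape(total, n_labels):
    """Return (ok, rows, leftover) for reshaping `total` values to [-1, n_labels]."""
    rows, leftover = divmod(total, n_labels)
    return leftover == 0, rows, leftover


def prime_factors(n):
    """Return the prime factorisation of n as a {prime: exponent} dict."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


print(check_reshape(TOTAL_VALUES, N_LABELS))  # (False, 136, 656)
print(prime_factors(TOTAL_VALUES))            # {2: 7, 751: 1}, i.e. 128 * 751

# Training-identity counts of two public ReID benchmarks.
for name, n_ids in {"DukeMTMC-reID": 702, "Market-1501": 751}.items():
    print(name, check_reshape(TOTAL_VALUES, n_ids))
```

If the labels really are 751-way vectors, n_labels and the dataset passed to train_id.sh need to agree; which setting controls this is not visible in the log.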

Morganh · September 19, 2023, 7:11am

That GitHub repository is not maintained by NVIDIA. For re-identification in TAO, please refer to Re-Identification - NVIDIA Docs. Thanks.

Pritam · September 19, 2023, 7:23am

Okay, thanks.

Is there any code or script to test a re-identification model trained with TAO?

Morganh · September 19, 2023, 7:31am

You can refer to ReIdentificationNet - NVIDIA Docs.

For inference in PyTorch, you can also refer to https://github.com/NVIDIA/tao_pytorch_backend/blob/main/nvidia_tao_pytorch/cv/re_identification/scripts/inference.py

More info can be found in Re-Identification | NVIDIA NGC.

Pritam · September 19, 2023, 7:34am

Thanks @Morganh

Pritam · September 19, 2023, 1:27pm

Hi @Morganh

I am trying to train a Re-Identification model using TAO.

When I run the command below:

print("Train model")
!tao re_identification train \
                  -e $SPECS_DIR/experiment_market1501.yaml \
                  -r $RESULTS_DIR/market1501 \
                  -k $KEY

I get the following error:

Train model
2023-09-19 18:50:16,867 [INFO] root: Registry: ['nvcr.io']
2023-09-19 18:50:16,907 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2023-09-19 18:50:16,924 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
[NeMo W 2023-09-19 13:20:23 nemo_logging:349] <frozen cv.re_identification.scripts.train>:91: UserWarning: 
    'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
    This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
    See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
    
Error merging 'experiment_market1501.yaml' with schema
Key 'results_dir'' not in 'ReIDTrainExpConfig'
    full_key: results_dir'
    object_type=ReIDTrainExpConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Execution status: FAIL
2023-09-19 18:50:29,009 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Please help.

Thanks.

Morganh · September 19, 2023, 1:44pm

Please refer to https://github.com/NVIDIA/tao_tutorials/blob/main/notebooks/tao_launcher_starter_kit/re_identification_net/specs/experiment_market1501.yaml.
Can you share your yaml file?

Pritam · September 19, 2023, 1:48pm

Hi @Morganh ,

I am using the default file that comes with getting_started_v5.0.0.
Below is the file content:

results_dir': "/results/market1501"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: "/model/market1501/resnet50_pretrained.pth"
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
dataset:
  train_dataset_dir: "/data/market1501/sample_train"
  test_dataset_dir: "/data/market1501/sample_test"
  query_dataset_dir: "/data/market1501/sample_query"
  num_classes: 100
  batch_size: 64
  val_batch_size: 128
  num_workers: 1
  pixel_mean: [0.485, 0.456, 0.406]
  pixel_std: [0.226, 0.226, 0.226]
  padding: 10
  prob: 0.5
  re_prob: 0.5
  sampler: softmax_triplet
  num_instances: 4
re_ranking:
  re_ranking: True
  k1: 20
  k2: 6
  lambda_value: 0.3
train:
  optim:
    name: Adam
    steps: [40, 70]
    gamma: 0.1
    bias_lr_factor: 1
    weight_decay: 0.0005
    weight_decay_bias: 0.0005
    warmup_factor: 0.01
    warmup_iters: 10
    warmup_method: linear
    base_lr: 0.00035
    momentum: 0.9
    center_loss_weight: 0.0005
    center_lr: 0.5
    triplet_loss_margin: 0.3
  num_epochs: 120
  checkpoint_interval: 10

Thanks.
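The first line of the spec above is written results_dir' (with a trailing apostrophe), and both errors in this thread report exactly that key, results_dir'. A quick pre-flight check of the top-level keys can catch this kind of typo before the container starts, although fixing it may not be enough on its own if the installed TAO version's schema does not define that key. The sketch below is a stdlib-only heuristic, not a YAML parser and not part of TAO; the expected-key set is simply the keys that appear in this spec, which is an assumption rather than TAO's real schema.

```python
import re

# Keys that appear at the top level of the spec above. Treating them as the
# "expected" set is an assumption for this sketch, not TAO's actual schema.
EXPECTED_KEYS = {"results_dir", "encryption_key", "model", "dataset",
                 "re_ranking", "train"}

# A plain top-level key: identifier characters, then a colon and space/EOL.
KEY_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):(\s|$)")


def check_top_level_keys(text, expected=EXPECTED_KEYS):
    """Return human-readable problems found on unindented YAML lines."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line[0] in " \t#":
            continue  # blank, nested (indented) or comment line
        match = KEY_RE.match(line)
        if match is None:
            key = line.split(":", 1)[0]
            problems.append(f"line {lineno}: malformed key {key!r}")
        elif match.group(1) not in expected:
            problems.append(f"line {lineno}: unexpected key {match.group(1)!r}")
    return problems


spec = '''results_dir': "/results/market1501"
encryption_key: nvidia_tao
model:
  backbone: resnet_50
'''
for problem in check_top_level_keys(spec):
    print(problem)  # line 1: malformed key "results_dir'"
```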

Morganh · September 19, 2023, 1:57pm

Which TAO version did you run?

Pritam · September 19, 2023, 2:08pm

Below are the TAO details:

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit']
format_version: 2.0
toolkit_version: 4.0.1
published_date: 03/06/2023

Morganh · September 19, 2023, 2:28pm

Please update to TAO 5.0. The new YAML file is compatible with TAO 5.0.

Morganh · September 19, 2023, 2:31pm

Or, if you are staying on TAO 4.0.1, you can download the 4.0.1 notebook and use its YAML file instead.
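The mismatch here is between the launcher (toolkit_version 4.0.1 in the tao info output above) and a spec taken from getting_started_v5.0.0. A small check can compare the two before running anything. In the sketch below, the rule "launcher and notebook must share major.minor" is an assumption made for illustration, not an official TAO compatibility policy, and the notebook version is read from the folder-name suffix.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '4.0.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def launcher_version(info_text):
    """Extract toolkit_version from `tao info` output."""
    match = re.search(r"^toolkit_version:\s*([\d.]+)\s*$", info_text, re.MULTILINE)
    if match is None:
        raise ValueError("toolkit_version not found in tao info output")
    return parse_version(match.group(1))


def notebook_version(folder_name):
    """Extract the version suffix from a folder such as 'getting_started_v5.0.0'."""
    match = re.search(r"_v(\d+(?:\.\d+)*)$", folder_name)
    if match is None:
        raise ValueError(f"no version suffix in {folder_name!r}")
    return parse_version(match.group(1))


def versions_match(info_text, folder_name):
    """Assumed rule: launcher and notebook must share major.minor."""
    return launcher_version(info_text)[:2] == notebook_version(folder_name)[:2]


info = """Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit']
format_version: 2.0
toolkit_version: 4.0.1
published_date: 03/06/2023
"""
print(launcher_version(info))                          # (4, 0, 1)
print(versions_match(info, "getting_started_v5.0.0"))  # False
```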

Pritam · September 20, 2023, 9:17am

Hi @Morganh

Thanks for the help.

Is there any Python script I can use to test the generated .etlt file of the re-identification model?

Morganh · September 20, 2023, 9:20am

There is not. You can leverage https://github.com/NVIDIA/tao_pytorch_backend/blob/main/nvidia_tao_pytorch/cv/re_identification/scripts/inference.py.

Pritam · September 28, 2023, 9:43am

Hi @Morganh and Team,

How can we convert the resnet50_market1501_model.etlt model file to an engine file?

The TAO ReID sample notebook does not mention any command for converting the ReID model to an engine file.

Please suggest how to convert the ReID .etlt model file to an engine, or can we use the .etlt file directly for inference?

Below is the code:

import cv2
import numpy as np
import pycuda.autoinit  # creates and activates a CUDA context for this process
import pycuda.driver as cuda
import tensorrt as trt

# deserialize_cuda_engine() expects a serialized TensorRT engine, not an .etlt
# model, so this points at an engine built from the exported ONNX file with
# trtexec (hypothetical file name).
engine_file_path = './REID_MODELS/resnet50_market1501_model.engine'
image_path = './images/Img1.jpg'

# Assumed to mirror training: input_height/input_width and pixel_mean/pixel_std
# from experiment_market1501.yaml, applied after scaling pixels to [0, 1].
INPUT_H, INPUT_W = 256, 128
PIXEL_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
PIXEL_STD = np.array([0.226, 0.226, 0.226], dtype=np.float32)

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)  # WARNING keeps load errors visible
trt_runtime = trt.Runtime(TRT_LOGGER)


def load_engine(trt_runtime, engine_path):
    with open(engine_path, 'rb') as f:
        engine_data = f.read()
    return trt_runtime.deserialize_cuda_engine(engine_data)


def preprocess(bgr_image):
    """OpenCV BGR uint8 HWC image -> normalized float32 NCHW batch of one."""
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    rgb = cv2.resize(rgb, (INPUT_W, INPUT_H))  # cv2.resize takes (width, height)
    x = (rgb.astype(np.float32) / 255.0 - PIXEL_MEAN) / PIXEL_STD
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])


def allocate_buffers(engine, context):
    # Assumes binding 0 is the image input and binding 1 is the embedding output.
    buffers = []
    for i in range(2):
        dtype = trt.nptype(engine.get_binding_dtype(i))
        host = cuda.pagelocked_empty(trt.volume(context.get_binding_shape(i)), dtype=dtype)
        buffers.append((host, cuda.mem_alloc(host.nbytes)))
    return buffers


def do_inference(context, buffers, batch, stream):
    (h_input, d_input), (h_output, d_output) = buffers
    np.copyto(h_input, batch.ravel())
    # Transfer input data to the GPU, run the engine, copy the result back.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.execute_async_v2(bindings=[int(d_input), int(d_output)],
                             stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()
    return h_output.reshape((1, -1)).copy()


# Binding-index API (get_binding_shape, execute_async_v2) as in TensorRT 8.x;
# TensorRT 10 replaced it with name-based calls.
engine = load_engine(trt_runtime, engine_file_path)
context = engine.create_execution_context()
if -1 in tuple(engine.get_binding_shape(0)):
    # Dynamic-batch engine (built with --minShapes/--maxShapes): fix batch = 1.
    context.set_binding_shape(0, (1, 3, INPUT_H, INPUT_W))
buffers = allocate_buffers(engine, context)
stream = cuda.Stream()

opencv_image = cv2.imread(image_path)
if opencv_image is None:
    raise FileNotFoundError(image_path)
embedding_vec = do_inference(context, buffers, preprocess(opencv_image), stream)

print(embedding_vec)

Thanks

Morganh · September 29, 2023, 3:55pm

You can export to an ONNX model and then use trtexec to generate a TensorRT engine.
Refer to TRTEXEC with ReIdentificationNet - NVIDIA Docs.
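The export-then-trtexec step can be scripted. The sketch below only assembles the trtexec command line. The flags --onnx, --minShapes/--optShapes/--maxShapes, --saveEngine and --fp16 are standard trtexec options; the input tensor name "input", the 3x256x128 shape (from input_height and input_width in the spec earlier in this thread), the batch range and the file names are assumptions to check against the exported model and the TRTEXEC with ReIdentificationNet docs.

```python
import shlex


def build_trtexec_cmd(onnx_path, engine_path, input_name="input",
                      chw=(3, 256, 128), min_batch=1, opt_batch=8,
                      max_batch=16, fp16=True):
    """Assemble a trtexec command that builds a dynamic-batch engine from ONNX."""
    dims = "x".join(str(d) for d in chw)
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--minShapes={input_name}:{min_batch}x{dims}",
        f"--optShapes={input_name}:{opt_batch}x{dims}",
        f"--maxShapes={input_name}:{max_batch}x{dims}",
        f"--saveEngine={engine_path}",
    ]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels; omit for an FP32-only engine
    return cmd


cmd = build_trtexec_cmd("resnet50_market1501_model.onnx",
                        "resnet50_market1501_model.engine")
# Print a shell-safe command line; run it with subprocess.run(cmd, check=True)
# on a machine where trtexec is installed.
print(" ".join(shlex.quote(part) for part in cmd))
```

Building a dynamic-batch engine means the inference code has to set the actual input shape on the execution context before allocating buffers.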

Pritam · October 4, 2023, 10:14am

Hi @Morganh

While converting the .tlt model to ONNX, I am getting the issue below.

Please help me out.

Command

!tao re_identification export \
                   -e $SPECS_DIR/experiment_market1501.yaml \
                   -r $RESULTS_DIR/market1501 \
                   -k $KEY \
                   export.checkpoint=$RESULTS_DIR/market1501/train/resnet50_market1501_model.tlt \
                   export.onnx_file=$RESULTS_DIR/market1501/export/resnet50_market1501_model.onnx

Issue:

2023-10-04 15:41:35,986 [INFO] root: Registry: ['nvcr.io']
2023-10-04 15:41:36,218 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt
2023-10-04 15:41:36,409 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/smarg/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
ANTLR runtime and generated code versions disagree: 4.8!=4.9.3
[NeMo W 2023-10-04 10:11:50 nemo_logging:349] <frozen cv.re_identification.scripts.export>:110: UserWarning: 
    'experiment_market1501.yaml' is validated against ConfigStore schema with the same name.
    This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
    See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
    
Error merging 'experiment_market1501.yaml' with schema
Key 'results_dir'' not in 'ReIDExportExpConfig'
    full_key: results_dir'
    object_type=ReIDExportExpConfig

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Execution status: FAIL
2023-10-04 15:42:00,449 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Thanks

Morganh · October 4, 2023, 4:15pm

Which TAO version did you use?

Pritam · October 5, 2023, 6:22am

Hi @Morganh

I am using TAO version 4.0

Configuration of the TAO Toolkit Instance
dockers: ['nvidia/tao/tao-toolkit']
format_version: 2.0
toolkit_version: 4.0.1
published_date: 03/06/2023

While updating to TAO version 5, I am getting the issue below:

pip3 install nvidia-tao==5.0
Collecting nvidia-tao==5.0
  Using cached nvidia_tao-5.0.0-py3-none-any.whl (35 kB)
Requirement already satisfied: urllib3<2.0.0,>=1.26.15 in /home/smarg/miniconda3/envs/launcher/lib/python3.6/site-packages (from nvidia-tao==5.0) (1.26.16)
Requirement already satisfied: docker-pycreds==0.4.0 in /home/smarg/miniconda3/envs/launcher/lib/python3.6/site-packages (from nvidia-tao==5.0) (0.4.0)
Collecting certifi>=2022.12.07
  Using cached certifi-2023.7.22-py3-none-any.whl (158 kB)
Requirement already satisfied: idna==2.10 in /home/smarg/miniconda3/envs/launcher/lib/python3.6/site-packages (from nvidia-tao==5.0) (2.10)
Requirement already satisfied: six==1.15.0 in /home/smarg/miniconda3/envs/launcher/lib/python3.6/site-packages (from nvidia-tao==5.0) (1.15.0)
ERROR: Could not find a version that satisfies the requirement requests>=2.31.0 (from nvidia-tao) (from versions: 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.4.0, 0.4.1, 0.5.0, 0.5.1, 0.6.0, 0.6.1, 0.6.2, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.6, 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, 0.8.5, 0.8.6, 0.8.7, 0.8.8, 0.8.9, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.10.0, 0.10.1, 0.10.2, 0.10.3, 0.10.4, 0.10.6, 0.10.7, 0.10.8, 0.11.1, 0.11.2, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.13.4, 0.13.5, 0.13.6, 0.13.7, 0.13.8, 0.13.9, 0.14.0, 0.14.1, 0.14.2, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.1.0, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.2.1, 2.3.0, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.7.0, 2.8.0, 2.8.1, 2.9.0, 2.9.1, 2.9.2, 2.10.0, 2.11.0, 2.11.1, 2.12.0, 2.12.1, 2.12.2, 2.12.3, 2.12.4, 2.12.5, 2.13.0, 2.14.0, 2.14.1, 2.14.2, 2.15.1, 2.16.0, 2.16.1, 2.16.2, 2.16.3, 2.16.4, 2.16.5, 2.17.0, 2.17.1, 2.17.2, 2.17.3, 2.18.0, 2.18.1, 2.18.2, 2.18.3, 2.18.4, 2.19.0, 2.19.1, 2.20.0, 2.20.1, 2.21.0, 2.22.0, 2.23.0, 2.24.0, 2.25.0, 2.25.1, 2.26.0, 2.27.0, 2.27.1)
ERROR: No matching distribution found for requests>=2.31.0

I also tried installing with python3.10 -m pip install nvidia-tao==5.0, but when I run tao I get the result below:

(launcher) smarg@smarg:~/Documents/PritamDocsData/PyTorch/tao_pytorch_backend$ sudo python3.10 -m pip install nvidia-tao==5.0
Collecting nvidia-tao==5.0
  Using cached nvidia_tao-5.0.0-py3-none-any.whl (35 kB)
Requirement already satisfied: docker-pycreds==0.4.0 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (0.4.0)
Requirement already satisfied: docker==4.3.1 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (4.3.1)
Requirement already satisfied: six==1.15.0 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (1.15.0)
Requirement already satisfied: chardet==3.0.4 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (3.0.4)
Requirement already satisfied: idna==2.10 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (2.10)
Requirement already satisfied: requests>=2.31.0 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (2.31.0)
Requirement already satisfied: tabulate==0.8.7 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (0.8.7)
Requirement already satisfied: certifi>=2022.12.07 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (2023.7.22)
Requirement already satisfied: websocket-client==0.57.0 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (0.57.0)
Requirement already satisfied: urllib3<2.0.0,>=1.26.15 in /usr/local/lib/python3.10/dist-packages (from nvidia-tao==5.0) (1.26.16)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.31.0->nvidia-tao==5.0) (3.2.0)
Installing collected packages: nvidia-tao
Successfully installed nvidia-tao-5.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
(launcher) smarg@smarg:~/Documents/PritamDocsData/PyTorch/tao_pytorch_backend$ tao
bash: /home/smarg/miniconda3/envs/launcher/bin/tao: No such file or directory

Thanks.
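The error names a tao script inside the conda env launcher, while the successful install above went into /usr/local/lib/python3.10 through sudo, which is a different interpreter. One plausible explanation is that the shell still resolves tao to a path in the launcher env that no longer holds a working entry point; bash also caches command locations, which hash -r clears, and type -a tao lists the candidates. The sketch below does the same from Python: it lists every tao executable on PATH and the interpreter named on its #! line. It is a stdlib-only diagnostic, not part of TAO.

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable file called `name`, in PATH search order."""
    search = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in search.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


def shebang(path):
    """Return the interpreter named on a script's '#!' line, or None."""
    try:
        with open(path, "rb") as f:
            first = f.readline().decode("utf-8", "replace").strip()
    except OSError:
        return None
    return first[2:].strip() if first.startswith("#!") else None


for exe in find_all_on_path("tao"):
    print(exe, "->", shebang(exe))
```

Run it from inside the launcher env: an empty result, or a hit whose interpreter is not the one you installed nvidia-tao into, points at the environment mix-up.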

rferrandis · October 5, 2023, 6:46am

Hi,

With Python 3.10 I got some errors. Did you try an environment with Python 3.7? That worked for me.
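This matches the first pip error in the thread: the launcher env is Python 3.6 (see the python3.6 site-packages paths), the newest requests release pip offered there is 2.27.1, and requests dropped Python 3.6 support starting with 2.28, so requests>=2.31.0 cannot be installed. A tiny check run with the interpreter you plan to install nvidia-tao into catches this up front. The 3.7 floor below comes from requests alone and says nothing about which Python versions nvidia-tao 5.0 itself supports.

```python
import sys

# requests>=2.31.0 (required by nvidia-tao 5.0.0 per the pip output above)
# needs Python >= 3.7; on Python 3.6 pip could only see requests <= 2.27.1.
MIN_PYTHON = (3, 7)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """True if (major, minor) of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


current = ".".join(str(part) for part in sys.version_info[:3])
if python_ok():
    print(f"Python {current} meets the Python >= 3.7 floor of requests>=2.31.0")
else:
    print(f"Python {current} is too old for requests>=2.31.0; use a Python >= 3.7 env")
```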
