Installing Tao-converter and running: Where is the "Encoding key" of FPEnet?

erence · June 18, 2022, 12:08pm

Hi. I want to convert the FPEnet model.tlt to a TensorRT engine and use it in a DeepStream Python environment on a Jetson NX. My machine runs Ubuntu 22.04 LTS 64-bit with a GeForce RTX 2070 and an Intel® Core™ i7-9750H CPU @ 2.60GHz × 12.

I have my NGC API key and a virtual environment named "launcher". TAO is installed in the "launcher" virtual env and is working.

Now I want to install tao-converter. I have downloaded the "cuda113-cudnn80-trt72" build (my system has CUDA 11.6 and no cuDNN yet, but shouldn't I still be able to install tao-converter?). The file is downloaded, but even after chmod it cannot be run, and I cannot install tao-converter.

Is there any way to install tao-converter? (An easy way would be preferable.)
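The build name quoted above appears to encode the CUDA, cuDNN and TensorRT versions the binary was built against. A minimal Python sketch to decode such a name, assuming the pattern cuda<NNN>-cudnn<NN>-trt<NN> where the last digit of each number is the minor version (an assumption drawn only from this one name; parse_build_tag is a hypothetical helper):

```python
import re

# Hypothetical helper: decode a tao-converter build tag such as
# "cuda113-cudnn80-trt72". Assumes each number is <major><one-digit minor>.
_TAG = re.compile(r"cuda(\d+)-cudnn(\d+)-trt(\d+)")


def _split_version(digits: str) -> str:
    """Turn '113' into '11.3' by treating the last digit as the minor version."""
    return f"{digits[:-1]}.{digits[-1]}"


def parse_build_tag(tag: str) -> dict:
    match = _TAG.search(tag)
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    cuda, cudnn, trt = (_split_version(g) for g in match.groups())
    return {"cuda": cuda, "cudnn": cudnn, "tensorrt": trt}


print(parse_build_tag("cuda113-cudnn80-trt72"))
# {'cuda': '11.3', 'cudnn': '8.0', 'tensorrt': '7.2'}
```

Comparing the decoded versions with what is actually installed (here CUDA 11.6 and no cuDNN) is a quick first check before digging further.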

erence · June 18, 2022, 2:26pm

$ chmod +x /filepath/to/filename
makes the file executable.

To execute it on Linux, I had to type: $ ./file -h
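The same two steps (set the execute bit, then invoke the binary by its path rather than its bare name, since the current directory is normally not on PATH) can be sketched in Python. This is a minimal sketch; make_executable and show_help are hypothetical helpers, and the subprocess arguments are kept compatible with older Python 3 versions such as the one on older JetPack images:

```python
import os
import stat
import subprocess


def make_executable(path: str) -> None:
    """Roughly `chmod +x path`: add execute bits for user, group and other."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def show_help(path: str) -> str:
    """Run the binary with -h (like `./tao-converter -h`) and return its output."""
    result = subprocess.run(
        [os.path.abspath(path), "-h"],  # explicit path, like the ./ prefix in a shell
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,       # merge stderr so usage text is not lost
        universal_newlines=True,
    )
    return result.stdout
```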

Since I want to convert the .etlt file to a TensorRT engine and use it on the Jetson, I also had to run the conversion on the Jetson NX and not on the x86 machine (info from the forums). Now tao-converter runs. But how do I find the encoding key of FPEnet? The NGC API key is not the right one, is it?
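Serialized TensorRT engines are generally tied to the GPU and TensorRT version they were built with, which is why the conversion has to happen on the Jetson itself. A small guard can catch running the conversion on the wrong machine; this is a heuristic sketch that assumes a Jetson reports "aarch64" from platform.machine() and ships an /etc/nv_tegra_release file (is_jetson and check_build_host are hypothetical helpers):

```python
import os
import platform


def is_jetson(machine: str = "", tegra_release: str = "/etc/nv_tegra_release") -> bool:
    """Heuristic check: aarch64 CPU plus the L4T release file (assumed location)."""
    machine = machine or platform.machine()
    return machine == "aarch64" and os.path.exists(tegra_release)


def check_build_host() -> None:
    """Warn before building an engine on a machine that is probably not the Jetson."""
    if not is_jetson():
        print("Warning: this does not look like a Jetson. An engine built here will "
              "generally not load on the Jetson NX; run tao-converter on the device.")
```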

erence · June 18, 2022, 10:18pm

The key is: nvidia_tlt
This works for the FPEnet .etlt-to-TRT conversion. Other models may use a different key, which is usually given on the model's NGC page.

My command is now as follows:
tao-converter -k nvidia_tlt -t fp32 -p input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80 -e /models/triton_model_repository/faciallandmarks_tlt/1/model.plan -b 1 /home/eren/FPEnet/model.etlt

But I get this error:
[ERROR] 1: Unexpected exception _Map_base::at
[ERROR] Unable to create engine

Morganh · June 21, 2022, 7:03am

Hi,
Are you downloading the correct version of tao-converter for Jetson NX?

erence · June 23, 2022, 3:48pm

Many thanks for your help. Yes, I was using the correct version. Since I wanted to use the converted TensorRT engine file on my Jetson NX, I had to run the conversion on the Jetson itself (this information was not obvious and was difficult to find on the NVIDIA pages).

I checked the JetPack version with "sudo apt-cache show nvidia-jetpack" and downloaded the latest tao-converter from the "TensorRT — TAO Toolkit 3.22.05 documentation" page.
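To script the JetPack check above, the Version: fields can be pulled out of the apt-cache output. A minimal sketch with hypothetical helper names; note that apt-cache show lists every version apt knows about, not only the installed one, and reading the cache does not require sudo:

```python
import re
import subprocess


def jetpack_versions(apt_output: str) -> list:
    """Extract every 'Version:' field from `apt-cache show nvidia-jetpack` output."""
    return re.findall(r"^Version:\s*(\S+)", apt_output, flags=re.MULTILINE)


def available_jetpack_versions() -> list:
    """Run the same query as in the post and parse it (requires apt on the host)."""
    out = subprocess.run(["apt-cache", "show", "nvidia-jetpack"],
                         stdout=subprocess.PIPE, universal_newlines=True).stdout
    return jetpack_versions(out)
```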

Eventually I converted the .etlt model to a TensorRT engine with the following command:

tao-converter -k nvidia_tlt -t fp16 -p input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80 -e /target/path/folder -m 1 -w 1000000000 /path/to/etlt_file/to_be_converted/model.etlt

-w (maximum workspace size) seemed to be needed for unknown reasons; others in the forum did not need it. I hope the engine file works.
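For reference, the command can be assembled from its parts. As used in this thread, -k is the model key, -t the precision, -p the input name followed by the min, opt and max shapes (the conversion log in this thread echoes these as the optimization profile), -e the output engine path, -m the max batch size and -w the optional workspace size in bytes; the .etlt file comes last. A sketch, with tao_converter_argv a hypothetical helper:

```python
def tao_converter_argv(etlt_path, engine_path, key, input_name,
                       min_shape, opt_shape, max_shape,
                       data_type="fp16", max_batch=1, workspace=None,
                       binary="tao-converter"):
    """Build the tao-converter argument list used in this thread."""
    def dims(shape):
        return "x".join(str(d) for d in shape)  # (1, 1, 80, 80) -> "1x1x80x80"

    profile = ",".join([input_name, dims(min_shape), dims(opt_shape), dims(max_shape)])
    argv = [binary, "-k", key, "-t", data_type, "-p", profile,
            "-e", engine_path, "-m", str(max_batch)]
    if workspace is not None:
        argv += ["-w", str(workspace)]  # optional max workspace size in bytes
    argv.append(etlt_path)  # the .etlt model is the last, positional argument
    return argv


argv = tao_converter_argv("/home/eren/FPEnet/model.etlt", "/home/eren/FPEnet/model.engine",
                          "nvidia_tlt", "input_face_images:0",
                          (1, 1, 80, 80), (1, 1, 80, 80), (2, 1, 80, 80))
print(" ".join(argv))
# tao-converter -k nvidia_tlt -t fp16 -p input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80 -e /home/eren/FPEnet/model.engine -m 1 /home/eren/FPEnet/model.etlt
```

Passing the list to subprocess.run(argv, check=True) raises CalledProcessError if the conversion exits with an error, which is easier to spot than an error line buried in a long log.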

Morganh · June 23, 2022, 3:55pm

Please share the full log. Thanks.

erence · June 23, 2022, 4:10pm

I tried again without "-w" and it converted without problems. Thanks! I'm sharing the log anyway. (Is there a better way to share a log?)

$ tao-converter -k nvidia_tlt -t fp16 -p input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80 -e /home/eren/FPEnet/model.engine -m 1 /home/eren/FPEnet/model.etlt
[INFO] [MemUsageChange] Init CUDA: CPU +363, GPU +0, now: CPU 381, GPU 6718 (MiB)
[INFO] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 381 MiB, GPU 6748 MiB
[INFO] [MemUsageSnapshot] End constructing builder kernel library: CPU 486 MiB, GPU 6858 MiB
[INFO] ----------------------------------------------------------------
[INFO] Input filename: /tmp/file9da4fw
[INFO] ONNX IR version: 0.0.5
[INFO] Opset version: 10
[INFO] Producer name: tf2onnx
[INFO] Producer version: 1.6.3
[INFO] Domain:
[INFO] Model version: 0
[INFO] Doc string:
[INFO] ----------------------------------------------------------------
[WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[INFO] Detected input dimensions from the model: (-1, 1, 80, 80)
[INFO] Model has dynamic shape. Setting up optimization profiles.
[INFO] Using optimization profile min shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile opt shape: (1, 1, 80, 80) for input: input_face_images:0
[INFO] Using optimization profile max shape: (2, 1, 80, 80) for input: input_face_images:0
[WARNING] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
[INFO] ---------- Layers Running on DLA ----------
[INFO] ---------- Layers Running on GPU ----------
[INFO] [GpuLayer] row_indexes:0
[INFO] [GpuLayer] column_indexes:0
[INFO] [GpuLayer] block_1a_conv_1/Pad + block_1a_conv_1/BiasAdd + activation_1/Relu
[INFO] [GpuLayer] (Unnamed Layer* 7) [Identity]
[INFO] [GpuLayer] max_pooling2d_1/MaxPool
[INFO] [GpuLayer] block_2a_conv_1/Pad + block_2a_conv_1/BiasAdd + activation_2/Relu
[INFO] [GpuLayer] (Unnamed Layer* 16) [Identity]
[INFO] [GpuLayer] max_pooling2d_2/MaxPool
[INFO] [GpuLayer] block_3a_conv_1/Pad + block_3a_conv_1/BiasAdd + activation_3/Relu
[INFO] [GpuLayer] (Unnamed Layer* 25) [Identity]
[INFO] [GpuLayer] max_pooling2d_3/MaxPool
[INFO] [GpuLayer] block_4a_conv_1/Pad + block_4a_conv_1/BiasAdd + activation_4/Relu
[INFO] [GpuLayer] (Unnamed Layer* 34) [Identity]
[INFO] [GpuLayer] max_pooling2d_4/MaxPool
[INFO] [GpuLayer] block_5a_conv_1/Pad + block_5a_conv_1/BiasAdd + activation_5/Relu
[INFO] [GpuLayer] block_5a_conv_2/Pad + block_5a_conv_2/BiasAdd + activation_6/Relu
[INFO] [GpuLayer] block_5a_conv_3/convolution + activation_7/Relu
[INFO] [GpuLayer] conv2d_transpose_1/conv2d_transpose
[INFO] [GpuLayer] conv2d_transpose_1/conv2d_transpose:0 copy
[INFO] [GpuLayer] block_6a_conv_1/Pad + block_6a_conv_1/BiasAdd + activation_8/Relu
[INFO] [GpuLayer] block_6a_conv_2/convolution + activation_9/Relu
[INFO] [GpuLayer] conv2d_transpose_2/conv2d_transpose
[INFO] [GpuLayer] conv2d_transpose_2/conv2d_transpose:0 copy
[INFO] [GpuLayer] block_7a_conv_1/Pad + block_7a_conv_1/BiasAdd + activation_10/Relu
[INFO] [GpuLayer] block_7a_conv_2/convolution + activation_11/Relu
[INFO] [GpuLayer] conv2d_transpose_3/conv2d_transpose
[INFO] [GpuLayer] conv2d_transpose_3/conv2d_transpose:0 copy
[INFO] [GpuLayer] block_8a_conv_1/Pad + block_8a_conv_1/BiasAdd + activation_12/Relu
[INFO] [GpuLayer] block_8a_conv_2/convolution + activation_13/Relu
[INFO] [GpuLayer] conv2d_transpose_4/conv2d_transpose
[INFO] [GpuLayer] conv2d_transpose_4/conv2d_transpose:0 copy
[INFO] [GpuLayer] block_9a_conv_1/Pad + block_9a_conv_1/BiasAdd + activation_14/Relu
[INFO] [GpuLayer] block_9a_conv_2/convolution + activation_15/Relu
[INFO] [GpuLayer] conv_keypoints_m80/convolution
[INFO] [GpuLayer] softargmax/Max
[INFO] [GpuLayer] softargmax/Max_1
[INFO] [GpuLayer] PWN(PWN(softargmax/sub, softargmax/mul/x:0 + (Unnamed Layer* 415) [Shuffle] + softargmax/mul), softargmax/Exp)
[INFO] [GpuLayer] softargmax/Sum
[INFO] [GpuLayer] softargmax/Sum_1
[INFO] [GpuLayer] softargmax/truediv
[INFO] [GpuLayer] softargmax/mul_2
[INFO] [GpuLayer] softargmax/mul_1
[INFO] [GpuLayer] softargmax/Max_2
[INFO] [GpuLayer] softargmax/Sum_4
[INFO] [GpuLayer] softargmax/Sum_2
[INFO] [GpuLayer] softargmax/Sum_5
[INFO] [GpuLayer] softargmax/Sum_3
[INFO] [GpuLayer] softargmax/Sum_3:0 copy
[INFO] [GpuLayer] softargmax/Sum_5:0 copy
[INFO] [GpuLayer] softargmax/Max_2:0 copy
[INFO] [GpuLayer] softargmax/Squeeze
[INFO] [GpuLayer] softargmax/strided_slice_1
[INFO] [GpuLayer] softargmax/strided_slice
[INFO] [GpuLayer] softargmax/strided_slice_1__242
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +227, GPU +166, now: CPU 716, GPU 7032 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +307, GPU -601, now: CPU 1023, GPU 6431 (MiB)
[INFO] Local timing cache in use. Profiling results in this builder pass will not be stored.
[INFO] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[INFO] Detected 1 inputs and 2 output network tensors.
[INFO] Total Host Persistent Memory: 37056
[INFO] Total Device Persistent Memory: 1027072
[INFO] Total Scratch Memory: 2048000
[INFO] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 677 MiB
[INFO] [BlockAssignment] Algorithm ShiftNTopDown took 11.0284ms to assign 7 blocks to 51 nodes requiring 7917056 bytes.
[INFO] Total Activation Memory: 7917056
[INFO] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +3, now: CPU 1488, GPU 6992 (MiB)
[INFO] [MemUsageChange] Init cuDNN: CPU +0, GPU +0, now: CPU 1488, GPU 6992 (MiB)
[INFO] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +1, GPU +4, now: CPU 1, GPU 4 (MiB)
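The log contains the warning "DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU": with a min shape of 1x1x80x80 and a max of 2x1x80x80 the profile is dynamic, so every layer ran on the GPU. A sketch that parses a -p value and checks only that shape condition (helper names are hypothetical; whether layers actually land on DLA also depends on other build settings):

```python
def parse_profile(spec: str):
    """Split a -p value 'name,min,opt,max' into the input name and three shape tuples."""
    name, *shapes = spec.split(",")
    if len(shapes) != 3:
        raise ValueError("expected 'name,min,opt,max'")
    parsed = [tuple(int(d) for d in s.split("x")) for s in shapes]
    return name, parsed[0], parsed[1], parsed[2]


def static_profile(spec: str) -> bool:
    """True when min == opt == max, the condition named in the DLA warning."""
    _, min_shape, opt_shape, max_shape = parse_profile(spec)
    return min_shape == opt_shape == max_shape


print(static_profile("input_face_images:0,1x1x80x80,1x1x80x80,2x1x80x80"))  # False
```

A static profile would also fix the engine to a single batch size, so it is a trade-off rather than a free fix.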

Morganh · June 23, 2022, 4:21pm

Thanks for the info. For sharing a log, you can also click the "upload" button to attach the log file.
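To have a file to attach, the converter output can be written straight to disk, the same as appending "> log.txt 2>&1" to the command in a shell. A minimal sketch with a hypothetical run_and_log helper:

```python
import subprocess


def run_and_log(argv, log_path):
    """Run a command, write its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```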

erence · June 23, 2022, 7:05pm

I used the test.py script that was posted in the forums previously, but it gives a PyCUDA memory error. What could be the problem?

eren@erennx:~$ /home/eren/env/bin/python /home/eren/FPEnet/test.py --input facepic.jpg
Traceback (most recent call last):
  File "/home/eren/FPEnet/test.py", line 148, in <module>
    fpenet_obj = FpeNet('/home/eren/FPEnet/model.trt')
  File "/home/eren/FPEnet/test.py", line 35, in __init__
    self._allocate_buffers()
  File "/home/eren/FPEnet/test.py", line 62, in _allocate_buffers
    host_mem = cuda.pagelocked_empty(size, dtype)
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory
[06/23/2022-20:58:54] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::35] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)

The code is below:

import cv2
import numpy as np
import pycuda
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import time

from PIL import Image

class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

class FpeNet(object):
    def __init__(self, trt_path, input_size=(80, 80), batch_size=1):
        self.trt_path = trt_path
        self.input_size = input_size
        self.batch_size = batch_size

        TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
        trt_runtime = trt.Runtime(TRT_LOGGER)
        self.trt_engine = self._load_engine(trt_runtime, self.trt_path)

        self.inputs, self.outputs, self.bindings, self.stream = \
            self._allocate_buffers()

        self.context = self.trt_engine.create_execution_context()
        self.list_output = None

    def _load_engine(self, trt_runtime, engine_path):
        with open(engine_path, "rb") as f:
            engine_data = f.read()
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine

    def _allocate_buffers(self):
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()

        binding_to_type = {
            "input_face_images:0": np.float32,
            "softargmax/strided_slice:0": np.float32,
            "softargmax/strided_slice_1:0": np.float32
        }

        for binding in self.trt_engine:
            size = trt.volume(self.trt_engine.get_binding_shape(binding)) \
                   * self.batch_size
            dtype = binding_to_type[str(binding)]
            host_mem = cuda.pagelocked_empty(size, dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            bindings.append(int(device_mem))
            if self.trt_engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))

        return inputs, outputs, bindings, stream

    def _do_inference(self, context, bindings, inputs,
                      outputs, stream):
        [cuda.memcpy_htod_async(inp.device, inp.host, stream) \
         for inp in inputs]
        context.execute_async(
            batch_size=self.batch_size, bindings=bindings,
            stream_handle=stream.handle)

        [cuda.memcpy_dtoh_async(out.host, out.device, stream) \
         for out in outputs]

        stream.synchronize()

        return [out.host for out in outputs]

    def _process_image(self, image):
        image = cv2.imread(image)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        w = self.input_size[0]
        h = self.input_size[1]
        self.image_height = image.shape[0]
        self.image_width = image.shape[1]
        image_resized = Image.fromarray(np.uint8(image))
        image_resized = image_resized.resize(size=(w, h), resample=Image.BILINEAR)
        img_np = np.array(image_resized)
        img_np = img_np.astype(np.float32) #/ 255  #this was corrected in a forum
        img_np = np.expand_dims(img_np, axis=0)  # the shape would be 1x80x80

        return img_np, image

    def predict(self, img_path):
        img_processed, image = self._process_image(img_path)

        np.copyto(self.inputs[0].host, img_processed.ravel())
        t_time = 0
        landmarks = None

        for i in range(1):
            t1 = time.perf_counter()
            landmarks, probs = self._do_inference(
                self.context, bindings=self.bindings, inputs=self.inputs,
                outputs=self.outputs, stream=self.stream)
            t2 = time.perf_counter()
            t_time += (t2 - t1)
        print('inference time:', t_time)

        # to make (x, y)s from the (160, ) output
        landmarks = landmarks.reshape(-1, 2)
        visualized = self._visualize(image, landmarks)

        return visualized

    @staticmethod
    def _postprocess(landmarks):
        landmarks = landmarks.reshape(-1, 2)
        return landmarks

    def _visualize(self, frame, landmarks):
        visualized = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
        for x, y in landmarks:
            x = x * self.image_width / self.input_size[0]
            y = y * self.image_height / self.input_size[1]
            x = int(x)
            y = int(y)
            cv2.circle(visualized, (x, y), 1, (0, 255, 0), 1)
        return visualized

if __name__ == '__main__':
    import argparse

    arg_parser = argparse.ArgumentParser()
    arg_parser.add_argument('--input', '-i', type=str, required=True)
    args = arg_parser.parse_args()
    img_path = args.input

    fpenet_obj = FpeNet('/home/eren/FPEnet/model.trt')
    output = fpenet_obj.predict(img_path)
    cv2.imwrite('landmarks.jpg', output)
    print('image has been written to landmarks.jpg')
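A likely cause, not confirmed in this thread: the conversion log shows the engine has a dynamic batch dimension ("Detected input dimensions from the model: (-1, 1, 80, 80)"), so get_binding_shape() on the engine probably reports -1 for that dimension, trt.volume() then returns a negative number, and pagelocked_empty() is asked for an invalid size. It is also worth checking that the script loads the engine built above: it opens /home/eren/FPEnet/model.trt, while the conversion wrote /home/eren/FPEnet/model.engine. The arithmetic, and one way to size a buffer from the profile's max shape instead, sketched in plain Python (volume and host_buffer_elements are hypothetical helpers):

```python
from functools import reduce
import operator


def volume(shape):
    """Product of the dimensions, as trt.volume() computes it; a -1 makes it negative."""
    return reduce(operator.mul, shape, 1)


def host_buffer_elements(binding_shape, max_profile_shape):
    """Elements to allocate for one binding.

    If the engine reports a dynamic dimension (-1), size the buffer from the
    optimization profile's max shape instead of the raw binding shape.
    """
    if any(d < 0 for d in binding_shape):
        return volume(max_profile_shape)
    return volume(binding_shape)


print(volume((-1, 1, 80, 80)))                                # -6400
print(host_buffer_elements((-1, 1, 80, 80), (2, 1, 80, 80)))  # 12800
```

In the script, the input's max shape would presumably come from engine.get_profile_shape(0, binding)[2] (TensorRT 8.x binding API, assumed here). For the outputs, one option is to create the execution context first, call context.set_binding_shape(0, (1, 1, 80, 80)), and size each output from context.get_binding_shape(index). Because an engine parsed from ONNX is an explicit-batch engine, it would then be run with context.execute_async_v2(bindings=..., stream_handle=...) rather than execute_async(batch_size=...).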

system · July 7, 2022, 7:05pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Related topics

Topic | Replies | Views | Activity
Inference problem with FPEnet Jetson Xavier NX jetson-inference | 14 | 1052 | July 28, 2022
Convert model to Jetson Error during model export step in TAO notebook TAO Toolkit | 21 | 2094 | February 15, 2022
Tao-converter [ERROR] Failed to parse the model, please check the encoding key to make sure its correct TAO Toolkit deepstream | 70 | 1786 | July 10, 2023
Tao-converter doesn't convert ".etlt" to ".engine" TAO Toolkit debugging-and-troubleshooting, tao, deepstream | 10 | 664 | October 20, 2023
Tao-converter failed to convert etlt to engine file due to could not find any implementation for node conv1/convolution + activate_1/Relu6 TAO Toolkit | 9 | 851 | April 26, 2022
Converting etlt file to .engine for jetson TAO Toolkit | 17 | 2992 | October 25, 2022
Tao-converter error TAO Toolkit | 34 | 2018 | November 10, 2021
The effect is very poor when converted to trt TAO Toolkit tensorrt, ubuntu | 61 | 1435 | September 11, 2023
Cannot infer with fpenet with TensorRT8.0 TAO Toolkit | 14 | 1602 | March 3, 2022
[ERROR] Model has dynamic shape but no optimization profile specified. Aborted (core dumped) TAO Toolkit | 30 | 2066 | December 13, 2021