Different model_b1_gpu0_fp32.engine generated when running a new thread with the same DeepStream configuration

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU

• DeepStream Version
Deepstream-6.3

• TensorRT Version
8.5.3-1+cuda11.8

• NVIDIA GPU Driver Version (valid for GPU only)
Driver Version: 565.57.01

• Issue Type( questions, new requirements, bugs)
When I start an additional thread from my Python code alongside the DeepStream pipeline, a new model_b1_gpu0_fp32.engine is generated. It should not be, since the path to model_b1_gpu0_fp32.engine is specified in the configuration file and the existing file should be detected.
The new thread is not related to DeepStream at all.
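
To make the setup concrete, this is roughly what deepstream_test_1_new_thread.py does. This is only a sketch: the worker function here is hypothetical, and the pipeline part is identical to deepstream_test_1_usb.py.

import threading
import time

def background_worker():
    # Completely unrelated work: no GStreamer/DeepStream object is touched here.
    while True:
        time.sleep(1.0)

# The only difference from deepstream_test_1_usb.py: start an extra thread.
worker = threading.Thread(target=background_worker, daemon=True)
worker.start()

# ... then build and run the usual pipeline exactly as in
# deepstream_test_1_usb.py (USB camera source -> nvstreammux -> nvinfer -> sink),
# with nvinfer loading config_infer_primary_yoloV8.txt ...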

What could be the reason for a different model_b1_gpu0_fp32.engine file being generated?

This topic is similar to your question; you can refer to it.

Hi @junshengy

I think my topic is quite different from the one in the link you sent in the previous message.

We have set the following variables in the config file:

model-engine-file=/workspace/dev_deepstream_client_test/weights/model_b1_gpu0_fp32.engine
engine-create-func-name=NvDsInferYoloCudaEngineGet

The .engine file is already present in /workspace/dev_deepstream_client_test/weights/, but it is rebuilt anyway when I run a Python example that starts a new thread.
The two files (the one already present and the one generated in the folder the Python example runs from) actually differ in size.
Do you know why these two files (both named model_b1_gpu0_fp32.engine) are different?
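
This is a quick way I checked that they really differ (just a size/hash comparison; the second path is the copy generated next to the Python example):

import hashlib
import os

def describe(path):
    # Return file size and SHA-256 digest so the two engines can be compared.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return os.path.getsize(path), digest

paths = [
    "/workspace/dev_deepstream_client_test/weights/model_b1_gpu0_fp32.engine",
    "./model_b1_gpu0_fp32.engine",  # generated in the folder the example runs from
]
for p in paths:
    size, digest = describe(p)
    print(p, size, "bytes", digest[:16])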

I get the following when I run the python code:

0:00:12.096713273   425     0x44101d00 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1976> [UID = 1]: deserialize engine from file :/workspace/dev_deepstream_client_test/weights/model_b1_gpu0_fp32.engine failed
0:00:12.420826662   425     0x44101d00 WARN                 nvinfer gstnvinfer.cpp:679:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Warning from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:2081> [UID = 1]: deserialize backend context from engine from file :/workspace/dev_deepstream_client_test/weights/model_b1_gpu0_fp32.engine failed, try rebuild

It seems my guess is correct: your problem is the same as the one in the topic above. This is a known issue. When you use the engine-create-func-name configuration item to customize the engine file, nvinfer will ignore model-engine-file and only generate a file named model_b1_gpu0_fp32.engine in the directory where the program is currently running.

I have given a workaround in the topic above, but it requires modifying the nvinfer code and then recompiling and reinstalling it.
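
Concretely, from the application's point of view the behaviour looks like this (just an illustration in Python, not nvinfer's actual code; the paths are the ones from your config):

import os

ENGINE_NAME = "model_b1_gpu0_fp32.engine"

# What model-engine-file in the config asks nvinfer to load:
configured_path = "/workspace/dev_deepstream_client_test/weights/" + ENGINE_NAME

# Where the custom engine-create path actually writes the serialized engine:
# the process's current working directory at the time of the build.
actual_path = os.path.join(os.getcwd(), ENGINE_NAME)

print("configured:", configured_path)
print("generated :", actual_path)

# Without patching nvinfer, the two only coincide if the application is
# started from (or chdir'ed into) the weights directory itself.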

I have structured my repository so that the model_b1_gpu0_fp32.engine file is not rebuilt every time, since the build takes time.
My repository is laid out as follows:

--> deepstream_tests/
          --> config/
                    --> infer/
                              --> config_infer_primary_yoloV8.txt
          --> deepstream_python_apps_test/
                    --> deepstream_test_1_usb.py
                    --> deepstream_test_1_new_thread.py
          --> weights/
                    --> generate_weights_fake_sink.py

The file generate_weights_fake_sink.py is simply a copy of deepstream_test_1_usb.py that loads its configuration from the config_infer_primary_yoloV8.txt file (a rough sketch of it is shown a bit further below).
If I run generate_weights_fake_sink.py from the deepstream_tests/weights folder, I obtain the following:

--> deepstream_tests/
          --> config/
                    --> infer/
                              --> config_infer_primary_yoloV8.txt
          --> deepstream_python_apps_test/
                    --> deepstream_test_1_usb.py
                    --> deepstream_test_1_new_thread.py
          --> weights/
                    --> generate_weights_fake_sink.py
                    --> model_b1_gpu0_fp32.engine

Which is correct.
Then I run deepstream_test_1_usb.py from deepstream_tests/deepstream_python_apps_test, which loads the configuration from config_infer_primary_yoloV8.txt; it skips the TensorRT build step and uses the model_b1_gpu0_fp32.engine file inside deepstream_tests/weights, which is my expected behaviour.
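
For reference, generate_weights_fake_sink.py does roughly the following. This is only a simplified sketch: the real script is a copy of deepstream_test_1_usb.py, so it uses the USB camera source and its usual setup; the videotestsrc, element names, and resolution below are just my stand-ins.

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.Pipeline.new("engine-builder")

# Dummy source: a handful of frames is enough to make nvinfer build and
# serialize model_b1_gpu0_fp32.engine into the current working directory
# (deepstream_tests/weights/ when the script is started from there).
src = Gst.ElementFactory.make("videotestsrc", "src")
src.set_property("num-buffers", 10)
conv = Gst.ElementFactory.make("nvvideoconvert", "conv")
caps = Gst.ElementFactory.make("capsfilter", "caps")
caps.set_property("caps", Gst.Caps.from_string("video/x-raw(memory:NVMM), format=NV12"))
mux = Gst.ElementFactory.make("nvstreammux", "mux")
mux.set_property("batch-size", 1)
mux.set_property("width", 1280)
mux.set_property("height", 720)
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "../config/infer/config_infer_primary_yoloV8.txt")
sink = Gst.ElementFactory.make("fakesink", "sink")

for element in (src, conv, caps, mux, pgie, sink):
    pipeline.add(element)
src.link(conv)
conv.link(caps)
caps.get_static_pad("src").link(mux.get_request_pad("sink_0"))
mux.link(pgie)
pgie.link(sink)

loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message::eos", lambda *args: loop.quit())
bus.connect("message::error", lambda *args: loop.quit())
pipeline.set_state(Gst.State.PLAYING)
loop.run()
pipeline.set_state(Gst.State.NULL)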

But if I run deepstream_test_1_new_thread.py with the same DeepStream pipeline configuration, it does not skip the TensorRT build step and it generates a model_b1_gpu0_fp32.engine file inside deepstream_tests/deepstream_python_apps_test/, which is not what I want.
If I delete the just-generated deepstream_tests/deepstream_python_apps_test/model_b1_gpu0_fp32.engine and comment out the lines that define and start my new thread, it skips the TensorRT build step, does not generate deepstream_tests/deepstream_python_apps_test/model_b1_gpu0_fp32.engine, and uses the .engine file inside deepstream_tests/weights.

Do you know the reason for this behaviour? It seems quite different to me from the one you mentioned.

I see. In deepstream_test_1_new_thread.py, has the value of the batch-size property been modified?

When rebuilding the engine file, is there output similar to the following in the terminal?

deserialize backend context from engine from file :%s failed, try rebuild

deserialized backend context :%s failed to match config params, trying rebuild

During inference, you need to specify the max batch size to generate the engine file

The batch-size is set to 1 in both deepstream_test_1_new_thread.py and deepstream_test_1_usb.py, so no change to the batch-size of streammux.

I get the following output:

I have set max-batch-size=1 in config_infer_primary_yoloV8.txt, but the problem is not solved. It does not skip the TensorRT build step and it still generates a model_b1_gpu0_fp32.engine file inside deepstream_tests/deepstream_python_apps_test/.

I mean nvinfer, not nvstreammux; there is no max-batch-size configuration item in nvinfer.

For nvinfer, batch-size affects the generation of the engine file. For nvstreammux, it only affects the GstBuffer. Is there a complete log?
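
To make the distinction concrete, here is a rough sketch in Python using the standard element and property names (the config path is the one from your layout, relative to where the script is run):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# nvstreammux batch-size: only controls how many frames are batched into
# one GstBuffer; by itself it does not trigger an engine rebuild.
streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property("batch-size", 1)

# nvinfer batch-size lives in the config file referenced here
# ([property] section, batch-size=1). That value is what must match the
# "b1" part of model_b1_gpu0_fp32.engine; if it does not, nvinfer logs
# "failed to match config params, trying rebuild" and rebuilds.
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "config/infer/config_infer_primary_yoloV8.txt")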

DS-6.3 should install CUDA-12.1.

https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Installation.html#id11

Try running it in Docker. Mismatched CUDA versions can also cause issues.