deepstream_test_3.py loads really slowly (multiple minutes), whereas deepstream_test_2.py loads fairly quickly for me (under ~10 sec). At first I thought it had to do with the model it was loading, but deepstream_test_2.py loads the same primary model in addition to secondary models with much less startup delay. The long delay is the same whether I use RTSP or file-based sources.
The main other difference I see is that deepstream_test_2.py wires a single h264 source directly into the pipeline, whereas deepstream_test_3.py creates its sources via create_source_bin. Is using create_source_bin really that slow, or could there be a perf bug in it?
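For context, here is roughly what create_source_bin does in the sample; this is a simplified paraphrase of deepstream_test_3.py, not the exact code:

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def create_source_bin(index, uri):
    # Wrap a uridecodebin (which handles file:// and rtsp:// URIs) in a bin
    # and expose the decoded video pad through a ghost pad.
    nbin = Gst.Bin.new("source-bin-%02d" % index)
    uri_decode_bin = Gst.ElementFactory.make("uridecodebin", "uri-decode-bin")
    uri_decode_bin.set_property("uri", uri)

    def on_pad_added(decodebin, pad):
        # Only forward video pads; audio pads are ignored.
        if pad.query_caps(None).to_string().startswith("video"):
            nbin.get_static_pad("src").set_target(pad)

    uri_decode_bin.connect("pad-added", on_pad_added)
    nbin.add(uri_decode_bin)
    nbin.add_pad(Gst.GhostPad.new_no_target("src", Gst.PadDirection.SRC))
    return nbin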
The other thing I noticed is that with the slow load of deepstream_test_3.py I see tons of trace output with:
WARNING: [TRT]: Unknown embedded device detected. Using 59660MiB as the allocation cap for memory on embedded devices.
I’ve searched the forum for this; some comments say it won’t affect anything, yet other users have noted it does seem to coincide with a performance loss. What is this warning, and why is a condition that emits it thousands of times assumed not to be an issue?
Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Jetson Orin
• DeepStream Version: 6.3
• JetPack Version (valid for Jetson only): 5.1.2-b104
I’ve narrowed down the perf issue a bit. When I run deepstream_test_3.py with one input (file or RTSP) it starts within seconds with no warning spew. If I pass in 2 or more inputs (file or RTSP) it takes minutes to start and I get all the warning spew. Muxing problem?
This is really making DeepStream development impossible. I really need some help on this… please!
A bit more info: it happens on just about every sample I run. The latest I ran was the objectDetector_Yolo sample with…
sudo deepstream-app -c deepstream_app_config_yoloV3.txt (tried with sudo and without)
This will run, but every time it has to build the TensorRT engine. Here is the trace output at the moment it goes into 10 minutes of building with thousands of trace messages…
Building yolo network complete!
Building the TensorRT Engine…
WARNING: [TRT]: Unknown embedded device detected. Using 59660MiB as the allocation cap for memory on embedded devices.
[repeats for thousands of outputs and over 10 minutes of processing before the sample runs]
At this point the sample runs fine, the inference looks good, and I get the following trace output showing a clean shutdown.
nvstreammux: Successfully handled EOS for source_id=0
**PERF: 49.89 (49.53)
*** INFO: <bus_callback:262>: Received EOS. Exiting …
Quitting
[NvMultiObjectTracker] De-initialized
App run successful
If I rerun at this point, it does the same thing; it won’t use the results of the long build from the prior run. I’m guessing this is a permission problem writing a cache, but I’m running with sudo and have taken ownership of the whole sample directory as another attempt to fix it.
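For reference, taking ownership was along these lines (adjust the path to wherever your copy of the sample lives):

sudo chown -R $USER:$USER /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo
ls -l /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo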
You can try uncommenting the following line in config_infer_primary_yoloV3.txt:
model-engine-file=yolov3_b1_gpu0_int8.engine
In addition to the permissions, you need to configure the model-engine-file property correctly. Please make sure it is set to the name of the engine file that actually gets generated.
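To illustrate the naming (example file name only, not necessarily what your run produced): nvinfer serializes the engine as <model>_b<batch-size>_gpu<device-id>_<precision>.engine, and model-engine-file has to match that file exactly, e.g.:

model-engine-file=model_b1_gpu0_int8.engine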
I think I’ve got it figured out, and I would suggest that the samples are all wrong on this front. This particular sample had:
#model-engine-file=yolov3_b1_gpu0_int8.engine (commented out in the config file)
I tried commenting and uncommenting it several times and it didn’t change the behavior. I finally noticed in the docs that model-engine-file needs to point to an absolute path. I changed it to an absolute path and it started working as expected, picking up the pre-compiled engine file. I also noticed that the samples I had copied out of the nvidia/deepstream tree, and therefore patched with absolute paths in their config files, would work, whereas the ones left with relative paths did not.
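Concretely, the change was along these lines (the directory shown is an assumption; point it at wherever the engine file is actually written on your system):

#model-engine-file=yolov3_b1_gpu0_int8.engine
model-engine-file=/opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/yolov3_b1_gpu0_int8.engine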
So I’m unblocked (thanks for your comments and efforts), but I question whether all the samples are misconfigured and users are just putting up with ridiculously long load times. Should all the samples be updated to absolute paths, and are users really suffering through a poor out-of-box experience with them the way they are?
In theory, either an absolute path or a path relative to the current running directory should be OK. You can see many relative paths in our config files.
Yes, for some of the samples I can confirm that if I delete the engine file and run with a config file that uses relative paths, the engine file gets built on the first run and subsequent runs don’t rebuild it. I’ll have to revisit why the absolute path helped with the yolo sample.
But let’s return to my original problem: when I run deepstream_test_3.py with multiple files it won’t load from the pre-generated engine file. I think I see an issue buried in all that trace.
I run :
cd /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/apps/deepstream-test3
python3 deepstream_test_3.py -i file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_run.mov file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_walk.mov
If this is a clean run with no cached engine file, it triggers regeneration of the engine and I see this trace at the end…
serialize cuda engine to file: /opt/nvidia/deepstream/deepstream-6.3/samples/models/Primary_Detector/resnet10.caffemodel_b2_gpu0_int8.engine successfully
Note that the config file for this sample is a mismatch to what is generated: its model-engine-file is set to the b1 variant, but with two sources the pipeline determines it needs the b2 (batch size 2) variant and builds that one instead. So when I run again, it does the same thing all over.
I can confirm that if I change model-engine-file to point to the b2 variant, a second run with 2 sources picks it up correctly. I also confirmed that adding a 3rd input bumps the generated engine file to a b3 variant, and you again need to change the config file to match or it goes back to the slow load.
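For anyone hitting the same thing: the edit that made the two-source run reuse the cache was pointing the sample’s pgie config at the b2 file the first run serialized (path copied from the trace above):

model-engine-file=/opt/nvidia/deepstream/deepstream-6.3/samples/models/Primary_Detector/resnet10.caffemodel_b2_gpu0_int8.engine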
So what this is telling me is that I must know in advance how many sources will be passed on the command line, and I need to change model-engine-file to match? This is a very misleading sample, in my mind. It really needs to document that you have to change the config to match how many inputs you pass. The docs show two command lines with differing numbers of inputs and no instructions to change anything between them. There are also no runtime checks at all to catch the mismatch.
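As a strawman for the kind of runtime check I mean, something like this before building the pipeline would at least warn about the mismatch (a hypothetical helper, not part of the sample; the config file name and the model-engine-file key are taken from the sample’s pgie config):

import configparser
import os

def check_engine_matches_sources(config_path, num_sources):
    # Warn if the configured engine file's "_bN_" batch tag doesn't match
    # the number of sources being batched (hypothetical check, not in the sample).
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    engine = cfg.get("property", "model-engine-file", fallback="")
    expected_tag = "_b%d_" % num_sources
    if engine and expected_tag not in os.path.basename(engine):
        print("WARNING: %s does not look like a batch-size %d engine; "
              "nvinfer will rebuild the engine at startup." % (engine, num_sources))

check_engine_matches_sources("dstest3_pgie_config.txt", num_sources=2)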
Yes. You are right. You can check that in our code: deepstream_test_3.py.
For nvinfer performance reasons, the code automatically adapts the batch size to the number of sources, so every time you change the number of sources the engine is regenerated. If nvinfer performance is not a concern for you, you can modify the source code.
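For reference, the part of deepstream_test_3.py being described is the batch-size override, which looks roughly like this (paraphrased, not the exact code):

# The sample forces the nvinfer batch-size to match the number of sources,
# which is why the engine gets rebuilt whenever the source count changes.
streammux.set_property("batch-size", number_sources)
pgie_batch_size = pgie.get_property("batch-size")
if pgie_batch_size != number_sources:
    print("WARNING: Overriding infer-config batch-size", pgie_batch_size,
          "with number of sources", number_sources)
    pgie.set_property("batch-size", number_sources)
# Removing the override keeps the batch size from the config file, so the
# pre-built engine stays usable across runs, at some cost to nvinfer
# throughput when the source count exceeds that batch size.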