What's the expected performance of Python test examples?

So, I installed the DeepStream SDK along with the Python bindings and ran test program #3 on my Jetson Nano:

paco@paco-jetson:~/deepstream_sdk_v4.0.2_jetson/sources/python/apps/deepstream-test3$ python3 deepstream_test_3.py rtsp://admin:@192.168.50.90:554/h264/main/ch1/main/av_stream

I’m getting “There may be a timestamping problem, or this computer is too slow” messages when reading a stream from a wired IP camera. Is this normal? Besides, the model is not perfect (which is expected): in a random frame it detects 4 cars plus me when there’s only me (see https://drive.google.com/file/d/1buLKhHx93TI6y5xPDxDKMW5RN0e6wULs/view?usp=sharing ).

Regards.

Are you using the default configuration file? If not, can you share the file?

I’m using the default configuration file.

Did you try a local file? You can save the RTSP stream as a local file and run it again. Is the issue the same? If so, can you share the stream?

We provide the optimized deepstream-app and deepstream-test5. You can refer to those and optimize based on them. Currently there is no plan to optimize the test3 app.

@ChrisDing - based on your last comment - is this suggesting that Nvidia recommends developers base custom apps on deepstream-app or deepstream-test5 instead of the other test apps?
What optimisations are you talking about, or what should we be aware of?

I tried with an mp4 video file I downloaded from the internet ( https://www.videezy.com/abstract/17062-people-walking-near-main-square-and-metropolitan-cathedral ).
It is a UHD 3840 x 2160, 30 fps video.

Then I executed

paco@paco-jetson:~/deepstream_sdk_v4.0.2_jetson/sources/python/apps/deepstream-test3$ python3 deepstream_test_3.py file:///home/paco/Downloads/People_Walking_Near_Main_Square_And_Metropolitan_Cathedral_0829E.mp4

It ran fine and smoothly (no “this computer is too slow” message), but it took 2:05 minutes to start playing.
Is this time normal?
How can I achieve the same performance with an RTSP live stream?

There’s no deepstream-test5 in my “~/deepstream_sdk_v4.0.2_jetson/sources/python/apps” dir.
Is there a Python source for deepstream-test5?

Regards.

it took 2:05 minutes to start playing
The first time, the engine file needs to be generated and cached. Try running it a second time.

deepstream-test5 is in the C package but not in the Python package.

Hi, I tried three times, getting the following results:
1st: 2:14 minutes.
2nd: 2:04 minutes.
3rd: 2:02 minutes.

Hence, only a very marginal performance gain.

I get this from any network source, including wired IP cameras and youtube uris, in any language, so it’s not Python. I think the core of the problem is gstreamer’s network elements. I get the same problems with similarly configured pipelines on Intel/AMD.

If I figure out a good solution, I will post it in this forum (hopefully Nvidia is working on this as well). So far, I have found the best results automatically with uridecodebin for any sort of network (or local) source, but it struggles with many network sources at once.

I’m going to try experimenting with various queues next to see if that helps, since queues split the pipeline into separate threads. I am not sure if there is one in uridecodebin already. So far the limit on my Xavier is about 4 YouTube streams before frames start to drop.
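
For context, this is roughly the shape of what I mean: one uridecodebin per source with a queue between it and the stream muxer. A simplified Python sketch, not my actual code; the element names, the URI, and the omitted downstream elements are just illustrative:

# Simplified sketch: one uridecodebin feeding a queue, which feeds nvstreammux.
# Downstream elements (pgie, tiler, osd, sink) are omitted.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.Pipeline.new("pipeline")

source = Gst.ElementFactory.make("uridecodebin", "source-0")
source.set_property("uri", "rtsp://example.invalid/stream")  # placeholder URI
queue = Gst.ElementFactory.make("queue", "source-queue-0")   # gives this branch its own thread
streammux = Gst.ElementFactory.make("nvstreammux", "stream-muxer")
streammux.set_property("batch-size", 1)
streammux.set_property("width", 1920)
streammux.set_property("height", 1080)
streammux.set_property("batched-push-timeout", 4000000)

for elem in (source, queue, streammux):
    pipeline.add(elem)

# The queue's src pad feeds one of nvstreammux's request sink pads.
queue.get_static_pad("src").link(streammux.get_request_pad("sink_0"))

def on_pad_added(decodebin, pad):
    # uridecodebin's pads appear at runtime; in a real app you would also
    # check the pad caps for video before linking.
    sink_pad = queue.get_static_pad("sink")
    if not sink_pad.is_linked():
        pad.link(sink_pad)

source.connect("pad-added", on_pad_added)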

Re: C vs Python, DeepStream performance seems to be very similar so long as you don’t try to do anything fancy in Python in your callbacks. The main loop, as well as most stuff it calls, is in C.

If it’s taking two minutes to start the app, it’s likely the .engine file is not being loaded and/or the path it’s being written to is not writable. You can find the .engine file generated in “/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/” for your case, and specify that path in the config file for the primary inference engine. In the case of my Xavier, the line in question is:

model-engine-file=/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine

The file generated for the Nano is different, but you will find it in that folder if you look. If you copy and paste that path into “model-engine-file”, it should no longer rebuild the engine at startup. If it rebuilds anyway (it did on my Xavier), you can comment out some lines so the config looks like this:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
# model-file=/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel
# proto-file=/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/resnet10.prototxt
model-engine-file=/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine
labelfile-path=/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/labels.txt
# int8-calib-file=/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector/cal_trt.bin
batch-size=1
network-mode=1
num-detected-classes=4
interval=0
gie-unique-id=1
output-blob-names=conv2d_bbox;conv2d_cov/Sigmoid

That is the only way I was able to get it to stop building the same .engine at every single startup. Unfortunately, this whole setup requires your running user to have write access to certain global paths. If the paths aren’t already world writable, which I think they are by default, you can change their containing folder’s ownership to root:video and make the folder mode 775 (group writable), or you can copy the required files from Primary_Detector somewhere into your home directory and modify the config file accordingly, so it will be able to write a .engine file.

If the app isn’t able to write its .engine file to wherever the model-file is found, it will fail silently and will try to re-generate the model every single time the app launches, which can take minutes.

@Nvidia, it would be nice if a model cache folder was created under something like ~/.deepstream/ and automatically searched to avoid all this.


Got it.

  1. “There may be a timestamping problem, or this computer is too slow”
    This message comes from gstreamer’s network elements. Please share your solution here if you find one.

  2. Make the engine file path writable by default.

Sorry. I have a habit of being overly verbose. I blame typing too fast. Plus coffee. Lots and lots of coffee.

I am going to insert a multiqueue just before my stream muxer to see if it helps or hurts. Should have results by the end of the day.

The problem is nvinfer doesn’t seem to respect model-engine-file for writing, only for reading. It always tries to write to the model-file path. Also, I don’t want to make the global model install path writable. The current workaround I’m thinking of is to copy the model to ~/.app/models/ at app startup and override all the nvinfer paths.
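
Something like this, if it pans out (rough, untested sketch; the ~/.app/models/ location is just my own convention, not anything DeepStream defines):

# Stage the model files in a user-writable directory at startup so nvinfer
# can serialize its .engine file next to them on first run.
import os
import shutil

SRC_MODEL_DIR = "/opt/nvidia/deepstream/deepstream-4.0/samples/models/Primary_Detector"
DST_MODEL_DIR = os.path.expanduser("~/.app/models/Primary_Detector")

def stage_model_files():
    os.makedirs(DST_MODEL_DIR, exist_ok=True)
    for name in ("resnet10.caffemodel", "resnet10.prototxt",
                 "labels.txt", "cal_trt.bin"):
        src = os.path.join(SRC_MODEL_DIR, name)
        dst = os.path.join(DST_MODEL_DIR, name)
        if os.path.isfile(src) and not os.path.isfile(dst):
            shutil.copy2(src, dst)
    return DST_MODEL_DIR

# The nvinfer config file (dstest3_pgie_config.txt in the test app) would then
# point model-file, proto-file, labelfile-path, etc. at DST_MODEL_DIR.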

Welp. The multiqueue made the buffer dropping messages go away, so yay to that, but now the osd is going crazy. Any idea what might cause this rather horrifying result? It seems like the stream ids are getting mixed up. Edit: figured it out (batch-size related).

Attaching a PDF of the pipeline as well, generated on quit.
0.00.50.550347214-pipeline0.quit.pdf (58.4 KB)

The thing is, there are already queues after the network elements in the uridecodebin, so I’m not sure why the multiqueue is helping (at least to the extent that it makes the buffer dropping messages go away).

Regarding the flickering… I do have my osd element before my tiler. Is the metadata modified in such a way that the osd can work after the tiler?

Edit: the flickering problem was unrelated to the multiqueue. It was related to batch-size.

So, the final result is: once I fixed the flickering issue, the buffer dropping messages came back. The multiqueue does not seem to help, which isn’t surprising considering there are already queues in the uridecodebin I am using. Visually, I don’t see any frames missing, but the console is full of messages. I don’t think any other queue is going to help, but my linking code is a lot cleaner now.

Going to try something else. If I come across an actual solution, I’ll post it here.

Edit: so I looked at tegrastats while my app was running and I think I’m hitting the limit of what the GPU can do (“GR3D_FREQ 93%@1377”). The CPU isn’t very stressed at all, however. I’m going to experiment with doing detections on alternating frames, scaling things down, and optimizing the model I am using. I don’t think I need fp32 for my purposes.
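
For the alternating-frames part, I believe the knob is nvinfer’s interval property (the same interval= key in the pgie config file); a one-line sketch, assuming pgie is the nvinfer element:

# Skip one batch between inferences, i.e. run detection on every other frame.
# Same effect as setting interval=1 in the pgie config file.
pgie.set_property("interval", 1)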

Thank you for your detailed reply; it pointed me toward how to tackle this problem. When executing the program I noticed this warning:

0:00:04.628446107 24836     0x3b3cdef0 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:04.628708979 24836     0x3b3cdef0 WARN                 nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]:generateTRTModel(): INT8 not supported by platform. Trying FP16 mode.
0:02:11.533955165 24836     0x3b3cdef0 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]:generateTRTModel(): Storing the serialized cuda engine to file at /home/paco/deepstream_sdk_v4.0.2_jetson/samples/models/Primary_Detector/resnet10.caffemodel_b1_fp16.engine

You were right about the .engine file being regenerated on every run, as I could verify in the path

paco@paco-jetson:~/deepstream_sdk_v4.0.2_jetson/samples/models/Primary_Detector

Nevertheless, the config file I had to modify was neither “config_infer_primary.txt” nor “config_infer_primary_nano.txt” (both located in “~/deepstream_sdk_v4.0.2_jetson/samples/configs/deepstream-app”) BUT

"dstest3_pgie_config.txt"

(located in the same path as “deepstream_test_3.py”), i.e.

"~/deepstream_sdk_v4.0.2_jetson/sources/python/apps/deepstream-test3"

The modification consisted of adding the following model-engine-file line, which points to the generated .engine file:

model-file=../../../../samples/models/Primary_Detector/resnet10.caffemodel
proto-file=../../../../samples/models/Primary_Detector/resnet10.prototxt
model-engine-file=../../../../samples/models/Primary_Detector/resnet10.caffemodel_b1_fp16.engine
labelfile-path=../../../../samples/models/Primary_Detector/labels.txt
int8-calib-file=../../../../samples/models/Primary_Detector/cal_trt.bin

And now every time I run “deepstream_test_3.py” with an mp4 video it starts after 6 seconds :)

P.S. Let me know if you find a fix for the network source issue.

Yes. The formula for the engine file name seems to be (Python):

model_engine_basename = f"{model_basename}_b{batch_size}_{precision}.engine"  # e.g. resnet10.caffemodel_b1_fp16.engine

You can set this at runtime on nvinfer if you have a different number of sources (batch-size), but so far my understanding is that it doesn’t respect this for writing the file out, so it’ll always try to generate it in the model-file’s directory.
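
For example, in the Python test app something like this keeps the nvinfer batch size in step with the number of sources (a sketch; pgie is the nvinfer element and number_sources is however many URIs you pass in):

# Align nvinfer's batch size with the number of input sources so the cached
# engine name (..._b<N>_...) matches what the pipeline will actually request.
if pgie.get_property("batch-size") != number_sources:
    pgie.set_property("batch-size", number_sources)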

My planned solution is: when my app creates its ~/.config dir, copy/symlink the model there, set its path as the model-file, and then the model-engine-file should be generated where I want it. I want to avoid writing to or reading from any globally writable paths. I may patch nvinfer itself at some point, but then I’d have to bundle a custom nvinfer with my app, which I don’t want.

Re: the network source issue, if I figure that out, I’ll let you know. In the meantime I suppressed the warnings from that source (see the bus call code below). Right now I am not sure there is anything I can do, or if my case is even network-source related. I took a look at uridecodebin and it creates a queue (separate thread) automatically for the network IO itself. I think my issue may be related to a model that is not at all optimized for my platform (fp32 when the platform supports int8).

... (rest of class) ...
		def _on_message(bus: Gst.Bus, message: Gst.Message) : bool
			// note: in Genie there is no fallthrough in a case block, so no need to break;
			case message.type
				when Gst.MessageType.QOS
				when Gst.MessageType.BUFFERING
				when Gst.MessageType.TAG
					break
				when Gst.MessageType.EOS
					GLib.message("Got EOS")
					self.quit()
				when Gst.MessageType.STATE_CHANGED
					old_state:Gst.State
					new_state:Gst.State
					message.parse_state_changed(out old_state, out new_state, null)
					debug(@"STATE_CHANGED:$(message.src.name):$(old_state.to_string())->$(new_state.to_string())")
				when Gst.MessageType.ERROR
					err:GLib.Error
					debug:string
					message.parse_error(out err, out debug)
					if err.code == 3  // window closed
						self.quit()
					error(@"$(err.code):$(err.message):$(debug)")
				when Gst.MessageType.WARNING
					err:GLib.Error
					debug:string
					message.parse_warning(out err, out debug)
					if err.code == 13
						// buffers being dropped spam
						break
					warning(@"$(err.code):$(err.message):$(debug)")
				default
					debug(@"BUS_MSG:$(message.src.name):$(message.type.get_name())")
			return true

Note: the above is Genie, not Python, so it’s actually C under the hood, but you get the idea. Gstreamer is gstreamer. The parse_warning / err.code == 13 check should give you enough to shut those warning messages up. The Python will be slightly different, but the error code you are looking to silence is 13. You can turn off the sink sending qos messages by setting its “qos” property to false.
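
For reference, a rough Python equivalent of that bus handler (untested sketch, following the same bus-watch pattern the Python test apps use):

import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def bus_call(bus, message, loop):
    """Bus watch that swallows the buffer-dropping warning spam (code 13)."""
    t = message.type
    if t == Gst.MessageType.EOS:
        print("End-of-stream")
        loop.quit()
    elif t == Gst.MessageType.WARNING:
        err, debug = message.parse_warning()
        if err.code == 13:
            return True  # "A lot of buffers are being dropped." spam
        print("Warning: %s: %s" % (err, debug))
    elif t == Gst.MessageType.ERROR:
        err, debug = message.parse_error()
        print("Error: %s: %s" % (err, debug))
        loop.quit()
    return True

# Wire it up the same way deepstream_test_3.py does:
#   bus = pipeline.get_bus()
#   bus.add_signal_watch()
#   bus.connect("message", bus_call, loop)
# And, optionally, stop the sink from emitting QoS messages:
#   sink.set_property("qos", False)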

I tested “deepstream_test_3.py” again with a WiFi-connected IP camera and performance is much worse than with a wired one. I’m seeing “A lot of buffers are being dropped.” and “There may be a timestamping problem, or this computer is too slow” messages. The FPS of the stream is 2.4, and the displayed image freezes for a minute or two…
I will try to implement the same solution I followed in another program: capture the latest frame using a queue/thread and discard the previous ones.
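
What I have in mind is something like a leaky queue that keeps only the newest buffer (untested sketch):

# A queue that drops older buffers so only the latest frame is kept, placed
# on the source branch before the stream muxer. Untested with DeepStream.
drop_queue = Gst.ElementFactory.make("queue", "latest-frame-queue")
drop_queue.set_property("leaky", 2)             # 2 = leak downstream (drop old buffers)
drop_queue.set_property("max-size-buffers", 1)  # hold at most one buffered frame
drop_queue.set_property("max-size-bytes", 0)    # disable the byte limit
drop_queue.set_property("max-size-time", 0)     # disable the time limit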

As for the latest code you shared, it will only suppress the error messages but won’t fix the root cause… (I guess)

Thanks for your help! :)

YW. Unfortunately, no, my code won’t fix the problem, only suppress the messages. Although I think in your case the problem is network related, while in mine it may be due to a model that is not at all optimized / calibrated for my platform. I found an Nvidia demo this morning doing about the same thing, and their benchmarks report similar numbers.

A queue/multiqueue may certainly help. It seemed to slightly hurt in my case, but again, I think we have different bottlenecks. Also, you might want to try other sources. I have some rtsp cameras that cause very strange behavior with DeepStream while these ones consistently work perfectly:

rtsp://freja.hiof.no:1935/rtplive/_definst_/hessdalen03.stream
rtsp://freja.hiof.no:1935/rtplive/_definst_/hessdalen02.stream

You can find other public rtsp cameras by googling. I haven’t been able to get any mjpeg sources working. Some samples of those:

http://216.161.255.245:80/cgi-bin/faststream.jpg?stream=full
http://128.95.77.66:80/mjpg/video.mjpg
http://24.105.185.114:9000/mjpg/video.mjpg
http://173.160.211.81:80/mjpg/video.mjpg

Also, YouTube has a lot of live streaming cameras. If you use youtube-dl with the -g option you can get a URI that works with gstreamer’s uridecodebin. Youtube-dl is also Python based with a handy context manager, so if you google it you can integrate it into your app easily without having to launch a subprocess (but you may want to anyway to avoid blocking).
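
For example, something along these lines (sketch; the URL is just a placeholder):

# Resolve a YouTube page to a direct media URI for uridecodebin, using
# youtube-dl's Python API instead of shelling out to "youtube-dl -g".
import youtube_dl  # pip install youtube-dl

def resolve_stream_uri(page_url):
    opts = {"format": "best", "quiet": True}
    with youtube_dl.YoutubeDL(opts) as ydl:
        # extract_info blocks while it talks to YouTube, so you may still want
        # to run it in a thread or subprocess to keep the pipeline responsive.
        info = ydl.extract_info(page_url, download=False)
    return info["url"]

# uri = resolve_stream_uri("https://www.youtube.com/watch?v=XXXXXXXXXXX")  # placeholder
# source.set_property("uri", uri)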