Ethernet Queue Building Up

Hi greg2,
We don’t see the issue when running these steps on r28.1/TX2:
1 gst-launch-1.0 -e videotestsrc pattern=1 ! video/x-raw,width=4096,height=2160,format=I420 ! nvvidconv ! omxh264enc bitrate=35000000 iframeinterval=1 control-rate=2 ! flvmux ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! rtmpsink location="rtmp://35.197.72.182/rtmptest"
2 wait for 60 seconds
3 <Ctrl + C>

Both ethernet and wifi exit immediately. Is there any mismatch between your steps and ours?

No, that’s correct. Sometimes it takes a couple of minutes for the problem to appear; other times it happens as soon as the stream starts. I’d let it run for 3-4 minutes before stopping it and giving it another try. It should reproduce on the 1st or 2nd try - it’s quite easy to reproduce. The annoying part with gst-launch, though, is that you have no idea whether the problem is occurring until you stop it. The problem could also have already happened and corrected itself by the time <Ctrl+C> is pressed - you’re completely flying blind with it and can only see the issue at one point in time.

The test app is much nicer in that regard - you can see the queue in real time and know immediately how it’s behaving when the problem happens, which makes the issue much easier to see and analyze.

The problem happens with either app. Wi-Fi and Ethernet via a USB dongle, on the other hand, are both solid: I ran a 3-hour stream at 50Mb/s to that test server today over a USB Ethernet dongle without any issues, while on-board Ethernet started buffering within 30 seconds.

Hello, did you guys have any luck at least reproducing the issue on your side?

Hi greg2,
We weren’t able to run the app successfully. Do you run it on r28.1/TX2? It seems the IP is not correct and we have to modify the source code. We also hit a segmentation fault when running it.

And what is the use case of RTMP? Is it for streaming, just like RTSP?

Hi,

Yes, I ran this on an r28.1 build on the TX2. The public IP address to use is 35.197.72.182; the full RTMP address to stream to is rtmp://35.197.72.182/rtmptest

I changed this in source and re-uploaded it to the same place (here: https://drive.google.com/open?id=0B6FhWjPAiQ_gS3JETU9nbkJScGc )

The app will now default to the public-facing IP address listed above. Optionally, if you want to test sending to another RTMP endpoint, you can give it to the app as a command-line argument, like this:

./str rtmp://ip.address.goes.here/rtmptest

The code is pretty simple if somebody there wants to alter it, and it should be an easy build on a Jetson loaded with JetPack (it just needs the gstreamer dev libs; then run ./build.sh). The bitrate can easily be modified on line 34 if desired, but 35Mb/s shows the issue very easily.

The use case for RTMP is streaming to media servers - for example, YouTube’s live video casting feature or Facebook’s live video streaming. From the device (e.g. a phone) the RTMP stream is sent to the RTMP server (e.g. youtube.com). In the case of YouTube, the stream is then transcoded on the server in real time and made available on their website for many people to consume in real time (using some other protocol). It’s very common in real-time video casting/streaming. Hope that explains the use case well enough.

Hi greg2,
On r28.1/TX2, I got the following prints:

nvidia@tegra-ubuntu:~/rtmp_tester$ ./str
Capturing to "rtmp://35.197.72.182/rtmptest"
bitrate: 35000000
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
===== MSENC blits (mode: 1) into tiled surfaces =====
1.00042s, 2329.03 fps
#Buf: 31, byts: 4341845, tim: 1033333323
2.00071s, 2983.12 fps
#Buf: 58, byts: 8382867, tim: 1933333314
3.00082s, 2966.69 fps
#Buf: 51, byts: 7435743, tim: 1699999983
4.00126s, 2876.73 fps
#Buf: 25, byts: 3648507, tim: 833333325
5.00159s, 2870.05 fps
#Buf: 13, byts: 1889186, tim: 433333329
6.00188s, 2889.17 fps
#Buf: 35, byts: 5102655, tim: 1166666655
7.00228s, 2891.82 fps
#Buf: 58, byts: 8454294, tim: 1933333314
8.00263s, 2886.99 fps
#Buf: 70, byts: 10214233, tim: 2333333310
9.00264s, 2892.97 fps
#Buf: 82, byts: 11954676, tim: 2733333306

Is the fps normal? How do we read the print? Please give more information. Thanks.

Yes, the frame rate is normal. The app is not sending a video or any meaningful data, but rather a blank frame with enough pre-calculated noise/entropy added to reach the requested bitrate (it should be good up to about 45Mb/s before it starts to undershoot the requested bitrate). You can see the pre-calculated noise in lookupdata.h.
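
To illustrate the idea only (this is a simplified, hypothetical sketch - the real app uses the pre-calculated table in lookupdata.h rather than generating noise at runtime, and the function name and use of rand() here are just placeholders):

// Sketch: fill an I420 frame with noise so the H.264 encoder cannot
// compress it much, forcing it up toward the requested bitrate.
#include <cstdint>
#include <cstdlib>
#include <vector>

std::vector<uint8_t> make_noisy_i420_frame(int width, int height)
{
    // I420: full-resolution Y plane plus quarter-resolution U and V planes.
    const size_t size = static_cast<size_t>(width) * height * 3 / 2;
    std::vector<uint8_t> frame(size);
    for (size_t i = 0; i < size; ++i)
        frame[i] = static_cast<uint8_t>(std::rand() & 0xFF);  // incompressible noise
    return frame;
}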

Other than that the print is pretty basic, each second the following is printed:

<elapsed time> (in seconds), <achieved framerate> fps
#Buf: <number of queued buffers> (30 buffers = 1 second), <total size of queued buffers> (in bytes), <buffered total time> (in nanoseconds)
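
For reference, those per-second numbers map directly onto the queue element’s readable current-level-* properties. A minimal sketch of how such a readout could be obtained (assuming the pipeline’s queue element is named "myqueue" - the actual name in the app may differ):

// Sketch only: read the queue levels that the app prints each second.
#include <gst/gst.h>

static void print_queue_levels(GstElement *pipeline)
{
    GstElement *queue = gst_bin_get_by_name(GST_BIN(pipeline), "myqueue");
    if (!queue)
        return;

    guint buffers = 0, bytes = 0;
    guint64 time_ns = 0;
    g_object_get(queue,
                 "current-level-buffers", &buffers,
                 "current-level-bytes", &bytes,
                 "current-level-time", &time_ns,
                 NULL);
    g_print("#Buf: %u, byts: %u, tim: %" G_GUINT64_FORMAT "\n",
            buffers, bytes, time_ns);

    gst_object_unref(queue);
}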

So if we take the bottom line there:

9.00264s, 2892.97 fps
#Buf: 82, byts: 11954676, tim: 2733333306

9 seconds have elapsed, achieved framerate is ~2900.
There are 82 buffers in the queue. The total size of the queue is 11954676 bytes, so we can estimate each frame is approximately 11954676/82 bytes in size (~142KB). This makes sense: at the requested pipeline rate of 30fps, 30 * 142KB = 4,260KB per second, and multiplying by 8 to convert bytes to bits gives 34,080 Kbits, i.e. ~34Mb/s, which is just under what’s requested (35Mb/s). If you let it run longer it’ll stabilize closer to 35Mb/s.

As for the time in nanoseconds, it’s just that: the total queue time in nanoseconds. A nanosecond is 1/1,000,000,000 of a second, so taking that value of 2733333306 and dividing by 1,000,000,000 gives 2.73 seconds, which is the amount of latency in the buffer/queue. We can confirm this because 2.73 seconds * 30 frames per second = 81.9 buffers (~82 buffers), so the numbers correlate.

So the number of buffers will always match the delay time, and the number of bytes in the queue will always be approximately:

((<requested bitrate in bits>/8) / <frame rate>) * <num queued buffers>
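
As a quick worked check of that formula against the 9-second log line above (a throwaway calculation, not part of the app):

// Sanity check: expected queue size and latency at 35Mb/s, 30fps, 82 queued buffers.
#include <cstdio>

int main()
{
    const double bitrate = 35000000.0;  // requested bitrate, bits per second
    const double fps = 30.0;            // pipeline frame rate
    const int queued = 82;              // #Buf from the log line

    const double bytes_per_frame = (bitrate / 8.0) / fps;    // ~145,833 bytes
    const double expected_bytes = bytes_per_frame * queued;  // ~11.96 million bytes
    const double queued_seconds = queued / fps;              // ~2.73 s of latency

    std::printf("expected bytes: %.0f (logged: 11954676)\n", expected_bytes);
    std::printf("queued time:    %.2f s (logged: ~2.733 s)\n", queued_seconds);
    return 0;
}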

Ideally there should be no entries in this queue. The queue filling up means the pipeline isn’t able to push these frames onto the network. Frames are produced by the pipeline at a constant 30fps, so if no frames are pushed over the network the queue will grow by 30 each second. Conversely, if all frames are instantly pushed over the network the queue will stay at 0.

With an outside server, how fast you can push the data depends on the internet connection, but we’re seeing the problem on our internal network as well. It’s all very strange that connecting a USB Ethernet dongle makes this problem go away entirely.

Thanks

Hi greg2,
What should the print look like in normal and abnormal cases?

Normal vs abnormal, or good vs bad? The normal case is the bad case here; abnormal is when it works nicely (the good case).

The output you printed above is the bad case (the normal case), where frames are buffering in the gstreamer pipeline because they can’t be pushed out over the network at 30fps or faster.

If it was a good scenario, where network TX throughput isn’t an issue, it’ll look something like this:

1.00042s, 2329.03 fps
#Buf: 0, byts: 0, tim: 0
2.00071s, 2983.12 fps
#Buf: 0, byts: 0, tim: 0
3.00082s, 2966.69 fps
#Buf: 0, byts: 0, tim: 0
4.00126s, 2876.73 fps
#Buf: 0, byts: 0, tim: 0
5.00159s, 2870.05 fps
#Buf: 0, byts: 0, tim: 0

Seeing 1 or 2 frames in the buffer is also "good" (fine). But when you see what you pasted above, where the latency just keeps rising because data can’t be sent over the network fast enough, that’s an issue.

Hi greg2,
Please try
sprintf(buf, "appsrc name=app_src ! omxh264enc name=enc ! flvmux ! queue name=myqueue max-size-bytes=32768 ! rtmpsink location=%s", path);

Hi Dane,

This only works around the problem. By limiting the queue (you can limit it to any size) the pipeline will simply drop frames when the queue is full. You can also set max-size-buffers=2 and it will only buffer 2 frames, so the latency can never exceed 2/30 of a second. The problem there is that any frames beyond 2 get dropped on the floor and never make it to the server.

In this case, with the size limit of 32768 bytes, and taking the previous log’s estimate of approximately 142KB (142000 bytes) per frame at 35Mb/s, the number of frames that can be buffered is floor(32768/142000) → floor(0.23) → 0. So 0 frames can be buffered, and at that point the queue isn’t doing anything. The frames that can’t be buffered are dropped. Because of that there will be no latency in communication, but many of the frames from the pipeline will also never reach the server.
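
The same estimate in code form, using the approximate frame size from the earlier log (illustrative numbers only):

// How many ~142KB frames fit into a 32768-byte queue? None.
#include <cstdio>

int main()
{
    const long max_size_bytes = 32768;  // queue max-size-bytes from the suggested pipeline
    const long frame_bytes = 142000;    // approximate frame size at 35Mb/s, 30fps
    std::printf("frames that fit: %ld\n", max_size_bytes / frame_bytes);  // prints 0
    return 0;
}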

If you set the queue to a constant size you eliminate the growing-latency problem but suffer missing frames in the final video. So if the network can only push 20 frames per second through a pipeline running at 30 frames per second, and we let this pipeline run for 5 seconds, then instead of 150 frames in the final bitstream we will have 100 frames, and the final bitstream will only be about 3.3 seconds long (missing data).

So basically a queue lets you build latency instead of dropping frames - if we essentially disable the queue, then we’re just dropping frames. We’re trying to stop the system from building latency, but we can’t drop frames either.

Hi greg2,
Please share more about the queue element. My understanding is that if we configure max-size-bytes=32768, it will make all buffers 32768 bytes and send them to rtmpsink.

Although we don’t configure max-size-buffers, does it drop frames automatically? Is there any print showing the queue element dropping frames?

Hi Dane,

You can read about the queue element here:
https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer-plugins/html/gstreamer-plugins-queue.html#GstQueue--max-size-bytes

I was mistaken about it dropping - by default it will just block instead, unless the "leaky" attribute is set (in which case it will drop frames). This is obviously still a major issue if you’re shooting for real-time video at 24FPS+ yet the pipeline is having trouble transmitting at the requested frame rate. It’s also an issue in the pipeline I originally provided, with an "infinite" buffer size (all max-size limits set to 0), because the queue can keep growing until the memory constraints of the system are reached. This can create an ever-increasing latency.
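
For completeness, here is roughly what a drop-instead-of-block configuration would look like - not something we actually want here, since dropped frames never reach the server. This is only a sketch based on the pipeline string suggested above; leaky=downstream discards the oldest queued buffers, while leaky=upstream discards new incoming ones:

// Illustrative only: a leaky queue drops data instead of blocking when full.
#include <gst/gst.h>

GstElement *build_leaky_pipeline(const char *rtmp_url)
{
    gchar *desc = g_strdup_printf(
        "appsrc name=app_src ! omxh264enc name=enc ! flvmux ! "
        "queue name=myqueue leaky=downstream max-size-buffers=2 "
        "max-size-bytes=0 max-size-time=0 ! rtmpsink location=%s",
        rtmp_url);
    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(desc, &error);
    g_free(desc);
    if (error) {
        g_printerr("parse error: %s\n", error->message);
        g_clear_error(&error);
    }
    return pipeline;
}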

In our particular case, since the stream is live it needs to be real time (and the queue size needs to stay around 0). The queue filling up with a large number of frames causes a very bad experience for viewers: their players keep pausing to buffer while waiting for data that can’t arrive in real time, so playback ends up stalling at regular intervals.

I think the key notes about the queue and its max-size attributes from that documentation are these:

Setting all of the max-size properties to 0 creates a queue that can grow until system constraints become an issue (i.e. memory allocation fails).

And when max-size-bytes is set to a value smaller than one frame, no frame (not at 35Mb/s, anyway) should be added to the queue, because a partial frame can’t be buffered; the queue will always block in that case.

The only way we can change the size of each element (frame) in the buffer would be to change the bitrate, in which case the encoder would generate smaller frames.
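
If someone wanted to experiment with that, a minimal sketch of lowering the encoder bitrate (assuming the encoder element is named "enc", as in the pipeline string suggested above, and that the property is set before the pipeline starts):

// Sketch: ask the encoder for a lower bitrate so it produces smaller frames.
#include <gst/gst.h>

static void set_encoder_bitrate(GstElement *pipeline, guint bits_per_second)
{
    GstElement *enc = gst_bin_get_by_name(GST_BIN(pipeline), "enc");
    if (!enc)
        return;
    g_object_set(enc, "bitrate", bits_per_second, NULL);  // e.g. 20000000 for 20Mb/s
    gst_object_unref(enc);
}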

Let me know if that clarifies usage.

Hi greg2,
Here is the log with max-size-bytes=32768 through on-board Ethernet:

nvidia@tegra-ubuntu:~/rtmp_tester$ ./str
Capturing to "rtmp://35.197.72.182/rtmptest"
bitrate: 35000000
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
===== MSENC blits (mode: 1) into tiled surfaces =====
1.64457s, 13.9854 fps
#Buf: 1, byts: 113330, tim: 33333333
2.71403s, 79.4797 fps
#Buf: 1, byts: 138995, tim: 33333333
3.77726s, 1567.87 fps
#Buf: 1, byts: 147859, tim: 33333333
4.83117s, 894.761 fps
#Buf: 1, byts: 147859, tim: 33333333
5.95349s, 7.1281 fps
#Buf: 1, byts: 147859, tim: 33333333
7.08508s, 7.06965 fps
#Buf: 1, byts: 147859, tim: 33333333
8.22025s, 7.04746 fps
#Buf: 1, byts: 147859, tim: 33333333
9.22037s, 242.969 fps
#Buf: 0, byts: 0, tim: 0
10.3458s, 1388.75 fps
#Buf: 1, byts: 148359, tim: 33333333
11.4386s, 7.32116 fps
#Buf: 1, byts: 148359, tim: 33333333
12.4889s, 7.61694 fps
#Buf: 1, byts: 148359, tim: 33333333
13.5892s, 7.27074 fps
#Buf: 1, byts: 148359, tim: 33333333
14.6868s, 7.28865 fps
#Buf: 1, byts: 148359, tim: 33333333
15.7154s, 6.80478 fps
#Buf: 1, byts: 138995, tim: 33333333
16.8353s, 7.14391 fps
#Buf: 1, byts: 138995, tim: 33333333
17.9496s, 7.1795 fps
#Buf: 1, byts: 138995, tim: 33333333

How can we tell from the log that frames are getting blocked?

17.9496s, 7.1795 fps
#Buf: 1, byts: 138995, tim: 33333333

See how it’s only running at 7 FPS, whereas with the infinitely sized queue (where max-size-buffers, max-size-bytes, and max-size-time were all 0) it was something like 2700?

This is because with that pipeline we’re only buffering a maximum of 1 frame, so we can’t push more frames into the pipeline. (Even that 1 frame is a little surprising: you can see its size is 138KB, and given the very small max size we provided I expected the queue not to accept even that one frame.)

If you look in:

gboolean idle_function(gpointer user_data)

At:

for (int i = 0; i < 4 ; ++i)
{
	if (!buffers[i].used) buf = &buffers[i];
}

if (!buf)
{
	return G_SOURCE_CONTINUE;
}

If the pipeline doesn’t mark these buffers as available (used = false), then we don’t go any further in this function. So we only wind up with 7 FPS.

On the opposite side, when it’s running at 2700FPS, how do we stop it from pushing 2700 buffers per second to the pipeline? That’s handled in:

void commitBuffer(Buffer *buf)
int64_t frameTimeIndex = gst_util_uint64_scale(baseTimeDiff, fps, GST_SECOND);
....
if (frameTimeIndex <= lastFrameTimeIndex)
     return;

So basically, if a buffer commit is attempted within the same 1/30-of-a-second (30FPS) time index, we don’t go any further. So at most 30FPS will be pushed to the pipeline.
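
Put together, the pacing amounts to something like this simplified sketch (not a verbatim excerpt from the app; names roughly follow the snippet above):

// Simplified sketch of the 30FPS pacing in commitBuffer(): a buffer is only
// pushed if it falls into a new 1/fps time slot since the pipeline started.
#include <gst/gst.h>

static gint64 lastFrameTimeIndex = -1;
static const gint fps = 30;

static gboolean should_push_frame(GstClockTime baseTimeDiff)
{
    // Convert elapsed time into a frame index: index = elapsed * fps / 1 second.
    gint64 frameTimeIndex = gst_util_uint64_scale(baseTimeDiff, fps, GST_SECOND);

    if (frameTimeIndex <= lastFrameTimeIndex)
        return FALSE;  // same 1/30s slot as the previous push: skip this buffer

    lastFrameTimeIndex = frameTimeIndex;
    return TRUE;       // new slot: push this buffer to the pipeline
}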

Hope that explains what you’re seeing in the output, how it relates to what the app is doing, and how you can tell it’s blocking.

Thanks

Hi greg2,
It runs fine with the settings below:
1 Run 'sudo nvpmodel -m 0'
2 Modify the code (as attached):
2.1 appsrc with fixed frame rate
2.2 Remove adding random data
2.3 Pre-load the frames generated via
$ gst-launch-1.0 videotestsrc num-buffers=30 pattern=1 ! video/x-raw,width=4096,height=2160 ! filesink location=/home/nvidia/a.yuv

Looks like the system is busier in this case and requires mode 0.
main.cpp (7.26 KB)

Most of those steps aren’t necessary, I don’t think. There’s no need to change the way the frames are generated - the issue shows up whether you’re using random noise or a test pattern, and whether you’re using gst-launch or the test app. If you read back in the thread (1st page), using model 0 works around the issue. We have some issues with model 0, though, that affect other things negatively.

I feel like model 0 shouldn’t be the solution here. Especially since this works fine over Wi-Fi and through a USB->Ethernet dongle, it smells more like a system problem, or some buffer or packet size somewhere. I feel like model 0 is just masking the real issue, not actually fixing it. Nevertheless, it looks like for now we’ll need to try to make use of model 0 instead of 3.

Thanks

Hi greg2,
nvpmodel mode 0 is a working mode we support. If you see an issue running it, please start a new post with steps to reproduce it.

We have verified 4kp30 35Mbps RTMP via Ethernet working fine in model 0.

Yes, we definitely have an issue to correct on our side so we can run on model 0.

That being said, given that this exact same setup, at the same bitrate, runs fine with model 3 over Wi-Fi and USB Ethernet, the need for model 0 is suspicious at best, unless on-board Ethernet has some internal dependency on model 0. For now, though, we’ll have to try to work around the issue using model 0, an Ethernet dongle, or some other workaround.