First of all, thanks for the docs; for "old-school" users like me it is really cool to find all this material on your site, especially jetson-inference! So far it is quite awesome!
Second, my question is not specific to the Nano, but here it is:
I have run some tests with GStreamer and see that the pipeline nvarguscamerasrc ! … ! nvoverlaysink uses little CPU and no GPU cycles.
I have now adapted a jetson-inference example to my needs. Currently I have a float4* RGBA buffer in GPU-only memory, captured from gstCamera, to which I apply some "augments" via the jetson-inference CUDA primitives. The example renders with glDisplay, which uses the GPU and is effectively serialized with the other GPU activity (detection), so I cannot get a constant rendering fps even with two threads.
My question: I would like to render the way nvoverlaysink does (no windowed output needed), so the GPU is kept free for detection.
My understanding is that I probably need to convert the image and do this inside my program, since I don't want to send it back to the pipeline (I am not sure this is feasible via appsrc…).
So before digging into the nvoverlaysink source code and adapting it, I prefer to ask whether I am on the right track (nvoverlaysink has reportedly been "deprecated"??) and whether this kind of direct rendering is feasible. Compared to glDisplay, would it save GPU?
Hi @fr-op, the image overlay is already performed by the jetson-inference detectNet::Detect() function. For example, after this point in the code, the image has already been overlaid with the detected bounding boxes:
Note that imgRGBA is in GPU memory, but if you want to make it accessible from the CPU too, you need to set the zeroCopy argument to true in the call to gstCamera::CaptureRGBA():
camera->CaptureRGBA(&imgRGBA, 1000, true);
Then you will be able to access imgRGBA from the CPU, including the overlay, and do whatever you wish with it. You can also disable the display by setting glDisplay* display = NULL on line 121.
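Putting those pieces together, a headless loop might look something like the sketch below. This is untested and abbreviated; the names follow the gstCamera / detectNet API from the repo, but check the detectnet-camera sample for the exact signatures in your version:

```cpp
// Hypothetical sketch of a headless detection loop (no glDisplay),
// adapted from the detectnet-camera sample.  Error handling trimmed.
#include "gstCamera.h"
#include "detectNet.h"

int main()
{
    gstCamera* camera = gstCamera::Create(1280, 720);
    detectNet* net    = detectNet::Create();

    if( !camera || !net || !camera->Open() )
        return 1;

    while( true )
    {
        float* imgRGBA = NULL;

        // zeroCopy=true maps the buffer into both CPU and GPU address
        // space, so the overlaid image is readable from the CPU side
        if( !camera->CaptureRGBA(&imgRGBA, 1000, true) )
            continue;

        detectNet::Detection* detections = NULL;
        net->Detect(imgRGBA, camera->GetWidth(), camera->GetHeight(),
                    &detections);

        // imgRGBA now contains the bounding-box overlay; save it,
        // stream it, or hand it to another sink -- no display needed
    }

    return 0;
}
```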
I was probably not clear with my question: this is not specifically an overlay issue (the overlay is done and I am satisfied with it), but a display issue. Please correct me if my understanding is wrong (I may have missed some points):
1 - jetson-inference provides one API to display frames, which is glDisplay.
This OpenGL display API uses more GPU than nvoverlaysink does and makes my display task GPU-bound.
2 - I did not manage to find much info on nvoverlaysink, but it seems to do low-level display, bypassing X.
I suppose it is something close to a framebuffer interface. I don't know what magic they use to make it work over X (it works both on the console and in the graphical interface), since my understanding was that X takes over the kernel framebuffer abstraction /dev/fb0 (indeed, writing to it only works outside X).
3 - What I need is a low-level way to display the frame (which I have in a float4* RGBA GPU buffer) in order to reduce GPU usage; if it's not "X-windowed", that is not a problem.
As I already have a float4* RGBA frame in my GPU buffer, I thought I was close to something usable for displaying on screen, just like nvoverlaysink does. This plugin accepts video/x-raw(memory:NVMM)… but I am missing some info on its behaviour.
4 - I do not really understand your point here:
Do you suggest I should use this buffer in my program to pass/copy it to the framebuffer? Wouldn't that be a waste of CPU, going from userland to the kernel and then back to the GPU? And would it work only without X?
My thought from your original post was that you did not want to use any display and just wanted the overlaid image, in which case you would probably want access to it from the CPU for transmission over the network, saving to disk, etc.
If you want to use nvoverlaysink, it is a GStreamer element, so you might find the DeepStream SDK a better fit, since it can already use it as an output.
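If you do want to try feeding your frames back into a pipeline yourself, an appsrc-based approach might look roughly like the sketch below. I have not verified this on a Jetson: the caps string, and the assumption that nvvidconv will convert system-memory RGBA into the NVMM buffers nvoverlaysink wants, should be checked against gst-inspect-1.0 on your JetPack version:

```cpp
// Untested sketch: push CPU-accessible RGBA frames into a GStreamer
// pipeline ending in nvoverlaysink.  Element names and caps are
// assumptions to verify on your JetPack release.
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>

int main(int argc, char** argv)
{
    gst_init(&argc, &argv);

    // nvvidconv is assumed to convert system-memory RGBA into the
    // NVMM buffers that nvoverlaysink consumes
    GstElement* pipeline = gst_parse_launch(
        "appsrc name=src caps=video/x-raw,format=RGBA,"
        "width=1280,height=720,framerate=30/1 "
        "! nvvidconv ! video/x-raw(memory:NVMM) ! nvoverlaysink",
        NULL);

    GstElement* src = gst_bin_get_by_name(GST_BIN(pipeline), "src");
    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // per frame: allocate a buffer and push it
    const gsize frameSize = 1280 * 720 * 4;   // 8-bit RGBA
    GstBuffer* buf = gst_buffer_new_allocate(NULL, frameSize, NULL);
    // ... fill buf from your frame (the float4 image would first need
    //     conversion to 8-bit RGBA, e.g. with a small CUDA kernel) ...
    gst_app_src_push_buffer(GST_APP_SRC(src), buf);

    return 0;
}
```

That said, DeepStream wires this up for you, which is why I'd suggest looking there first.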