record and run inference at the same time, split video

Is it possible to capture incoming video frames and run inference at the same time?

The closest thing I found is the Raspberry Pi camera's splitter for video capture.

So I'm thinking it should be possible to record and run inference on the same frame:

import picamera

with picamera.PiCamera() as camera:
    camera.resolution = (1280, 720)
    camera.framerate = 30
    camera.start_recording('Q00.h264', splitter_port=1, bitrate=1000000, quantization=0)
    camera.start_recording('Q01.h264', splitter_port=2, bitrate=1000000, quantization=1)
    camera.start_recording('Q20.h264', splitter_port=3, bitrate=1000000, quantization=20)
    camera.wait_recording(10)
    camera.stop_recording(splitter_port=1)
    camera.stop_recording(splitter_port=2)
    camera.stop_recording(splitter_port=3)

Yes, you can do this. Exactly how you do this depends on which code base you use for inferencing.
In C++ you’d open the encoder V4L2 device together with the capture device, and when you get a captured buffer, copy it over to the encoder as well as passing it on to the inference kernel.

Well, I'm not using anything complicated. I just started using the Nano and Xavier, downloaded jetson-inference, and have been testing the models that come along with it; I have no coding experience.

jetson.utils.gstCamera will continue to capture in the background while you do something else.

Please see the jetson inference python camera inference examples here.

...
while display.IsOpen():
	# capture the image
	img, width, height = camera.CaptureRGBA()

	# detect objects in the image (with overlay)
	detections = net.Detect(img, width, height)

	# print the detections
	print("detected {:d} objects in image".format(len(detections)))

	for detection in detections:
		print(detection)

	# render the image
	display.RenderOnce(img, width, height)
...

In that code, the camera continues to capture internally while everything else is going on; there is nothing more you need to do. camera.CaptureRGBA will block until a frame is ready, unless timeout=0 is passed as a keyword argument, in which case it will raise an exception (rather than block) if a frame is not ready when it's called. If you really need the camera to operate in a non-blocking mode, that's how you can do it in Python, but if you just want to infer and capture at the same time, there's nothing more you need to do. The examples are great.

Thanks for the info. I checked all the directories and several code examples before posting.

What I'm trying to do is:

open camera
get image
save 1st copy of image & simultaneously run inference on 2nd copy
save 2nd copy with result (i.e. dog, cat, car, boat etc.)
(optional) display input and output result side by side

I guess I have to use OpenCV or something of the sort to get the result; I'll look into that.

Performance-wise, you're probably better off with jetson-inference or the underlying libraries. OpenCV is slow and limited on Tegra (not Tegra's fault - most OpenCV algorithms are CPU-based and don't use the GPU).

open camera
get image
save 1st copy of image & simultaneously run inference on 2nd copy
save 2nd copy with result (i.e. dog, cat, car, boat etc.)

This (should) do all that (I haven’t tested. Let me know if the saving doesn’t work):

import jetson.utils
import jetson.inference
import json

def cli_main():
    """from jetson inference detectnet-camera example"""
    import argparse

    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument("--camera", default="0",
                        help="Index of the MIPI CSI camera to use (NULL for CSI camera 0)\n"
                             "or for V4L2 cameras the /dev/video node to use.\n"
                             "By default, MIPI CSI camera 0 will be used.")
    parser.add_argument("--width", type=int, default=1280,
                        help="desired width of camera stream (default is 1280 pixels)")
    parser.add_argument("--height", type=int, default=720,
                        help="desired height of camera stream (default is 720 pixels)")
    parser.add_argument("--dumpfile", help="json lines dumpfile",
                        default="dump.jl")

    args = parser.parse_args()
    main(**vars(args))

def main(width=1280, height=720, camera="0", dumpfile="dump.jl"):

    camera = jetson.utils.gstCamera(width, height, camera)
    classifier = jetson.inference.imageNet("googlenet")

    frame_count = 0
    with open(dumpfile, "w") as f:
        try:
            while True:
                image = camera.CaptureRGBA(zeroCopy=True)
                class_id, confidence = classifier.Classify(*image)
                class_description = classifier.GetClassDesc(class_id)
                jetson.utils.saveImageRGBA(f"{frame_count}.jpg", *image)
                f.write(json.dumps(
                    {"fnum": frame_count, "desc": class_description}) + "\n")
                frame_count += 1
        except KeyboardInterrupt:
            print("Got interrupt. Quitting.")

if __name__ == '__main__':
    cli_main()

That's mostly from the examples apart from the PIL part, which saves the image. You'll need to "pip(3) install Pillow" in addition to installing jetson-inference. Run it with --help for the usage options and press Ctrl+C to stop capture and inference. The saving doesn't technically run at the same time as the inference, but I can't see a way to make it any faster in Python. In C++ it could possibly go a little faster, but it would be a lot more complicated to write. I'll leave that exercise to you.
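(One thing that might help, if the sequential saving ever becomes a bottleneck, is handing the writes to a background thread through a queue. This is a stdlib-only sketch; the byte strings stand in for captured frames and the plain file write stands in for jetson.utils.saveImageRGBA - none of it is the jetson-inference API.)

```python
import os
import queue
import tempfile
import threading

outdir = tempfile.mkdtemp()

def writer(jobs):
    """Drain (filename, data) jobs and save them until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:   # sentinel: shut down
            break
        filename, data = job
        with open(filename, "wb") as f:   # stand-in for jetson.utils.saveImageRGBA
            f.write(data)

jobs = queue.Queue(maxsize=8)   # bounded, so capture can't outrun the disk
thread = threading.Thread(target=writer, args=(jobs,))
thread.start()

for frame_count in range(3):
    frame = bytes([frame_count]) * 16   # placeholder for a captured frame
    jobs.put((os.path.join(outdir, f"{frame_count}.bin"), frame))
    # ...the main thread is now free to run the next Classify() while the
    # writer thread does the file I/O

jobs.put(None)   # tell the writer to finish up
thread.join()
```

File I/O releases the GIL, so the writer thread can overlap with compute in the main loop.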

[s]Edit: actually, Pillow has this in its fromarray source:

if mode in ["1", "L", "I", "P", "F"]:
    ndmax = 2

ndim (the number of array dimensions) is 3 for an RGBA image, which is more than "F" mode allows. You may have to find another library to save a 4-channel floating point image to file, or convert to uint8 using numpy beforehand. I'll leave that as an exercise to you as well.[/s] There may already be a utility in jetson.utils. You might want to check.
Edit: There is indeed. Thanks Dusty. Updated the above.

Edit: you may have to install libjpeg and its headers if you get errors installing Pillow (sudo apt install libjpeg-dev).

You shouldn’t have to make a spare copy of the camera image, because jetson.inference’s imageNet.Classify() function doesn’t overwrite or modify the input image.

If desired, you can use jetson.utils.saveImageRGBA() without needing to go through PIL or make extra memory copies.

If writing individual image files is too slow, instead look to GStreamer to utilize the hw video encoder to record to an H.264/H.265 video. Or try mounting an external SSD which has higher write speeds.

Hey Dusty, I'm having a bit of a problem with that function in another app that uses it. It's not getting the pointer from the PyCapsule, even though the inference elements earlier in my pipeline work fine.

It also doesn't work with a simpler example like the above code, or going directly from the camera to saveImageRGBA.

Edit: read the source and figured out the issue. I needed to use zeroCopy=True when capturing.

I edited the above example to reflect.

Any chance that TODO could get done (at line 102 here) to write a jpg to file from GPU memory? It would probably speed my app up a bit.

Yep, you found the zeroCopy argument - saveImageRGBA() expects memory that is accessible from the CPU (which zeroCopy memory is).

I should do that for convenience, yes - but note that it will actually be slower than just putting the image in zeroCopy memory in the first place (a temporary buffer would need to be allocated, and the GPU memory cudaMemcpy'd back to the CPU).

Thanks, but if that's the case, there's no rush on my part. Isn't there an NVIDIA JPEG encoder somewhere that wouldn't require a copy from GPU memory, like the one used by nvjpegenc (I assumed, anyway)? Somewhere in libargus maybe?

[s]Edit: could this be used without a copy?
https://docs.nvidia.com/jetson/l4t-multimedia/NvJpegEncoder_8h_source.html[/s]
Read docs. I guess not.

Thank you, Dusty, for your help! For giggles I made something fancier than the above and published it on PyPI and GitHub.

You may now pip3 install jetstreamer to perform various inferences on camera images (classification, detection) while writing them to file along with the metadata in a sidecar file.

The README.md is overly verbose and will need to be cut down to size, but the --help is short and to the point. Hopefully somebody will find it useful. The first thing I'm going to do is actually slow it down so the camera emits frames at a slower rate. I don't need as-fast-as-it-can-go for my purposes.
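(The sidecar format is just JSON Lines, so reading the metadata back out is a few lines of stdlib Python. The record keys below match the earlier snippet in this thread, not necessarily JetStreamer's exact schema:)

```python
import json
import os
import tempfile

def read_sidecar(path):
    """Yield one metadata record (a dict) per non-empty line of a JSON Lines file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# write a tiny sidecar to demonstrate (same record shape as the earlier snippet)
sidecar = os.path.join(tempfile.mkdtemp(), "dump.jl")
with open(sidecar, "w") as f:
    for fnum, desc in enumerate(["tabby", "tabby", "golden retriever"]):
        f.write(json.dumps({"fnum": fnum, "desc": desc}) + "\n")

records = list(read_sidecar(sidecar))
```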

The GPU or image processing parts can totally encode your RGBA image in GPU memory into JPEG.

Then what? You need to read the encoded data back to the CPU so it can be written to disk… I think that's the additional memory allocation that dusty_nv is talking about.

WOW! You Sir are the King of DevTalk, I salute you! Works like a charm!

Thank you and I’m glad it works for you, but Snarky and Dusty are certainly more experienced programmers. My Python is decent, but I am learning all the time. I will update JetStreamer as time goes on, but probably not in September since I will be on vacation… we’ll see.

Yeah. I was hoping to accomplish the first part on the GPU so I would only have to copy a compressed RGB frame rather than a full RGBA float frame, if that makes any sense.

I may be totally confused but I think I remember reading somewhere that the nano has dedicated hardware for the task. I may be thinking of another dev board. It’s crowded around here. I think my spouse has given up on my dev board addiction.

That's great mdegans, thanks for sharing that and putting it on PyPI. Sometime I will have to look into whether that is possible to do with jetson.inference, to make the install easier (I'm not sure it can exist solely as a pip wheel due to the configuration and setup).

When you get a chance, you could post JetStreamer to the Jetson Projects forum if you don’t mind. Enjoy your vacation!

There are various commercial and open-source software libraries out there that do JPEG encoding on the GPU with CUDA; however, using CUDA for this would take away from inferencing performance and other GPU workloads, so that isn't really desirable.

There is - the hw-accelerated nvjpegdec / nvjpegenc elements are available through GStreamer (see the L4T Accelerated GStreamer User Guide for examples), but I would need to do the same memory copy to get the image there (unless it was already in zeroCopy memory), and the saveImageRGBA() function does support saving other formats like PNG that I tend to use. I also have the gstEncoder class for H.264/H.265 in jetson-utils, but haven't created the bindings for the Python version yet - although one could just use GStreamer from Python like here. You could try the same from Python with the nvjpeg element instead of the video encoder if desired.

The hw-accelerated NVJPEG codec is also accessible from V4L2, see the 05_jpeg_encode and 06_jpeg_encode samples from the L4T Multimedia API.

I will certainly do that by the end of the week. Thanks!

Most Python libraries I've used will just fail to build if dependencies are missing (e.g. Pillow won't install on Nano unless you "sudo apt install libjpeg-dev" etc. first). Perhaps you could provide two-step installation instructions, where the first step is "sudo apt install such-and-such" and the second is "pip install jetson-inference". Then I could just copy your first step into my README.md's "Requirements" and add jetson-inference to my setup.py's install_requires to have it automatically downloaded and built. It would certainly be nice.

Well, the way JetStreamer is written, pipeline operations are (for the moment) still sequential because of the way standard generators (as opposed to async generators) work in Python, so only one element of the pipeline is operating on a frame at any given time (although, from what I understand, your camera is still queuing up frames in its ring buffer). Because of this it may be worth it for me. I will look into whether such a library would be easy to add and whether it's compatible with my license. Thank you for the suggestion.
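(The sequential nature of a plain-generator pipeline is easy to see without any Jetson hardware. In this toy sketch - all names made up - pulling from the end of the chain drives each stage on exactly one frame at a time:)

```python
def frames(n):
    """Pretend camera: emit n dummy frames."""
    for i in range(n):
        yield {"fnum": i, "pixels": b"\x00" * 4}

def classify(stream):
    """Pretend inference stage: tag each frame in turn."""
    for frame in stream:
        frame["desc"] = "cat" if frame["fnum"] % 2 else "dog"
        yield frame

def record(stream, log):
    """Pretend sink: note what would have been written to disk."""
    for frame in stream:
        log.append((frame["fnum"], frame["desc"]))
        yield frame

log = []
for _ in record(classify(frames(3)), log):
    pass   # pulling the end of the chain drives every stage, one frame at a time
```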

I could use async generators to do things cooperatively, but the syntax is harder, nothing is allowed to block (well, it can, but it'll block the whole event loop), and it would require heavy modification of jetson-inference to make its functions awaitable. I've never done it before, and it would be a fun challenge, but for me it's fast enough as it is, and trying to do more than one thing at once on the GPU might hurt performance, as you say.

Aha. In this case I think I'll just continue to use zeroCopy, which will allow me to do other things I want to do with numpyFromCuda, like image hashing. I've avoided writing GStreamer in Python because it's about as verbose and ugly as GStreamer written in C. It's also more limited than the C version, and I suspect much less performant because of the frequent calls into C code from Python (parsing messages from the bus, for example, or a callback per buffer). I really wish there were something like GStreamer written in modern C++.

I added an --interval option to capture at regular intervals, rather than as fast as the pipeline will go. If it misses its target, it'll print out a warning, increment the frame counter, and wait for the next interval. It's not super precise, so it's meant for larger values like half a second or longer.

A timestamp is still stored with every frame. You can upgrade with “pip3 install --upgrade jetstreamer”.
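(The interval logic can be sketched with just the standard library. The warn-and-skip handling of a missed deadline below is my reading of the description above, not JetStreamer's actual code:)

```python
import time

def paced(source, interval):
    """Yield (frame_number, item) at most once per `interval` seconds.

    When a deadline has already passed, print a warning and burn a frame
    number before waiting for the next slot.
    """
    frame = 0
    deadline = time.monotonic() + interval
    for item in source:
        now = time.monotonic()
        while now > deadline:              # the pipeline ran long; a slot was missed
            print(f"warning: missed interval for frame {frame}")
            frame += 1
            deadline += interval
        time.sleep(max(0.0, deadline - now))
        yield frame, item
        frame += 1
        deadline += interval

emitted = list(paced("abc", 0.05))   # ~0.15 s total with a dummy source
```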

The hw-accelerated NVJPEG codec is also accessible from V4L2, see the 05_jpeg_encode and 06_jpeg_encode samples from the L4T Multimedia API.

(These are also written in C, although the v4l2 module on PyPI does exist if you wanted to use V4L2 from Python.)

I looked at that before but wasn't sure if it would work for me. If I do it, I'd like to do it right by making a C extension that accepts a PyCapsule, since that's what the rest of the pipeline uses.

So I guess I need to convert the buffer from float4 to yuv420, assign it a file descriptor, and pass the fd to the encoder?
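(The float4-to-YUV part of that, at least, is plain arithmetic. This is a stdlib-only sketch using full-range BT.601 coefficients and 2x2 chroma averaging to produce planar I420; whether the encoder wants exactly this layout and range is an assumption:)

```python
def rgba_to_i420(pixels, width, height):
    """Convert a row-major list of (r, g, b, a) floats in [0, 1] to planar
    I420 bytes: a full-size Y plane, then quarter-size U and V planes.
    Uses full-range BT.601 coefficients; alpha is dropped. width and
    height must be even for the 2x2 chroma subsampling."""
    def clamp(x):
        return max(0, min(255, int(round(x * 255.0))))

    y_plane, u_full, v_full = [], [], []
    for r, g, b, _a in pixels:
        y_plane.append(clamp(0.299 * r + 0.587 * g + 0.114 * b))
        u_full.append(-0.169 * r - 0.331 * g + 0.500 * b + 0.5)
        v_full.append(0.500 * r - 0.419 * g - 0.081 * b + 0.5)

    u_plane, v_plane = [], []
    for row in range(0, height, 2):            # average each 2x2 block of chroma
        for col in range(0, width, 2):
            idx = [row * width + col, row * width + col + 1,
                   (row + 1) * width + col, (row + 1) * width + col + 1]
            u_plane.append(clamp(sum(u_full[i] for i in idx) / 4))
            v_plane.append(clamp(sum(v_full[i] for i in idx) / 4))

    return bytes(y_plane + u_plane + v_plane)
```

For example, a 2x2 all-white frame produces four Y bytes of 255 followed by one U and one V byte near the 128 neutral point.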

The documentation seems to indicate passing a buffer directly would be slower because it has to copy from software to hardware memory. Does that still apply if the buffer is allocated using zero copy?

I won't be able to work on this over the next month, but you've certainly given me something to think about. Thanks again!