I am using a TX2 to run inference on an input H.264 video and write the results to an MP4 file.
The pipeline looks like this:
3rd-party library decodes H.264 with ffmpeg → TX2 runs inference on the decoded images → the images are encoded and written to an MP4 file (using jetson-utils videoOutput with GStreamer).
But I always get a CUDA error in the gstEncoder::Render() function when outputting an image. The error is:
[cuda] unspecified launch failure (error 719) (hex 0x2CF)
Previously, when I used jetson-utils videoSource (GStreamer) to decode the H.264 and videoOutput to encode the MP4 file, there were no problems.
The 3rd-party library uses ffmpeg to decode the H.264 in another thread, so I am not sure whether the issue is related to multiple threads running on one GPU device. Can I call the CUDA runtime API from multiple different threads?
Hi @harry_xiaye, are you using jetson-inference for the inference portion too? If so, does the pipeline run ok with no video output, or with videoOutput('display://0')? I am wondering if there is a problem earlier in the pipeline.
If you aren't using jetson-inference, what kind of image are you feeding to gstEncoder::Render()? Has the memory been allocated on the GPU?
I am not using jetson-inference. The image I am feeding to gstEncoder::Render() is RGB3 data. I use videoOutput in jetson-utils to write the images to the MP4 file; it calls output->Render(image, Width, Height) to pass the image to gstEncoder::Render(). The image is a buffer of uchar3, and its memory is not allocated on the GPU.
With the same inference and output (encode to MP4) code, if I use jetson-utils videoSource for the input, I have no issues at all.
Ah ok, gotcha - the memory needs to be allocated on the GPU. Try using the cudaAllocMapped() function - this allocates memory that is shared between the CPU and GPU (since Jetson shares the same physical memory between the CPU/GPU, it can use zero-copy memory).
If your image is in another buffer, you can do a simple memcpy() into the buffer you allocate with cudaAllocMapped(), since the cudaAllocMapped() pointer is accessible from both the CPU and GPU. Then pass that pointer from cudaAllocMapped() to gstEncoder::Render().
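A minimal sketch of what that could look like, assuming the jetson-utils API (cudaAllocMapped() from cudaMappedMemory.h and the templated videoOutput::Render()). The names RenderFrame, cpuImage, width, height, and output are hypothetical placeholders for your pipeline's own variables - adapt them to your code:

```cpp
#include <cstring>                         // memcpy()
#include <jetson-utils/cudaMappedMemory.h> // cudaAllocMapped()
#include <jetson-utils/videoOutput.h>      // videoOutput

// hypothetical helper: copy a CPU-side RGB3 frame into zero-copy
// memory and render it through videoOutput / gstEncoder
bool RenderFrame( videoOutput* output, const uchar3* cpuImage,
                  uint32_t width, uint32_t height )
{
    // allocate the shared CPU/GPU buffer once and reuse it each frame
    static uchar3* sharedImage = NULL;

    if( !sharedImage && !cudaAllocMapped(&sharedImage, width * height * sizeof(uchar3)) )
        return false;

    // the mapped pointer is CPU-accessible, so a plain memcpy works
    memcpy(sharedImage, cpuImage, width * height * sizeof(uchar3));

    // pass the GPU-accessible pointer to the encoder
    return output->Render(sharedImage, width, height);
}
```

Since this allocates once and reuses the buffer, it avoids a per-frame allocation; if your frame size can change at runtime, you would need to free and reallocate with cudaFreeHost() instead.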
So when I used jetson-utils videoSource to decode the H.264, the image memory was allocated on the GPU - that is why I had no issues when using videoSource for the input, right?