cudaToNumpy() defect?

Hi, I’ve noticed this in the past, but I wasn’t sure where it came from and didn’t have much time to dig into it, as I was focused on other areas of the project. Now that I’m back on this part, the problem still persists, and after some digging I believe it has something to do with the cudaToNumpy() function. Apparently cudaToNumpy() works on a buffer that gets corrupted (or rewritten) before we can finish working on the resulting numpy array, even if the first thing I do with that array is make a defensive copy (the b = a.copy() below).

You can reproduce this issue by modifying the detectnet-camera.py in the following manner:

# at the top of detectnet-camera.py, alongside the existing imports:
import time
import cv2

while True:
    img = input.Capture()
    a = jetson.utils.cudaToNumpy(img)  # note: pass the captured image
    b = a.copy()                       # defensive copy before any further processing
    cv2.imwrite('/tmp/{}.jpg'.format(time.monotonic()), b)
    detections = net.Detect(img, overlay=opt.overlay)

I would expect every frame that comes from the input (videoSource) to be saved to disk exactly as it was captured, but the saved images are corrupted/overwritten (you can even see fragments of the overlays drawn in the detection step show up in the numpy array just returned by cudaToNumpy()).

I can also confirm that the cudaImages returned by the videoSource component are themselves not corrupted: they display perfectly in the OpenGL window without any corruption/overwriting, but once converted to numpy they start to show signs of it.

I can also confirm that if I sleep for 1 second after the Capture, the problem no longer occurs. I’m sure a much smaller delay would also work, but I’d rather not go down that path; I think we all agree this should work with no delay at all.

Can anyone confirm/explain this behaviour?

Thank you,
Best regards,
Eduardo

Hi, I’ll leave the thread open, but I think I’ve found the reason/solution.
Apparently there’s a function that tells the code to wait until the GPU has finished whatever it was processing. In this case, the GPU was probably still rendering the overlay (bounding boxes/labels) for the previous frame by the time I tried to access the CUDA memory and copy it. Calling jetson.utils.cudaDeviceSynchronize() just before cudaToNumpy() waits until the ongoing operations on the CUDA memory finish, which does the trick. It achieves what my fixed delays did in the earlier experiments, but in a properly synchronized manner.
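The underlying race is a general asynchronous-work pattern, not something CUDA-specific. As a host-side analogy in plain Python (no CUDA involved; the worker function and names here are made up purely for illustration), a background task that is still writing into a shared buffer must be waited on before the buffer is copied, just as cudaDeviceSynchronize() waits on pending GPU work:

```python
import threading

def overlay_worker(buf, done):
    # Simulates the GPU still drawing overlays into the shared frame buffer.
    for i in range(len(buf)):
        buf[i] = 255
    done.set()

buf = bytearray(1024)          # shared "frame" buffer
done = threading.Event()
t = threading.Thread(target=overlay_worker, args=(buf, done))
t.start()

# Synchronize first (the analogue of jetson.utils.cudaDeviceSynchronize()),
# and only then copy the buffer.
done.wait()                    # wait until the asynchronous work has finished
safe_copy = bytes(buf)         # now every byte reflects the completed work
t.join()

assert all(b == 255 for b in safe_copy)
```

Taking the copy *before* done.wait() would snapshot a half-written buffer, which is exactly the partial-overlay corruption seen in the saved frames.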

Thank you and best regards.
Eduardo

Hi @drakorg, that is correct - after performing asynchronous GPU operations, you should use the cudaDeviceSynchronize() function before attempting to access the data on the CPU.

cudaToNumpy() maps the memory to a numpy array, so the GPU can still change it (and vice versa: changes made to the numpy array will be visible to the GPU).
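In other words, cudaToNumpy() returns a view of shared memory, not an independent copy. The same aliasing behaviour can be sketched with plain numpy (again, no CUDA here; this is just an analogy for the shared-memory semantics): a view tracks later changes to the underlying buffer, while .copy() takes a detached snapshot, and that snapshot is only trustworthy if it is taken after any pending work on the buffer has finished.

```python
import numpy as np

frame = np.zeros((4, 4), dtype=np.uint8)  # stands in for the CUDA image
mapped = frame.view()                     # like cudaToNumpy(): shares the same memory
snapshot = mapped.copy()                  # independent copy, detached from the buffer

frame[:] = 255                            # the buffer is overwritten afterwards
                                          # (like the GPU drawing overlays)

print(mapped[0, 0])    # 255 -> the mapped array sees the change
print(snapshot[0, 0])  # 0   -> the earlier copy is unaffected
```

This is why b = a.copy() alone didn’t help in the original loop: the copy ran while the GPU was still writing, so it snapshotted a buffer in mid-update.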