'cudaMemcpy() failed to allocate memory' in Threaded module

Hi, could I please have some of your wisdom to sort out my latest problem?

I am porting a program from cv2 to jetson.utils before adding jetson.inference; specifically, I am using jetson.utils.videoSource and jetson.utils.videoOutput.
I have completely gutted the program down to the attached version, which has only one input and one output but retains 2 of the threads - that is sufficient to bring on the error.
The error is:

  File "/home/jc/jcCode/gutted/car_controller_P6_0v081g.py", line 80, in get_camera_frame
    rg.G_CudaImageToUse = jetson.utils.cudaMemcpy(image)   # 0v81
Exception: jetson.utils -- cudaMemcpy() failed to allocate memory

My basic strategy (sketched in code below) is to:

  • allocate one global image for all the ‘consumers’ to use (rg.G_CudaImageToUse)
  • get_camera_frame()
    • gets an image into a local variable
    • copies this to the global area and marks it valid: rg.G_CudaImageToUse = jetson.utils.cudaMemcpy(image)
  • show_image_in_window()
    • runs at a different frequency
    • retrieves the image to a local area: TempCudaImageToUse = jetson.utils.cudaMemcpy(rg.G_CudaImageToUse)
    • writes all over it and displays it
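
In outline the two threads look something like this (a simplified sketch rather than the attached file; the loop timings, URIs, and the validity check are illustrative - only rg.G_CudaImageToUse and the cudaMemcpy() calls are as in my code):

  import threading
  import time
  import jetson.utils
  import rg   # the attached shared-globals module; assumed to initialise G_CudaImageToUse = None

  def get_camera_frame(camera):
      while True:
          image = camera.Capture()                              # grab a frame into a local variable
          rg.G_CudaImageToUse = jetson.utils.cudaMemcpy(image)  # publish a copy to the global
          time.sleep(0.01)

  def show_image_in_window(display):
      while True:
          if rg.G_CudaImageToUse is not None:                   # only once marked valid
              # pull the shared frame into a local copy, then annotate and display it
              TempCudaImageToUse = jetson.utils.cudaMemcpy(rg.G_CudaImageToUse)
              display.Render(TempCudaImageToUse)
          time.sleep(0.05)                                      # runs at a different frequency

  camera = jetson.utils.videoSource("csi://0")                  # placeholder URIs
  display = jetson.utils.videoOutput("display://0")
  threading.Thread(target=get_camera_frame, args=(camera,)).start()
  threading.Thread(target=show_image_in_window, args=(display,)).start()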

In my simple thinking there should only be 3 ‘allocate memory’ steps:
one for the global and one each for the local variables. Once underway, things should just copy.
In this version I have delayed the start of show_image_in_window(), and it seems to swim along OK until that starts; then it goes pear-shaped.
Am I doing something fundamentally wrong with this strategy? (It worked OK for cv2.)
Can I play with CUDA memory buffers this way?
Finally, is there a clue in ‘cudaMemcpy() failed to allocate memory’? What is cudaMemcpy() doing trying to allocate memory - it is supposed to be copying it.

Thanks in anticipation
JC
cudaMemcpy failed to allocate memory.log (10.8 KB)
car_controller_P6_0v081g.py (10.8 KB)
rg.py (2.0 KB)

Hi,

The error comes from the cudaMemcpy() call in your traceback, which fails to allocate unified memory.

Have you run the same source without using threads?
If not, could you give it a try?

Thanks.

Thanks AastaLLL,
I have run many other, simpler, non-threaded routines using these commands, but it is not possible to run this one without the threads.
That's why I was asking: what is special about using CUDA-related things in a threaded environment?
E.g. what is the scope of a variable created with cudaAllocMapped()? Does it behave like any other Python variable?
And what is cudaMemcpy() doing trying to allocate memory in the first place?

It only allocates it if you use the single-parameter version of jetson.utils.cudaMemcpy(), where the dst buffer isn’t provided by the user. In this case, the function allocates another buffer of the same size, and copies the src buffer to it. If you instead use it like cudaMemcpy(dst, src), it won’t do any allocation.
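
For example (a quick sketch - the image size and format here are just placeholders):

  import jetson.utils

  src = jetson.utils.cudaAllocMapped(width=1280, height=720, format='rgb8')

  # 1-parameter version: allocates a new buffer of the same size, then copies src into it
  copied = jetson.utils.cudaMemcpy(src)

  # 2-parameter version: copies src into the existing dst buffer - no allocation happens
  dst = jetson.utils.cudaAllocMapped(width=src.width, height=src.height, format=src.format)
  jetson.utils.cudaMemcpy(dst, src)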

Are these Python “threads” running in different processes? CUDA memory and contexts aren’t shared across processes.

Thanks Dusty,
No, they are not in different processes.
I am currently using rg.G_CudaImageToUse = jetson.utils.cudaMemcpy(image); I will try jetson.utils.cudaMemcpy(rg.G_CudaImageToUse, image).
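
If I have understood correctly, the change amounts to this (sketch; assuming both rg.G_CudaImageToUse and TempCudaImageToUse have already been allocated once with cudaAllocMapped()):

  # producer thread: copy the captured frame into the pre-allocated global
  jetson.utils.cudaMemcpy(rg.G_CudaImageToUse, image)

  # consumer thread: same idea, copy the global into the pre-allocated local
  jetson.utils.cudaMemcpy(TempCudaImageToUse, rg.G_CudaImageToUse)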

That works much better! I stopped it at 5000 frames, but seeing it used to crash between 30 and 100 or so, I think it's cured. It does sound like a deep issue, though, where that (temporary) allocation is not being released cleanly.
JC

Hmm… it may be that your board was running out of memory as it kept allocating frames. My guess is the Python garbage collector wasn't deleting them. You could explicitly try del rg.G_CudaImageToUse after you are done with it, along the lines of the sketch below. Then again, you have it working now (and it's better to pre-allocate anyway), so just go with that. Glad that you got it working!
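
Something like this, if you ever return to the 1-parameter form (sketch):

  del rg.G_CudaImageToUse                               # drop the previous copy first
  rg.G_CudaImageToUse = jetson.utils.cudaMemcpy(image)  # then bind the fresh allocation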
