How to achieve the best performance when feeding data to a VisionWorks vx_image?

Hi,
I’m working on a TX1 with a VisionWorks vx_graph, and profiling shows a lot of time spent transferring data between the CPU and the GPU. This seems wasteful to me, because on the TX1 the physical memory is shared between the CPU and the GPU.

To reduce these transfers, I tried to exploit the CUDA UVA/zero-copy feature by allocating pinned memory and creating the vx_image with the NVX_MEMORY_TYPE_CUDA flag, expecting the host-to-device memcpy to disappear. However, the vx_graph still copies the data internally.
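
For context, this is roughly the kind of setup I mean (a minimal sketch, assuming a single-plane VX_DF_IMAGE_U8 image with tightly packed rows; the exact VisionWorks header and all error checking are simplified):

```cpp
#include <cuda_runtime.h>
#include <VX/vx.h>
#include <NVX/nvx.h>   // assumed location of the NVX_MEMORY_TYPE_CUDA definition

vx_image createZeroCopyImage(vx_context context, vx_uint32 width, vx_uint32 height)
{
    // Allocate mapped (zero-copy) pinned host memory and get its device alias.
    void* host_ptr = NULL;
    cudaHostAlloc(&host_ptr, (size_t)width * height, cudaHostAllocMapped);

    void* dev_ptr = NULL;
    cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

    // Describe the layout of the single U8 plane.
    vx_imagepatch_addressing_t addr;
    addr.dim_x    = width;
    addr.dim_y    = height;
    addr.stride_x = 1;                // 1 byte per U8 pixel
    addr.stride_y = (vx_int32)width;  // tightly packed rows
    addr.scale_x  = VX_SCALE_UNITY;
    addr.scale_y  = VX_SCALE_UNITY;
    addr.step_x   = 1;
    addr.step_y   = 1;

    void* ptrs[] = { dev_ptr };

    // Wrap the CUDA-visible pointer in a vx_image, hoping to avoid internal copies.
    return vxCreateImageFromHandle(context, VX_DF_IMAGE_U8,
                                   &addr, ptrs, NVX_MEMORY_TYPE_CUDA);
}
```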

I also looked at the NVXIO FrameSource, but it does not seem to exploit the CUDA UVA feature either.

What is the best way to feed data to VisionWorks while avoiding these (apparently) redundant copies?

Thank you

Hi,

Thanks for your question.
Although the physical memory is shared, the CPU and GPU caches are not coherent, so there is always some penalty when switching between CPU and GPU access.

For a sample that uses NVX_MEMORY_TYPE_CUDA, please refer to the opengl_interop example.
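
For illustration only, wrapping an existing CUDA device buffer with that flag looks roughly like this (a minimal sketch assuming a single-plane VX_DF_IMAGE_U8 image, an assumed header location, and no error checking; the sample shows the complete usage):

```cpp
#include <cuda_runtime.h>
#include <VX/vx.h>
#include <NVX/nvx.h>   // assumed location of the NVX_MEMORY_TYPE_CUDA definition

vx_image wrapCudaBuffer(vx_context context, vx_uint32 width, vx_uint32 height)
{
    // Keep the data resident in CUDA memory so the graph can consume it directly.
    void* dev_ptr = NULL;
    cudaMalloc(&dev_ptr, (size_t)width * height);

    // Layout of the single U8 plane backing the image.
    vx_imagepatch_addressing_t addr;
    addr.dim_x    = width;
    addr.dim_y    = height;
    addr.stride_x = 1;                // 1 byte per U8 pixel
    addr.stride_y = (vx_int32)width;  // tightly packed rows
    addr.scale_x  = VX_SCALE_UNITY;
    addr.scale_y  = VX_SCALE_UNITY;
    addr.step_x   = 1;
    addr.step_y   = 1;

    void* ptrs[] = { dev_ptr };

    // The vx_image references the device buffer instead of owning a host-side copy.
    return vxCreateImageFromHandle(context, VX_DF_IMAGE_U8,
                                   &addr, ptrs, NVX_MEMORY_TYPE_CUDA);
}
```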