[Jetson-TK1] RAM clock CPU-GPU Hybrid Processing Slow

Greetings,

I have some performance issues on the Jetson-TK1. I have implemented a video processing filter that rotates a video stream, with one implementation for the CPU and one for the GPU using CUDA. The workload can be partitioned between the two at a very fine granularity (for example, x % of the load to the CPU and (100-x) % to the GPU). The problem is that I don’t get the performance I expect, and I am beginning to wonder whether the cause could be memory conflicts. My implementation uses zero-copy memory, so the memory regions being read and written are shared between the CPU and the GPU.

Is there any way to check the memory clock speed, or to set it manually?

When doing performance measurements I recommend maximising CPU, GPU and EMC clocks. More about them here:

http://elinux.org/Jetson/Performance

Read the last line for the EMC.
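
For reference, here is a rough sketch of checking and pinning the EMC rate from a shell. The debugfs paths and the 924 MHz figure are from my memory of that wiki page, so treat them as assumptions and verify them against your L4T release. `CLOCK_ROOT`, `emc_rate_mhz` and `emc_override` are just names made up for this sketch.

```shell
#!/bin/sh
# Root of the debugfs clock directory. Overridable so the functions can be
# dry-run against a fake directory tree. The default path is an assumption.
CLOCK_ROOT="${CLOCK_ROOT:-/sys/kernel/debug/clock}"

# Print the current EMC (external memory controller) rate in MHz.
emc_rate_mhz() {
    rate_hz=$(cat "$CLOCK_ROOT/emc/rate") || return 1
    echo $((rate_hz / 1000000))
}

# Pin the EMC clock to a fixed rate given in Hz, then enable the override.
# Needs root on a real board.
emc_override() {
    echo "$1" > "$CLOCK_ROOT/override.emc/rate" &&
    echo 1 > "$CLOCK_ROOT/override.emc/state"
}
```

On the board this would be run as root, for example `sudo sh -c '. ./emc.sh; emc_override 924000000; emc_rate_mhz'`, where `emc.sh` is a hypothetical file holding the functions above.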

EDIT: I don’t know if this relates to the issue you are seeing, but there’s another thread open about OpenCV and zero-copy: https://devtalk.nvidia.com/default/topic/810053/embedded-systems/opencv-performance-tk1/

Thanks for the suggestion. I had not noticed the EMC clock at the bottom there… Does this only drive the DRAM clock?

There are so many clocks that I don’t really know how they are connected.

You can see how they are connected by dumping the clock tree:

sudo cat /sys/kernel/debug/clock/clock_tree
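
The full dump is long, so it can help to filter it down to the clock you care about. A small sketch, assuming the relevant lines contain the clock's name (for example `emc`); the function name `clock_lines` is made up for illustration:

```shell
#!/bin/sh
# Print only the lines of a clock tree dump that mention the given clock name
# (case-insensitive). The second argument is the file to read and defaults to
# the debugfs clock tree.
clock_lines() {
    grep -i -- "$1" "${2:-/sys/kernel/debug/clock/clock_tree}"
}
```

Reading the debugfs file needs root, so on the board something like `sudo sh -c '. ./clocks.sh; clock_lines emc'` should work, with `clocks.sh` being a hypothetical file containing the function.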