How to reduce CPU utilization when running a CUDA program on the TX2 platform

On TX2, when I run a CUDA program (using the GPU for processing), CPU utilization is almost 100%. Following the NVIDIA documentation, I call cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync) during program initialization, but it has no effect (the same call does reduce CPU utilization on the x86 platform). How can I reduce a CUDA program's CPU utilization? Thanks.
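For reference, here is a minimal sketch of the setup described above, assuming the flag is requested before any other CUDA runtime call. The kernel `scale` and the helper `usesBlockingSync` are hypothetical names for illustration and are not from the original program. Reading the flags back with cudaGetDeviceFlags is one way to check whether the request was actually recorded.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; stands in for the real GPU work.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns true if the scheduling bits in `flags` select blocking sync.
bool usesBlockingSync(unsigned int flags) {
    return (flags & cudaDeviceScheduleMask) == cudaDeviceScheduleBlockingSync;
}

int main() {
    // Request blocking synchronization before any call that creates the context.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDeviceFlags failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int n = 1 << 20;
    float *d_data = nullptr;
    err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Read the flags back to confirm the scheduling mode that is in effect.
    unsigned int flags = 0;
    err = cudaGetDeviceFlags(&flags);
    if (err == cudaSuccess) {
        printf("blocking sync requested: %s\n", usesBlockingSync(flags) ? "yes" : "no");
    }

    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);

    // With blocking sync, the host thread is expected to sleep here
    // instead of spinning while the GPU finishes.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

If the flag reads back as set but CPU utilization stays high while the program waits in cudaDeviceSynchronize, the cause is more likely in the platform software than in the application code.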


Could you profile your program with nvprof first?
This will help us identify what the CPU is mainly doing so that we can give further suggestions.

sudo ./nvprof -o data.nvvp [your program]



I solved this issue by installing the newest JetPack, thanks.