How to divide a job across different CPUs on ARM A57 in a kernel module

Hi, I am developing a driver for a video capture card that handles a 4K signal. I use memcpy() to copy 16 MB of video data to the AP. The copy is split into four memcpy() calls of 4 MB each (4 MB is the maximum DMA size), and each call takes 6.x ms, so copying the full 16 MB takes 26 ms. It must finish in less than 15 ms. CPU usage is 80%, 30%, 30%, 15%, 15%, 15% on the respective cores. Is there any way to divide this memory copy across the CPUs to save time? Any suggestions are appreciated.

Maybe check with the taskset utility.

Thanks for your reply. I resolved it by using a work queue.
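For readers who want to see the idea of splitting one large copy into per-CPU chunks, here is a minimal sketch. It is not the poster's driver code, which was not shared. It is a user-space analogue that uses pthreads so it can be compiled and run outside the kernel; the names parallel_copy, copy_worker, struct copy_chunk and MAX_CHUNKS are hypothetical. Each thread copies one slice, and the caller waits for all of them.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHUNKS 16   /* hypothetical upper bound on worker threads */

/* One slice of the overall copy, handled by one worker thread. */
struct copy_chunk {
    void *dst;
    const void *src;
    size_t len;
};

/* Thread entry point: copy a single slice. */
static void *copy_worker(void *arg)
{
    struct copy_chunk *c = arg;

    memcpy(c->dst, c->src, c->len);
    return NULL;
}

/*
 * Copy len bytes from src to dst using nchunks worker threads.
 * Returns 0 on success, -1 on a bad nchunks value or if a thread
 * could not be created (in which case the copy is incomplete).
 */
static int parallel_copy(void *dst, const void *src, size_t len, int nchunks)
{
    pthread_t tid[MAX_CHUNKS];
    struct copy_chunk chunk[MAX_CHUNKS];
    size_t base, off = 0;
    int i, started = 0, ret = 0;

    if (nchunks < 1 || nchunks > MAX_CHUNKS)
        return -1;

    base = len / (size_t)nchunks;
    for (i = 0; i < nchunks; i++) {
        /* The last slice also takes any remainder bytes. */
        size_t n = (i == nchunks - 1) ? len - off : base;

        chunk[i].dst = (char *)dst + off;
        chunk[i].src = (const char *)src + off;
        chunk[i].len = n;
        off += n;

        if (pthread_create(&tid[i], NULL, copy_worker, &chunk[i]) != 0) {
            ret = -1;
            break;
        }
        started++;
    }

    /* Wait for every slice that was actually started. */
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return ret;
}
```

Calling parallel_copy(dst, src, 16 * 1024 * 1024, 4) would mirror the four 4 MB copies from the question. Inside a kernel module the same split would presumably map onto one struct work_struct per slice, set up with INIT_WORK(), queued to a chosen core with queue_work_on(), and waited for with flush_work(). The gain depends on memory bandwidth, so four workers will not necessarily make the copy four times faster.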