Scheduling on Jetson TK1

Hello. I would like to ask about programming the Tegra K1 processor, as I want to schedule tasks across the four ARM cores and the GPU in parallel.
Can I use one ARM core for sending data to and receiving data from the GPU while the remaining three ARM cores process other jobs, without any effect on the GPU operations?
Please help me with this, or suggest another approach for the best heterogeneous scheduling method.

You’re interested in “CPU/Processor Affinity”:
https://en.wikipedia.org/wiki/Processor_affinity

Keep in mind that you can run any software task on CPU1 through CPU3, but hardware interrupts (meaning most drivers controlling hardware) can occur only on CPU0. I don’t know whether the GPU produces hardware interrupts, but as an example, if the GPU does use hardware interrupts, then only CPU0 will be able to service them. On a desktop system there would be either an I/O APIC or other architectural differences (NUMA/non-NUMA) to allow balancing of hardware IRQs across cores.
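
As a rough sketch of the affinity idea, assuming a Linux system with Python 3: the snippet below restricts the calling process to a chosen set of cores with `os.sched_setaffinity`. The helper name `pin_to_cpus` and the choice of cores 1 through 3 are illustrative assumptions, not something from this thread, and this only keeps software work off CPU0. It does not change where hardware interrupts are serviced.

```python
import os


def pin_to_cpus(wanted, pid=0):
    """Restrict `pid` (0 = the calling process) to the requested CPUs.

    Only CPUs the process is currently allowed to use are kept. If none of
    the requested CPUs are allowed, the affinity is left unchanged.
    Returns the set of CPUs in effect afterwards.
    """
    allowed = os.sched_getaffinity(pid)
    target = set(wanted) & allowed
    if not target:
        # Nothing usable was requested; do not touch the affinity mask.
        return allowed
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Hypothetical layout: leave CPU0 free for interrupt handling and
    # GPU hand-off, and run this process on the other three cores.
    print("now running on CPUs:", sorted(pin_to_cpus({1, 2, 3})))
```

From a shell, `taskset -c 1-3 ./my_app` should have a similar effect for a program started from the command line (`./my_app` is a placeholder name).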

What about synchronization between tasks running on the CPU and the GPU?

What would the specific use-case be for that?

Typically someone would want to run a process at a higher or lower priority, which isn’t difficult, but it sounds like you’re interested in modifying the scheduler so that it treats GPU tasks differently from other tasks and behaves in some custom way. The latter would be useful, but also quite difficult compared to simply upping a priority. One complication is that most threads intended for use with the GPU also need the CPU to prepare the work and hand it off to the GPU. A general scheme that merely knows a thread is using the GPU would not be very helpful; you’d still end up customizing your program or tools (nvcc) to interact with a GPU-aware scheduler option.

Whatever your use-case is, have you tried upping the priority of your process or thread (a more negative “nice level”; see “man nice” and “man renice”)? E.g., if you start it with the “nice” command on the command line, or use the “renice” command on the running process, you might get the results you want. I’d suggest that if you renice to -1 you’d probably see a noticeable reduction in latency in cases where heavy load from competing non-system threads is causing an issue. You could increase priority to a nice level of about -4 without worrying about interfering with the operating system; beyond that you may get some unintended side effects. Priority won’t have much effect unless the system is heavily loaded.
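
If you’d rather adjust the nice level from inside the program than with “renice”, something like the following might work. It is a minimal sketch assuming Linux and Python 3, using `os.setpriority` and `os.getpriority`; the helper name `try_set_nice` is made up for illustration. Note that moving to a more negative nice level normally requires root (or the CAP_SYS_NICE capability), so the sketch falls back to leaving the priority unchanged when permission is denied.

```python
import os


def try_set_nice(level, pid=0):
    """Try to set the nice level of `pid` (0 = the calling process).

    Returns the nice level actually in effect afterwards, which may be
    the old one if the change was not permitted.
    """
    try:
        os.setpriority(os.PRIO_PROCESS, pid, level)
    except PermissionError:
        # Lowering the nice value (raising priority) usually needs
        # root or CAP_SYS_NICE; keep the current level in that case.
        pass
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    before = os.getpriority(os.PRIO_PROCESS, 0)
    after = try_set_nice(-1)  # the modest boost suggested above
    print("nice level before:", before, "after:", after)
```

Run as a normal user this will most likely report the same level before and after; run with sufficient privileges it should report -1.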