Parallel Implementation on the Nvidia Jetson Nano B01

I have a question about parallelizing code on the Nvidia Jetson Nano B01. I have sequential C code that multiplies square matrices of dimension N=1000. I ran it on a single CPU core of the Jetson Nano, it completed successfully, and I measured the execution time of the matrix multiplication function.

Next, I parallelized the code with OpenMP, applying it to the for loop inside the matrix multiplication function. I used all 4 CPU cores available on the Jetson Nano and measured the execution time of the parallel version.
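For reference, here is a minimal sketch of what such a parallelized loop might look like. The original code was not posted, so the function name matmul_omp, the row-major double layout, and the choice to parallelize only the outer loop are all assumptions. It assumes GCC with the -fopenmp flag; without that flag the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not the poster's code.
 * Computes c = a * b for n x n matrices stored row-major. */
void matmul_omp(const double *a, const double *b, double *c, int n)
{
    assert(a && b && c && n > 0);

    /* Rows of c are split across threads. i, j, k and sum are declared
     * inside the parallel region, so each thread has its own copy. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * b[(size_t)k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}
```

If the inner loop counters or the accumulator are instead declared outside the loop and not marked private, the threads share them. That gives wrong results and can also make the parallel version slower, so it is worth checking.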

My work focuses on comparing the two execution times. The problem is that the parallel execution time is greater than the sequential execution time. Is OpenMP working correctly? I would appreciate help resolving this.
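One thing worth ruling out is the timer itself. The post does not say how the time was measured, so this is only a guess at a possible cause. The C function clock() reports CPU time for the whole process, added up over all threads, so a 4-thread run can appear as slow as or slower than the sequential run even when it finishes sooner in real time. The sketch below records both wall-clock time and CPU time around a call. The names wall_seconds, cpu_seconds, time_call and busy_work are hypothetical, and it assumes a C11 library that provides timespec_get.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock seconds, using C11 timespec_get. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* CPU seconds used by the whole process (all threads added together). */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

struct timing_result {
    double wall; /* elapsed real time in seconds */
    double cpu;  /* CPU time in seconds, summed over threads */
};

/* Runs work() once and returns both measurements. */
struct timing_result time_call(void (*work)(void))
{
    assert(work != NULL);
    struct timing_result r;
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    work();
    r.cpu = cpu_seconds() - c0;
    r.wall = wall_seconds() - w0;
    return r;
}

/* Placeholder workload; replace with the matrix multiplication call. */
void busy_work(void)
{
    volatile double x = 0.0;
    for (long i = 0; i < 1000000L; i++)
        x += (double)i * 0.5;
}
```

For a speedup comparison, the wall field is the relevant number. In a parallel run the cpu field is expected to be roughly the wall time multiplied by the number of busy threads. OpenMP also provides omp_get_wtime() for wall-clock timing.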


It seems to be an issue in OpenMP. You may try taskset (see taskset(1) in the Linux manual pages) to schedule the process on specific CPU cores and see whether throughput improves. Please also run sudo jetson_clocks to lock the CPU cores at their maximum clock.
