Does onnxruntime use data parallelism or model parallelism?

peppapiggy · July 26, 2020, 9:37am

Hi,

I built onnxruntime with ‘openmp’ on an NVIDIA AGX Xavier. When I run an ONNX model (ResNet) on the CPU with onnxruntime, multiple threads are created to execute it. If I enable four CPU cores, four threads are created. The following pictures show the result of running ‘pythoncifar10.py’. So I have two questions:

1. Is the number of threads spawned by OpenMP equal to the number of CPU cores?
2. When onnxruntime spawns multiple threads, does it use data parallelism or model parallelism?

Thanks!

AastaLLL · July 27, 2020, 4:19am

Hi,

This question concerns the onnxruntime implementation.
You may get better support from the onnxruntime team.

1. It depends on the onnxruntime source you are using.
For example, this perf test sets the number of threads to the number of CPU cores:

github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/perftest/performance_runner.cc#L34

```cpp
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Wunused-result"
// cmake/external/eigen/unsupported/Eigen/CXX11/../../../Eigen/src/Core/arch/NEON/PacketMath.h:1633:9:
// error: ‘void* memcpy(void*, const void*, size_t)’ copying an object of non-trivial type ‘Eigen::internal::Packet4c’
// {aka ‘struct Eigen::internal::eigen_packet_wrapper<int, 2>’} from an array of ‘const int8_t’
// {aka ‘const signed char’} [-Werror=class-memaccess]
#ifdef HAS_CLASS_MEMACCESS
#pragma GCC diagnostic ignored "-Wclass-memaccess"
#endif
#endif
#include <unsupported/Eigen/CXX11/ThreadPool>
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
using DefaultThreadPoolType = Eigen::ThreadPool;
static std::unique_ptr<DefaultThreadPoolType> default_pool;
static std::once_flag default_pool_init;
Eigen::ThreadPoolInterface* GetDefaultThreadPool(const onnxruntime::Env& env) {
  std::call_once(default_pool_init, [&env] {
    // ... (excerpt truncated here; see the linked file for the rest)
```

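2. It appears to be operator-level parallelism. The work inside an individual operator (for example, a convolution) is split across threads. This is different from splitting the input batch (data parallelism) or assigning different layers to different threads (model parallelism).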

Thanks.

peppapiggy · July 27, 2020, 9:07am

Thanks for the fast reply! Got it!