Pinned memory implementation increases overall execution time drastically - Drive PX2

Hi All,

My code does the following:

  1. Allocate host memory during initialization
  2. Fill the allocated buffer with data (host)
  3. Transfer the data to the device and run the kernel
  4. Use the host buffer on the CPU side
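
For reference, the four steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual code from my project: the kernel (incrementKernel), the CHECK macro, and the assumption of one byte per element are placeholders.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Placeholder kernel: adds 1 to every element.
__global__ void incrementKernel(unsigned char* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int width = 1024, height = 640;  // buffer size from the post
    const int n = width * height;          // assumption: 1 byte per element

    // 1. Allocate page-locked (pinned) host memory and a device buffer once.
    unsigned char* hostBuf = nullptr;
    unsigned char* devBuf = nullptr;
    CHECK(cudaHostAlloc((void**)&hostBuf, n, cudaHostAllocDefault));
    CHECK(cudaMalloc((void**)&devBuf, n));

    // 2. Fill the host buffer.
    for (int i = 0; i < n; ++i) hostBuf[i] = (unsigned char)(i & 0xFF);

    // 3. Transfer to the device and run the kernel.
    CHECK(cudaMemcpy(devBuf, hostBuf, n, cudaMemcpyHostToDevice));
    const int threads = 256;
    incrementKernel<<<(n + threads - 1) / threads, threads>>>(devBuf, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // 4. Use the host buffer on the CPU side (here: a simple checksum).
    unsigned long long sum = 0;
    for (int i = 0; i < n; ++i) sum += hostBuf[i];
    std::printf("host checksum: %llu\n", sum);

    CHECK(cudaFree(devBuf));
    CHECK(cudaFreeHost(hostBuf));
    return 0;
}
```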

I ran the same code on two different hardware platforms.
The results are as follows:

Device: Quadro M1000M (compute capability 5.0)

I implemented both pinned (page-locked) memory and mapped memory. The timing difference between the two is negligible.

  1. Allocate host memory during initialization - done only once, so I didn't profile this part
  2. Fill the allocated buffer with data (host) - 1 ms (1024 x 640 buffer)
  3. Transfer the data to the device and run the kernel - 1 ms for the transfer
  4. Use the host buffer on the CPU side - 3 ms

Device: Drive PX2 (dGPU) (compute capability 6.1)

I implemented both pinned (page-locked) memory and mapped memory. The timing difference between the two is negligible.

  1. Allocate host memory during initialization - done only once, so I didn't profile this part
  2. Fill the allocated buffer with data (host) - 3 ms (1024 x 640 buffer)
  3. Transfer the data to the device and run the kernel - 1 ms for the transfer
  4. Use the host buffer on the CPU side - 54 ms

The same pinned-memory code drastically increases the overall CPU run time on the Drive PX2: step 4 takes 54 ms there, versus 3 ms on the Quadro M1000M.
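
To isolate the CPU-side cost, a small test along these lines could compare a plain read pass over a pageable buffer with the same pass over a pinned buffer. This is a rough sketch rather than my actual profiling code: timeHostReadMs is a hypothetical helper, and one byte per element is assumed.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: time one sequential CPU read pass over a buffer.
static double timeHostReadMs(const unsigned char* buf, size_t n) {
    auto t0 = std::chrono::steady_clock::now();
    unsigned long long sum = 0;
    for (size_t i = 0; i < n; ++i) sum += buf[i];
    volatile unsigned long long sink = sum;  // keep the loop from being optimized away
    (void)sink;
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t n = 1024 * 640;  // buffer size from the post, 1 byte per element assumed

    // Ordinary pageable host memory.
    unsigned char* pageable = (unsigned char*)std::malloc(n);
    if (!pageable) return 1;

    // Page-locked (pinned) host memory.
    unsigned char* pinned = nullptr;
    cudaError_t err = cudaHostAlloc((void**)&pinned, n, cudaHostAllocDefault);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        std::free(pageable);
        return 1;
    }

    // Touch both buffers first so page faults are not part of the measurement.
    std::memset(pageable, 1, n);
    std::memset(pinned, 1, n);

    std::printf("pageable read: %.3f ms\n", timeHostReadMs(pageable, n));
    std::printf("pinned read:   %.3f ms\n", timeHostReadMs(pinned, n));

    cudaFreeHost(pinned);
    std::free(pageable);
    return 0;
}
```

If the pinned read is much slower than the pageable read on only one of the two platforms, the slowdown comes from how the CPU accesses that allocation rather than from the transfer or the kernel.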

Note: The CUDA device flag for mapped memory is set during initialization.
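
As an illustration of the mapped-memory variant, the setup looks roughly like the sketch below. It assumes the flag in question is cudaDeviceMapHost; the kernel (scaleKernel) and the float element type are placeholders, not taken from my code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: doubles every element in place.
__global__ void scaleKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024 * 640;
    const size_t bytes = n * sizeof(float);

    // Assumed flag: enable host-memory mapping before the device is first used.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceMapHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDeviceFlags: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Check that device 0 can map host memory at all.
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess || !prop.canMapHostMemory) {
        std::fprintf(stderr, "device 0 cannot map host memory\n");
        return 1;
    }

    // Allocate pinned host memory that is also mapped into the device address space.
    float* hostPtr = nullptr;
    err = cudaHostAlloc((void**)&hostPtr, bytes, cudaHostAllocMapped);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaHostAlloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Get the device-side pointer that aliases the host buffer.
    float* devPtr = nullptr;
    err = cudaHostGetDevicePointer((void**)&devPtr, hostPtr, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaHostGetDevicePointer: %s\n", cudaGetErrorString(err));
        cudaFreeHost(hostPtr);
        return 1;
    }

    for (int i = 0; i < n; ++i) hostPtr[i] = 1.0f;

    // The kernel reads and writes the host buffer directly; no cudaMemcpy is needed.
    const int threads = 256;
    scaleKernel<<<(n + threads - 1) / threads, threads>>>(devPtr, n);
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFreeHost(hostPtr);
        return 1;
    }

    std::printf("first element after kernel: %f\n", hostPtr[0]);

    cudaFreeHost(hostPtr);
    return 0;
}
```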

Please let me know what is missing from my implementation.

Thanks,

You may receive better / faster answers in the sub-forum dedicated to the Drive platform:

https://devtalk.nvidia.com/default/board/182/drive-platforms/