I have been doing some performance measurements using the DXIFRShim sample, and I have started to notice that the shim layer seems to use a huge amount of CPU.
Some background first. Here is what I am using:
- System specs:
  - GRID K520
  - 4 GB RAM
  - Intel Xeon E3-2603 v3 (in a VM, 2 CPU cores @ 1.6 GHz)
  - Capture SDK 5.0
  - Windows 8.1 64-bit
- Game: Unreal Engine's first-person shooter template (version 4.13)
- DXIFRShim sample
The CPU measurement is done in a simple way: I look at the CPU percentage Task Manager reports for the game process.
Here is what I noticed:
- When the game runs without the shim layer (i.e., with the original .dll files), CPU utilization is about 22%.
- When the game runs with DXIFRShim, CPU utilization is about 40%. This is almost double.
Considering that the whole point of the Capture SDK is to offload encoding to the much faster GPU instead of doing it on the CPU, having the CPU utilization almost double is rather strange.
I went ahead and commented out large chunks of code from NvIFREncoder.cpp, and found that this line is the offender:
NVIFRRESULT nr = pIFR->NvIFRSetUpHWEncoder(&params);
Leaving this line in and commenting out everything after it (including the actual encoding and the writes to the video file) still results in high CPU utilization. The moment this line is commented out, CPU utilization drops back to the level seen without the shim layer. I find this strange, since one would expect the majority of the work to happen in the encoding loop, i.e., the "while" loop that calls NvIFRTransferRenderTargetToHWEncoder().
I am not able to look at the implementation of NvIFRSetUpHWEncoder itself, so I am wondering: why is the CPU usage so high, and is there a way to avoid or reduce it while still doing GPU encoding?