The main page of the NVIDIA Video Codec SDK says that the Quadro P4000 has 2 NVENC engines. In the section “NVENC - Hardware-Accelerated Video Encoding”, the “ENCODE PERFORMANCE” diagram shows “P4000/P5000/P6000” with x2 NVENC engines:
[url]https://developer.nvidia.com/nvidia-video-codec-sdk[/url]
The Quadro P4000 is based on the GP104 chip, which has two NVENC engines. But some people say that one of them is disabled on the P4000. Is that true? How many enabled NVENC engines does the Quadro P4000 really have?
Yeah, it’s true.
Do you have a Quadro P2000? Could you please check the performance of this card?
You need to run two ffmpeg encoding processes (run this command twice, simultaneously):
I need to see the encoding fps of the P2000. I have a Quadro P4000, and on my system these commands give:
frame= 5606 fps=258 q=28.0 Lsize=N/A time=00:03:53.79 bitrate=N/A speed=10.8x
frame= 5606 fps=257 q=28.0 Lsize=N/A time=00:03:53.79 bitrate=N/A speed=10.7x
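For anyone who wants to repeat the two-process test, here is a minimal sketch of launching the same command twice at once from Python. The actual ffmpeg command line is not reproduced here; `run_concurrently` is a hypothetical helper name, and the placeholder command below only prints a message, so substitute your own ffmpeg argument list.

```python
import subprocess
import sys


def run_concurrently(cmd, copies=2):
    """Start `copies` instances of `cmd` at once and wait for all of them.

    Returns the list of exit codes, in launch order.
    """
    # Popen returns immediately, so every process is running before we wait.
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder command (an assumption) so the sketch runs anywhere;
    # replace it with the real ffmpeg argument list.
    placeholder = [sys.executable, "-c", "print('encode finished')"]
    print(run_concurrently(placeholder))
```

Starting both processes before waiting on either is the point: if you wait for the first one to finish before starting the second, the encoder is never shared and the fps numbers are not comparable.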
If I run only one encoding process, I see this result:
frame= 5606 fps=466 q=28.0 Lsize=N/A time=00:03:53.79 bitrate=N/A speed=19.4x
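To compare the two runs, it helps to add up the fps of the parallel processes and set the sum against the single-process figure. The sketch below does that with the three progress lines quoted above; `parse_fps` is a hypothetical helper, and the regular expression assumes the `fps=` field looks the way it does in these lines.

```python
import re


def parse_fps(progress_line):
    """Extract the fps value from one ffmpeg progress line."""
    match = re.search(r"fps=\s*([\d.]+)", progress_line)
    if match is None:
        raise ValueError("no fps field in line: " + progress_line)
    return float(match.group(1))


# Progress lines copied from the posts above.
two_processes = [
    "frame= 5606 fps=258 q=28.0 Lsize=N/A time=00:03:53.79 bitrate=N/A speed=10.8x",
    "frame= 5606 fps=257 q=28.0 Lsize=N/A time=00:03:53.79 bitrate=N/A speed=10.7x",
]
one_process = "frame= 5606 fps=466 q=28.0 Lsize=N/A time=00:03:53.79 bitrate=N/A speed=19.4x"

combined = sum(parse_fps(line) for line in two_processes)
alone = parse_fps(one_process)
ratio = combined / alone

print(f"two processes combined: {combined:.0f} fps")
print(f"one process alone:      {alone:.0f} fps")
print(f"combined / alone:       {ratio:.2f}")
```

With these numbers the two processes together reach 515 fps against 466 fps for one process, i.e. only about 10% more in total rather than anything close to double, which is how this thread reads the result as a single enabled NVENC being shared.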
Thanks a lot. So the information on the main page of the NVIDIA Video Codec SDK is actually wrong: the Quadro P4000 has only one enabled NVENC, not two as stated :(
Is this result for one encoding stream without overclocking?
It seems the difference in encoding performance between the P2000 and the P4000 comes from the different clock rates only. At default settings the P4000 has the same encoding performance as an overclocked P2000.
You get more SMs and more memory bandwidth with the P4000, but you are limited by nvENC (see “nvidia-smi dmon -c 1” in previous posts). Even if you buy a P5000 with two nvENC engines, you will be limited by nvDEC when transcoding, because all NVIDIA chips have one nvDEC (see https://developer.nvidia.com/video-encode-decode-gpu-support-matrix#Decoder).
Is it safe to set the maximum clocks on a card that is under encoding load 24/7/365?
Can I just save some money, buy a P2000 and raise the default clocks, or is it better to buy a P4000 and use the default clocks?
Timestamp : Mon Jun 18 20:04:14 2018
Driver Version : 391.58
Attached GPUs : 1
GPU 00000000:65:00.0
Product Name : Quadro P4000
Product Brand : Quadro
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : WDDM
Pending : WDDM
Minor Number : N/A
VBIOS Version : 86.04.56.00.0B
MultiGPU Board : No
Board ID : 0x6500
GPU Part Number : 900-5G410-1750-000
Inforom Version
Image Version : G410.0501.00.03
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization mode : None
PCI
Bus : 0x65
Device : 0x00
Domain : 0x0000
Device Id : 0x1BB110DE
Bus Id : 00000000:65:00.0
Sub System Id : 0x11A310DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays since reset : 0
Tx Throughput : 9000 KB/s
Rx Throughput : 27000 KB/s
Fan Speed : 46 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 8192 MiB
Used : 449 MiB
Free : 7743 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 229 MiB
Free : 27 MiB
Compute Mode : Default
Utilization
Gpu : 20 %
Memory : 14 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending : N/A
Temperature
GPU Current Temp : 32 C
GPU Shutdown Temp : 96 C
GPU Slowdown Temp : 93 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 9.62 W
Power Limit : 105.00 W
Default Power Limit : 105.00 W
Enforced Power Limit : 105.00 W
Min Power Limit : 60.00 W
Max Power Limit : 105.00 W
Clocks
Graphics : 139 MHz
SM : 139 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : 1202 MHz
Memory : 3802 MHz
Default Applications Clocks
Graphics : 1202 MHz
Memory : 3802 MHz
Max Clocks
Graphics : 1708 MHz
SM : 1708 MHz
Memory : 3802 MHz
Video : 1544 MHz
Max Customer Boost Clocks
Graphics : 1708 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
I suppose that the P4000 activates “boost” (to 1708 MHz rather than the default 1202 MHz) when you run encoding (verify this with “nvidia-smi dmon”), and I suppose that the P2000 has a faulty “boost” management mechanism (no surprise to me), so I have to set the “graphics” clock manually. BTW, NVIDIA power management (P-states) is unusable in many cases (see https://gridforums.nvidia.com/default/topic/378/).
There is a safety mechanism, “Clocks Throttle Reasons” (see that section of “nvidia-smi -q”), that should lower the clocks for a number of reasons when running at max speed.
I suppose the previous P2000 test shows low utilization of the SMs (~10%) and memory (~20%), which saves power; the total power draw is only ~30 W (less than half of the 75 W TDP), so it should be OK for 24/7.
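If you want to watch the “Clocks Throttle Reasons” section while the card runs at fixed clocks, a rough sketch like the one below can pull the active reasons out of saved “nvidia-smi -q” text. The function name `active_throttle_reasons` is made up for illustration, the section name is taken from the dump in this thread (other driver versions may label it differently), and the sample text is the corresponding part of that dump.

```python
def active_throttle_reasons(smi_q_text):
    """Return the names of throttle reasons reported as 'Active'.

    Expects the text of `nvidia-smi -q`; indentation is ignored, so it
    also works on a dump whose leading spaces were lost when pasting.
    """
    reasons = []
    in_section = False
    for raw_line in smi_q_text.splitlines():
        line = raw_line.strip()
        if line == "Clocks Throttle Reasons":
            in_section = True
            continue
        if in_section:
            # The section ends at the next heading, which has no "key : value".
            if ":" not in line:
                break
            name, _, state = line.partition(":")
            if state.strip() == "Active":
                reasons.append(name.strip())
    return reasons


# Section copied from the P4000 dump posted in this thread (idle card).
sample = """Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 8192 MiB
"""

print(active_throttle_reasons(sample))
```

On the idle dump only “Idle” is active. Under a 24/7 encoding load, entries such as “HW Thermal Slowdown” or “SW Power Cap” turning active would be the sign that the manual clock setting is being pulled back.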
Just to clarify, what you are seeing in the “Encode Performance” section of the Video Codec SDK main page refers to the number of SIMULTANEOUS ENCODING SESSIONS. This is different from the number of NVENC engines.
So is ‘number of streams’ in the diagram above the maximum simultaneous H264 encoder sessions for that hardware? I have a somewhat older laptop with a K1100M. It appears to support 1 GPU and 2 Streams. If I try to create 3 streams I get NV_ERR_OUT_OF_MEMORY on the 3rd call to nvEncOpenEncodeSessionEx(). If I were on a machine with better hardware would I be able to create 4 or 7 sessions as above? Also is there a programmatic way to determine how many sessions are available prior to calling nvEncOpenEncodeSessionEx() ?
The number of 1080p30/4Kp30 encoded streams is a performance metric that depends on the number of encoders (nvenc), the chip generation (Kepler/Maxwell/2nd-gen Maxwell/Pascal/Volta), the chip frequency and the encoder parameters.
So, for your hardware (K1100M = GK107, core clock 716 MHz, one Kepler hardware encoder (nvenc)), NVIDIA limits the API to only 2 sessions. The performance estimates are:
“High Performance”/“Constant QP” (maximum FPS): approx. 219 FPS for 1080p (yes, the hardware can reach this 1080p FPS in one session) == 7 x 1080p30 (but your hardware is limited to two sessions by the API) == 1 x 4Kp30.
“High Quality”/“Dual Pass” (highest quality): approx. 57 FPS for 1080p == 1 x 1080p30 == no 4Kp30.
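The stream counts in the post above can be reproduced with simple arithmetic. This is only a sketch under two assumptions that are not stated in the post: that a 4K frame (3840x2160) costs four times a 1080p frame (1920x1080) because it has four times the pixels, and that the number of real-time streams is the achievable FPS divided by 30, rounded down. `realtime_streams` is a hypothetical helper name.

```python
def realtime_streams(fps_1080p, target_fps=30, pixel_factor=1):
    """Estimate how many real-time streams fit into a measured 1080p FPS.

    pixel_factor is the frame size relative to 1080p (assumed 4 for 4K).
    """
    effective_fps = fps_1080p / pixel_factor
    return int(effective_fps // target_fps)


# FPS figures quoted in the post above for the K1100M (GK107).
for label, fps in [("High Performance / Constant QP", 219),
                   ("High Quality / Dual Pass", 57)]:
    print(label,
          "| 1080p30 streams:", realtime_streams(fps),
          "| 4Kp30 streams:", realtime_streams(fps, pixel_factor=4))
```

This gives 7 and 1 for the high-performance preset and 1 and 0 for the high-quality preset, matching the figures in the post. Note that this is a throughput estimate only; the session limit enforced by the API (2 on this hardware) is a separate cap.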