RegisterResource sometimes fails with multiple cards

Hi Leif

The problem is that our decoders are old code that we have added a DirectX pipeline to, so that things like post-processing run hardware accelerated regardless of the hardware.

I do think that it is kind of messy mixing the Runtime and Driver APIs, but rewriting the decoder isn’t really an option right now. If we were to rewrite the decoder, I would probably insist that we do decoding through DirectX so that all decoders are the same, which would mean that all hardware would behave the same and our code base would be so much easier to maintain 😊

But whether we use the Runtime API, the Driver API, or a mix of the two seems a bit moot, since the problem persists in my sandbox code whichever API I use. Until I have something that works in either the Runtime or the Driver API, I can’t really ask management for time to refactor one way or the other.

Regarding the use of cudaGetLastError and our workaround: we already do that, and it isn’t a problem that needs solving. It was just extra information that might help you figure out what goes wrong in the RegisterResource call.
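For reference, this is roughly how we check it (a simplified sketch with assumed names, not our actual decoder code; the texture and device setup are assumed to exist elsewhere):

```cpp
#include <cstdio>
#include <d3d11.h>
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

cudaGraphicsResource* RegisterTexture(ID3D11Texture2D* tex)
{
    // Clear any sticky error left by earlier calls so it doesn't get
    // attributed to the register call below.
    cudaGetLastError();

    cudaGraphicsResource* res = nullptr;
    cudaError_t err = cudaGraphicsD3D11RegisterResource(
        &res, tex, cudaGraphicsRegisterFlagsNone);

    if (err != cudaSuccess)
    {
        // This is where we see the failures on the second GPU.
        printf("RegisterResource failed: %d (%s)\n",
               (int)err, cudaGetErrorName(err));
        return nullptr;
    }
    return res;
}
```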

Our workaround involves copying frames from Nvidia surfaces to system memory and then back to DirectX surfaces, and with multiple 4K streams the performance decreases by a factor of approximately 4, depending on the hardware of course, so we really want the RegisterResource path to work.
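To make the cost concrete, the fallback has roughly this shape (a simplified sketch with assumed names, not our production code); every frame makes a round trip through host memory:

```cpp
#include <d3d11.h>
#include <cuda_runtime.h>

void CopyFrameViaSysmem(ID3D11DeviceContext* ctx,
                        ID3D11Texture2D*     dst,       // DX11 target surface
                        const void*          devFrame,  // decoded frame in CUDA device memory
                        size_t               devPitch,  // pitch of devFrame in bytes
                        size_t               rowBytes,  // bytes per row of the frame
                        unsigned             height,    // number of rows
                        unsigned char*       staging)   // preallocated host buffer, rowBytes * height
{
    // Device -> host copy of the decoded frame, tightly packed in the staging buffer.
    cudaMemcpy2D(staging, rowBytes, devFrame, devPitch,
                 rowBytes, height, cudaMemcpyDeviceToHost);

    // Host -> DX11 surface; dst must be a DEFAULT-usage texture.
    ctx->UpdateSubresource(dst, 0, nullptr, staging, (UINT)rowBytes, 0);
}
```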

Imagine the customer’s bewilderment when they have two identical Nvidia cards in their machine and are running 16 4K streams, 8 streams on each card, and one card is delivering 30 fps as it should while the other is delivering 7-8 fps. They will call our support asking what the heck is wrong, especially since we’ve told them that if they want better performance they should buy more GPUs.

To be fair, the performance hit they will take on the GPUs that fail isn’t that big compared to the old code, as the old code copied to sysmem already, but the performance gain on the GPUs where RegisterResource works will make the difference from the failing GPUs obvious and will confuse the customers, just as it working on some computers and not on others is very confusing to us.


Hi @mibosripl ,
We can’t find exactly the same setup. Is it possible for you to try to reproduce this issue with the latest driver and CUDA release? If it’s still reproducible, please share the complete repro code with us, including the makefile. We will use this repro to get help from our developer team, but, sorry, I can’t promise an ETA for the fix.

Thanks!

TestNvidiaDxThreadsv4.zip (8.0 KB) NvidiaDevMachine.txt (2.0 KB) NvidiaMandrake.txt (1.9 KB)

Here’s the solution I use for testing. Please note that it has configurations for building against both CUDA 10.2 and CUDA 11 (the latest; I downloaded it after your last reply). The program now uses the Runtime API by default, but if you add cu to the command line (TestNvidiaDxThreads11_0.exe cu) it uses the Driver API.
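In case it helps when reading the attached code, the switch is essentially this (a simplified sketch with assumed names; the full solution does the device/context setup and threading):

```cpp
#include <cstring>
#include <d3d11.h>
#include <cuda.h>                 // Driver API
#include <cuda_runtime.h>         // Runtime API
#include <cuda_d3d11_interop.h>   // cudaGraphicsD3D11RegisterResource
#include <cudaD3D11.h>            // cuGraphicsD3D11RegisterResource

// Returns the raw error code so the two paths can be compared directly.
int RegisterWithSelectedApi(bool useDriverApi, ID3D11Texture2D* tex)
{
    if (useDriverApi)
    {
        CUgraphicsResource res = nullptr;   // assumes a current CUcontext
        return (int)cuGraphicsD3D11RegisterResource(
            &res, tex, CU_GRAPHICS_REGISTER_FLAGS_NONE);
    }
    cudaGraphicsResource* res = nullptr;    // assumes cudaSetDevice was called
    return (int)cudaGraphicsD3D11RegisterResource(
        &res, tex, cudaGraphicsRegisterFlagsNone);
}

// Selected on the command line: "TestNvidiaDxThreads11_0.exe cu" -> Driver API.
// bool useDriverApi = (argc > 1 && std::strcmp(argv[1], "cu") == 0);
```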

There is a bit of new info: the developer machines (see the attached file NvidiaDevMachine.txt) still return 101 on the register call on the second GPU that tries to register, BUT our performance test machines (NvidiaMandrake.txt) return 999 with the Runtime API but not with the Driver API.
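If it helps, here is a throwaway snippet to translate the raw codes into names (it assumes the CUDA Runtime headers; the printed names come from whatever CUDA version is installed):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    printf("101 = %s\n", cudaGetErrorName((cudaError_t)101));  // expected: cudaErrorInvalidDevice
    printf("999 = %s\n", cudaGetErrorName((cudaError_t)999));  // expected: cudaErrorUnknown
    return 0;
}
```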

I hope that gives you something to work on, as our performance really suffers when RegisterResource fails and we have to go through system memory to get from Nvidia surfaces to DX11 surfaces.

Hi @mibosripl,
Please help review my understanding below.

Running the attached code - TestNvidiaDxThreadsv4.zip - on a dual RTX 2080 Ti + Windows 10 (or Win 7?) + CUDA 10.2 or CUDA 11 system, it crashes as shown in the two attached logs: NvidiaDevMachine.txt and NvidiaMandrake.txt?

One question:
What’s the difference between developer machines and performance test machines?

Thanks!

Sorry for the late response, I’ve been sick since Friday. NvidiaDevMachine_2.txt (2.0 KB) NvidiaMonster.txt (2.0 KB)

Two more logs for you: NvidiaMonster.txt is from another performance test machine, and NvidiaDevMachine_2.txt is from another developer machine.

All four machines are running fully updated Windows 10 and have the latest Nvidia drivers.

The differences between the performance test machines and the dev machines are:

  • The performance test machines are really clean; nothing has been installed on them except our software and some testing tools, while the dev machines are normal workstations and have Visual Studio, Nvidia SDKs, Nsight, and such installed.

  • The performance test machines have matching graphics cards while the dev machines don’t: NvidiaDevMachine.txt has a 1080 and a 1060 installed, and NvidiaDevMachine_2.txt has two 1060s from the same manufacturer, but the cards look quite different, so I wouldn’t say they are a matching set.

Which reminds me, we had an issue testing our software on the performance test machines: GetDevice returns an Nvidia device ID of 0 for both of the two matching GPUs, which can’t be right? We’ve fixed it by matching the DX11 and Nvidia devices by LUID, but it is a workaround.
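The LUID matching looks roughly like this (a simplified sketch with assumed names, error handling stripped; Windows-only APIs):

```cpp
#include <cstring>
#include <d3d11.h>
#include <dxgi.h>
#include <cuda_runtime.h>

// Finds the CUDA device whose LUID matches the DXGI adapter behind the D3D11 device.
int CudaDeviceFromD3D11Device(ID3D11Device* d3dDevice)
{
    // Get the adapter LUID from the D3D11 device.
    IDXGIDevice*  dxgiDevice  = nullptr;
    IDXGIAdapter* dxgiAdapter = nullptr;
    d3dDevice->QueryInterface(__uuidof(IDXGIDevice), (void**)&dxgiDevice);
    dxgiDevice->GetAdapter(&dxgiAdapter);

    DXGI_ADAPTER_DESC desc = {};
    dxgiAdapter->GetDesc(&desc);
    dxgiAdapter->Release();
    dxgiDevice->Release();

    // Compare against the LUID reported by each CUDA device.
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev)
    {
        char         luid[8]  = {};
        unsigned int nodeMask = 0;
        if (cudaDeviceGetLuid(luid, &nodeMask, dev) == cudaSuccess &&
            std::memcmp(luid, &desc.AdapterLuid, sizeof(luid)) == 0)
        {
            return dev;
        }
    }
    return -1; // no match found
}
```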

Hi @mibosripl,
I’ll file a ticket internally to ask for help from the Windows CUDA experts.
Will get back to you once I have progress!

Thanks for your patience!

Hi @mibosripl,
Could you find out why the error logs from the performance test machines and the dev machines are different?

Thanks!

Hi @mchi,

I’m sorry, but I must ask you to be more specific, or else I don’t know what I’m looking for or what is relevant, and I don’t know in which cases (with regard to the code) the APIs return 101 or 999.

All four machines are running an updated Windows 10 and the newest Nvidia drivers, and, as I wrote earlier, the perf test machines run matched sets of GPUs whereas the dev machines run mismatched GPUs.

@mchi,

Any progress on this? This problem greatly reduces performance for our customers with more than one Nvidia GPU :-(