We have a problem where calling cudaGraphicsD3D11RegisterResource on multiple cards fails on some machines. On my machine it has never worked; on a colleague's machine it worked until updating the drivers, and downgrading again didn't help; on a third machine it works flawlessly.
This is a critical issue for us, as many customers have bought multiple NVIDIA cards to improve decoding performance.
My machine: GTX 1080 + GTX 1060
Colleague's machine: dual GTX 1060
Working machine: dual RTX 2080 Ti
Hi Mibosripl,
Sorry for the late response!
The error log, “cudaErrorInvalidDevice”, indicates the CUDA API call does not run on the target device.
Could you refer to the Programming Guide :: CUDA Toolkit Documentation and call cudaSetDevice() after the CreateDeviceEx() call?
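That ordering might look roughly like this sketch (the function name is mine; pAdapter/pTexture come from the existing D3D11 setup, and error handling is abbreviated):

```cpp
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Sketch: bind the thread to the CUDA device matching the DXGI adapter
// before registering the D3D11 resource for interop.
cudaGraphicsResource* registerOnAdapter(IDXGIAdapter* pAdapter,
                                        ID3D11Texture2D* pTexture)
{
    int cudaDev = -1;
    if (cudaD3D11GetDevice(&cudaDev, pAdapter) != cudaSuccess)  // adapter -> CUDA ordinal
        return nullptr;

    // Select that device right after CreateDeviceEx(), before other CUDA calls.
    if (cudaSetDevice(cudaDev) != cudaSuccess)
        return nullptr;

    cudaGraphicsResource* res = nullptr;
    if (cudaGraphicsD3D11RegisterResource(&res, pTexture,
            cudaGraphicsRegisterFlagsNone) != cudaSuccess)
        return nullptr;
    return res;
}
```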
I tried that, but then I get a CUDA_ERROR_CONTEXT_IS_DESTROYED (709) when doing cuMemAllocPitch :-(
And then there's the detail that the same code I posted in this thread currently works on two other machines, one with dual GTX 1080 Ti and one with dual RTX 2080 Ti, while it fails on my GTX 1060 + GTX 1080 machine and on another with dual GTX 1060.
I played around with it some more, and I'm kind of baffled by the fact that the problem isn't a specific GPU in my machine: the first GPU to do RegisterResource works, but the following ones don't.
cudaGetDevice returns the correct device id (the one cudaD3D11GetDevice returned), even without cudaSetDevice
cuCtxGetCurrent returns the same context I set earlier, using the device id that cudaD3D11GetDevice returned
cuCtxGetDevice returns the same device id I set earlier, the one that cudaD3D11GetDevice returned
Everything I can think of is exactly as expected, but the RegisterResource call still fails with cudaErrorInvalidDevice (101)
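For reference, the checks above look roughly like this sketch (the function name and parameters are mine; cuInit() and the context setup are assumed to have already happened on this thread):

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: query the runtime's current device, the driver's current context,
// and that context's device, and compare them to what was set earlier.
void checkBindings(CUdevice expectedDev, CUcontext expectedCtx)
{
    int rtDev = -1;
    cudaGetDevice(&rtDev);        // runtime API's view of the current device

    CUcontext curCtx = nullptr;
    cuCtxGetCurrent(&curCtx);     // driver API's view of the current context

    CUdevice curDev = -1;
    cuCtxGetDevice(&curDev);      // device behind the current context

    std::printf("rtDev=%d ctxDev=%d devMatch=%d ctxMatch=%d\n",
                rtDev, (int)curDev,
                curDev == expectedDev, curCtx == expectedCtx);
}
```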
Please remember that we have other dual-GPU machines where it works on both GPUs in parallel, without releasing everything.
And since we're trying to decode a lot of streams at the same time, load balancing between all GPUs, not being able to utilize multiple NVIDIA GPUs is a major showstopper :-(
According to the log, your 1080 should have id 0 and your 1060 id 1.
As a result, you should not use “set CUDA_VISIBLE_DEVICES=2” but “set CUDA_VISIBLE_DEVICES=1” instead.
Could you try “CUDA_VISIBLE_DEVICES=1”?
What are your CUDA and GPU driver versions?
It seems that CUDA is not the latest version on your side.
If CUDA_VISIBLE_DEVICES just enables/disables one or more GPUs, I'm not surprised by the result. The code works fine on both adapters on their own, but trying to do RegisterResource on two GPUs will always fail on some machines, and it's always the second GPU that RegisterResource fails on, i.e. if I do RegisterResource on my 1080 first, the 1060 will fail, and vice versa.
BUT if the first GPU unregisters and releases its DX resources, the second GPU will be fine; as long as the first GPU (1080 or 1060, it doesn't matter) still has registered resources, the second GPU will always fail.
We have made a fallback solution where, instead of doing RegisterResource, we copy from CUDA to sysmem and from there to DX memory. That works, except for the horrible performance on the GPUs that can't RegisterResource.
One thing we noticed, though, is that if RegisterResource fails, ALL operations thereafter fail on that GPU/thread; we have to reinitialize on that GPU again. It's as if RegisterResource corrupts the context.
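The failing sequence, reduced to a sketch (tex0/tex1 are illustrative names for the per-adapter D3D11 textures; the result comments restate the behavior described above, not verified output):

```cpp
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Sketch of the observed pattern across two GPUs on the affected machines.
cudaGraphicsResource* res0 = nullptr;
cudaGraphicsResource* res1 = nullptr;

cudaSetDevice(0);
cudaGraphicsD3D11RegisterResource(&res0, tex0, cudaGraphicsRegisterFlagsNone);
// first GPU: succeeds

cudaSetDevice(1);
cudaError_t st = cudaGraphicsD3D11RegisterResource(&res1, tex1,
                                                   cudaGraphicsRegisterFlagsNone);
// second GPU: fails with cudaErrorInvalidDevice (101) while GPU 0
// still holds registered resources

cudaSetDevice(0);
cudaGraphicsUnregisterResource(res0);  // release everything on GPU 0...
tex0->Release();

cudaSetDevice(1);
st = cudaGraphicsD3D11RegisterResource(&res1, tex1,
                                       cudaGraphicsRegisterFlagsNone);
// ...and now the second GPU works
```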
Question 2:
The CUDA version is 10.1, but I downloaded 10.2 and tried that as well, with the same outcome.
Hi Mibosripl,
Because it's hard for us to debug the issue, could you try these two actions:
Action 1:
Move “CUresult res = cuInit(0);” to main() and keep one instance.
This system-wide API is for all devices.
Use cuD3D11GetDevice instead of cudaD3D11GetDevice.
The device id returned by cudaD3D11GetDevice() is an ordinal value which cannot be used for cuCtxCreate(); cuCtxCreate() requires a CUdevice, which is a device handle (check the data types below).
From
int cudaDevIx = -1;
cudaD3D11GetDevice(int* pCudaDevIx, IDXGIAdapter* pAdapter)
cuCtxCreate(CUcontext* pctx, unsigned int flags, CUdevice cudaDevIx)
to
CUdevice cudaDevIx = -1;
cuD3D11GetDevice(CUdevice* pCudaDevice, IDXGIAdapter* pAdapter)
cuCtxCreate(CUcontext* pctx, unsigned int flags, CUdevice dev)
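Put together, Action 1 might look like this sketch (function name is mine; cuInit(0) is assumed to have been called once in main(), and pAdapter is the DXGI adapter the D3D11 device was created on):

```cpp
#include <cuda.h>
#include <cudaD3D11.h>   // driver-API D3D11 interop declarations

// Sketch: driver API end-to-end for one adapter, using the CUdevice
// handle from cuD3D11GetDevice rather than a runtime ordinal.
CUcontext createContextForAdapter(IDXGIAdapter* pAdapter)
{
    CUdevice dev = 0;
    if (cuD3D11GetDevice(&dev, pAdapter) != CUDA_SUCCESS)  // handle, not ordinal
        return nullptr;

    CUcontext ctx = nullptr;
    if (cuCtxCreate(&ctx, 0, dev) != CUDA_SUCCESS)  // context on that handle,
        return nullptr;                             // made current on this thread
    return ctx;
}
```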
OR
Action 2:
Use the CUDA RT API instead of the CUDA driver API completely:
Remove “cuInit(0);”, “cuCtxCreate();” and “cuCtxPushCurrent();”
Use cudaMallocPitch() instead of cuMemAllocPitch()
Use cudaSetDevice(0) for the 1080 / cudaSetDevice(1) for the 1060 before cudaMallocPitch()
Anyway, we recommend the CUDA RT API because it is easy to use.
And the code should not be a mixture of the CUDA RT API and the CUDA driver API.
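As a sketch, Action 2 reduces to something like this (the function name, texture parameter, and pitch-allocation dimensions are illustrative):

```cpp
#include <cuda_runtime.h>
#include <cuda_d3d11_interop.h>

// Sketch: runtime API only. No cuInit/cuCtxCreate/cuCtxPushCurrent;
// cudaSetDevice manages the primary context implicitly.
void registerRuntimeOnly(ID3D11Texture2D* tex, int deviceOrdinal)
{
    cudaSetDevice(deviceOrdinal);   // e.g. 0 for the 1080, 1 for the 1060

    size_t pitch = 0;
    void* buf = nullptr;
    cudaMallocPitch(&buf, &pitch, 1920 * 4, 1080);  // instead of cuMemAllocPitch

    cudaGraphicsResource* res = nullptr;
    cudaGraphicsD3D11RegisterResource(&res, tex, cudaGraphicsRegisterFlagsNone);
}
```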
I realize that locating the problem without being able to reproduce it takes a lot of trial and error, and I'll do my best to test stuff for you guys and report my findings as accurately as possible 😊
After #ifdef'ing the heck out of my test program, I have established that it fails the same way with:
RT and driver API
CUDA 10.1 and 10.2
Debug and release
RegisterResource on the 1080 first and on the 1060 first
I have tested all combinations; all fail.
Neither of the GPUs shows any issues on its own, but after doing RegisterResource on one or the other, the subsequent GPUs fail. The first adapter can continue doing RegisterResource without any problem, though.
RegisterResource on subsequent GPUs not only fails, all CUDA operations on that context fail; even freeing the memory allocated with cuMemAllocPitch/cudaMallocPitch fails with 101 invalid device.
If, however, the first GPU does UnregisterResource and releases all its DX11 resources, the second GPU works just fine.
My theory is that something about the RegisterResource code corrupts the context in rare situations/installations. My manager has a theory that it only happens on developer machines. I'm not sure I buy that; however, the two machines that fail are developer machines, and the two that don't fail aren't.
If an error happened during a CUDA invocation, the subsequent APIs cannot work.
Could you help with this:
Use the CUDA runtime API instead of the CUDA driver API, as in comment #18 on Apr 23.
Add cudaGetLastError() after every CUDA RT API call:
auto status = cudaGetLastError();
std::cout << "Error Code: " << status << " ErrorString: " << cudaGetErrorString(status) << std::endl;
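One way to avoid sprinkling that by hand is a small checking macro (a sketch; CUDA_CHECK is an illustrative name, not part of the CUDA toolkit):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: wrap every runtime call so the first failing call is reported
// together with the file, line, and call text.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %d (%s)\n",           \
                         __FILE__, __LINE__, #call, (int)err_,            \
                         cudaGetErrorString(err_));                       \
        }                                                                 \
    } while (0)

// Usage:
// CUDA_CHECK(cudaSetDevice(1));
// CUDA_CHECK(cudaGraphicsD3D11RegisterResource(&res, tex,
//                cudaGraphicsRegisterFlagsNone));
```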