Handling a lost Direct3D9 device

Hi,

I could not find anything about this in the documentation, so I'll ask here. What is the proper way to handle a lost Direct3D device when D3D resources are registered or mapped with CUDA? Should the resources be unmapped and unregistered as soon as the device is lost, or only at the reset stage? Does the CUDA context need to be destroyed at some point and then recreated? Should all the CUDA D3D functions cope, without crashing, with a device that is in the lost state while it still has registered/mapped resources? So far, no combination I have come up with myself quite works.

Off-topic: damn, getting registered here was quite a task. I guess the captcha is not meant for humans to read.

Anyone from NVIDIA care to comment? Or is this not supported/considered at all?

hey

I'm having the same problem as described above.

I've tried quite a few things already, but nothing seems to work…

checked the sample in the SDK, but it doesn't even bother with a lost device
checked the reference manual, but there is no mention of it
tried a lot of combinations of registering textures, setting the D3D device and so on… after the device is lost, setting the D3D device again causes no errors, but any subsequent resource registration results in “unknown error”

Have you found a solution, or does anyone know what to do about it?

PS my first post here ;)

No. It seems there is either a bug in it or the whole issue has been neglected entirely. For now I “solved” it by giving up on CUDA; it doesn't seem quite ready for “real” use.

Thanks for reporting this. What version of CUDA are you using? Can you post some sample code that demonstrates the problem?

2.0

The CUDA SDK package, projects\simpleD3D9 and projects\fluidsD3D9

Run the sample app, then make the D3D device go into lost state (by changing screen resolution, or suspend & wakeup, ctrl-alt-del on windows XP with fast user switching enabled, etc.) and the application fails to recover. Sometimes it just begins writing out errors, sometimes it seems to freeze the whole system for some time. This is on Windows XP SP3 with GeForce GTX 260 and GeForce 8500 GT.

CUDA version: 2.0. I also tried 2.1 beta, and the same thing happens.

I'm sorry, but I can't really post any sample code; it is a bit too large.

I am using DXUT (the simple framework Microsoft provides in the DX SDK). It handles things like lost/reset devices, device creation and so on. I didn't modify this part (I had no reason to), so let's just assume the problem is not caused by me.

What I'm doing looks like this:

  1. before testing the lost device:
    a) cudaD3D9SetDirect3DDevice(DXUTGetD3D9Device()); (using the DXUT device) - works fine, no error
    b) create a texture using D3DXCreateTextureFromFileEx(pd3dDevice, L"media/stones.jpg", D3DX_DEFAULT, D3DX_DEFAULT, D3DX_DEFAULT, 0, D3DFMT_UNKNOWN, D3DPOOL_DEFAULT, D3DX_DEFAULT, D3DX_DEFAULT, 0, NULL, NULL, &cudaTex); (pd3dDevice is the DXUT device)
    c) pass this texture to CUDA and register it there - no error
    d) map / modify / unmap - no error, works absolutely fine. The texture is rendered with the modifications made in the .cu file, no problems whatsoever

  Now let's cause a lost device… without CUDA this works fine, so no mistakes on my side.

  2. lost device:
    a) since the texture needs to be in the D3DPOOL_DEFAULT memory pool, I have to release it. So first I call cudaD3D9UnregisterResource(cudaTex) - no error. Second, I release the texture - no error
    b) device reset, reloading resources - again D3DXCreateTextureFromFileEx(pd3dDevice, L"media/stones.jpg", D3DX_DEFAULT, D3DX_DEFAULT, D3DX_DEFAULT, 0, D3DFMT_UNKNOWN, D3DPOOL_DEFAULT, D3DX_DEFAULT, D3DX_DEFAULT, 0, NULL, NULL, &cudaTex); works absolutely fine, the texture exists, just like in 1b)
    c) cudaD3D9RegisterResource(cudaTex, cudaD3D9RegisterFlagsNone); - ERROR - “invalid device ordinal”
    d) screen and driver corruption when trying to map/unmap/modify this texture in CUDA

I tried 2 approaches:

  1. set the CUDA device on program start, without modifying or setting it again later
    error as above

  2. set the CUDA device on each device reset (including at application startup)
    when setting the device again after the D3D device has been reset, I get the error “setting the device when a process is active is not allowed”… then the same error as above
    well, there is nothing active; the device is being reset. Additionally, all calls to cudaD3D9SetDirect3DDevice and the CUDA texture registration are made from exactly the same place as during application startup, except that now they cause a massive error

I will be very thankful for any answer to this; switching to DX10 is not exactly what I want to do…

PS: the SDK sample doesn't even bother with a lost device ;) a lost device simply causes the sample to quit

Thanks, I have reproduced the problem and filed a bug. I’ll keep you posted on any progress.

We’ll also fix the SDK samples to handle device lost correctly.

Thanks for the info. Any guess as to when it will be corrected? Or at least when we can expect further info?

ok… it seems I somehow SOLVED the problem… it is really dumb, but it works…

I found it by complete accident: I started commenting out and uncommenting some lines of code, and suddenly everything seemed to be fine. CUDA now survives a lost and reset device, even several in a row.

The only change I made in the code was going from:
cudaThreadExit();
CUT_CHECK_ERROR("cudaThreadExit failed");

to a simple
cudaThreadExit();

Checking for an error here causes CUDA to die immediately after the device is reset…


So the proper way of handling a lost device in CUDA is as follows:

  1. call cudaD3D9SetDirect3DDevice after device creation and after EVERY device reset (if using DXUT it is enough to do it in OnD3D9ResetDevice, which is also called once on device creation)
  2. register resources on each device creation/reset and unregister them when the device is lost. D3D resources are treated as usual: release and recreate everything in D3DPOOL_DEFAULT
  3. call cudaThreadExit(); WITHOUT checking for an error afterwards, or everything will break

Should work. Can someone test it and post their results?
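
For anyone who wants the call order at a glance, here is a minimal sketch of the three steps as DXUT-style callbacks. It is not taken from the SDK. Every stub is a hypothetical stand-in that only logs the name of the real call it represents, so the snippet compiles without CUDA or the DirectX SDK. Putting cudaThreadExit() in the lost-device callback is an assumption (the steps above do not say where it goes), and the names onResetDevice, onLostDevice and simulateLostAndReset are made up for illustration.

```cpp
#include <string>
#include <vector>

using CallLog = std::vector<std::string>;

// Hypothetical stubs: each one only records the name of the real call it
// stands in for, so the ordering can be inspected without a GPU or D3D device.
void stubSetDirect3DDevice(CallLog& log)  { log.push_back("cudaD3D9SetDirect3DDevice"); }
void stubCreateTexture(CallLog& log)      { log.push_back("D3DXCreateTextureFromFileEx"); }
void stubRegisterResource(CallLog& log)   { log.push_back("cudaD3D9RegisterResource"); }
void stubUnregisterResource(CallLog& log) { log.push_back("cudaD3D9UnregisterResource"); }
void stubReleaseTexture(CallLog& log)     { log.push_back("IDirect3DTexture9::Release"); }
void stubThreadExit(CallLog& log)         { log.push_back("cudaThreadExit"); }

// Plays the role of DXUT's OnD3D9ResetDevice, which runs once after device
// creation and again after every successful reset.
void onResetDevice(CallLog& log) {
    stubSetDirect3DDevice(log);  // step 1: bind CUDA to the D3D device each time
    stubCreateTexture(log);      // recreate the D3DPOOL_DEFAULT texture
    stubRegisterResource(log);   // step 2: register it with CUDA again
}

// Plays the role of a DXUT lost-device callback, which runs before the reset.
void onLostDevice(CallLog& log) {
    stubUnregisterResource(log); // step 2: unregister before releasing
    stubReleaseTexture(log);     // D3DPOOL_DEFAULT resources must be released
    stubThreadExit(log);         // step 3: no CUT_CHECK_ERROR after this call
                                 // (placement here is an assumption)
}

// Simulates: device created, device lost once, device reset once.
CallLog simulateLostAndReset() {
    CallLog log;
    onResetDevice(log);  // initial creation
    onLostDevice(log);   // e.g. resolution change or suspend/wake-up
    onResetDevice(log);  // after the device has been reset
    return log;
}
```

In a real application each stub would be replaced by the call named in its log string, with the D3D texture pointer passed through. simulateLostAndReset() returns nine entries: set/create/register, then unregister/release/exit, then set/create/register again. One possible explanation for step 3, not confirmed anywhere in this thread, is that the error query after cudaThreadExit() makes the runtime create a fresh context before cudaD3D9SetDirect3DDevice is called again.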


One more thing: the CUDA programming guide and reference manual say that resources cannot be created in D3DPOOL_SYSTEMMEM. However, creating resources in D3DPOOL_MANAGED has the identical result: CUDA will not work with them.

I see the CUDA 2.1 samples now try to handle the lost device, but it doesn’t seem to be working. With simpleD3D9 for example when the device gets lost I get:

and the program exits.

Also, it seems the documentation has not been updated to tell how the lost device should be handled correctly? MikeSz’s method of not checking the error would appear to work but some “official” word on what is supposed to work would be nice to ensure the code doesn’t break in the future…