CUDA-capable device not detected when using other frameworks

I built a daylight and sound simulation based on the optixPathTracer example. Everything lives in a library that is called from Python via ctypes as part of a reinforcement learning environment. When I run my Python environment on its own, everything works fine. The problems arise when I want to use the environment in the reinforcement learning framework RLlib (based on Ray). I then get a sutil exception from my library telling me that no device is detected. The problem comes from the sutil code that allocates device memory for the geometry buffers.

'sutil::Exception'
(RolloutWorker pid=6389)   what():  CUDA call (cudaMalloc( reinterpret_cast<void**>( &buffer ), buf_size ) ) failed with error: 'no CUDA-capable device is detected' (/home/my_package_path/SDK/sutil/Scene.cpp:656)

How is device usage handled in CUDA? Could I possibly solve this from the OptiX side, or is it a problem on the Ray side?

For an OptiX 7 application to work, you need two things:

  1. The initialization of the OptiX 7 entry point function table which is loaded dynamically from the OptiX user mode driver.
    Inside the OptiX SDK examples this happens inside one of the optix_stubs.h helper functions. Look for the string “optixQueryFunctionTable” to find the code.
  2. The initialization of a CUDA context on an OptiX 7 supported GPU device (Maxwell and newer).
    When using the CUDA Runtime API that happens on the very first CUDA runtime host call and is usually done with a dummy cudaFree(0) call.
    Inside the OptiX SDK this happens in each of the examples individually, usually in a function named createContext(), or createContexts() in the multi-GPU examples; see the sketch after this list.
    The CUDA Driver API allows explicit initialization of CUDA and the CUDA contexts per device.
    Example code comparing the two mechanisms here:
    https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_runtime/src/Application.cpp#L730
    https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/intro_driver/src/Application.cpp#L750
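
Putting both steps together, here is a minimal sketch of that initialization using the CUDA Runtime API. The function and variable names are my own, not taken from the SDK examples, and it assumes a single OptiX-capable GPU at ordinal 0:

    #include <optix.h>
    #include <optix_function_table_definition.h> // defines the function table; include in exactly one .cpp
    #include <optix_stubs.h>

    #include <cuda_runtime.h>
    #include <stdexcept>

    // Hypothetical helper showing the two initialization steps.
    OptixDeviceContext createOptixContext(int deviceOrdinal = 0)
    {
        // Step 1: load the OptiX entry point function table from the user mode driver.
        // optixInit() from optix_stubs.h does the optixQueryFunctionTable() call internally.
        if (optixInit() != OPTIX_SUCCESS)
            throw std::runtime_error("optixInit() failed (driver or function table not found)");

        // Step 2: initialize the CUDA runtime on the desired device.
        // cudaSetDevice() selects the device; the dummy cudaFree(0) forces creation
        // of the primary CUDA context on it.
        if (cudaSetDevice(deviceOrdinal) != cudaSuccess || cudaFree(0) != cudaSuccess)
            throw std::runtime_error("CUDA runtime initialization failed");

        // Create the OptiX device context on the current CUDA context
        // (passing 0 as the CUcontext means "use the current CUDA context").
        OptixDeviceContext context = nullptr;
        OptixDeviceContextOptions options = {};
        if (optixDeviceContextCreate(0, &options, &context) != OPTIX_SUCCESS)
            throw std::runtime_error("optixDeviceContextCreate() failed");

        return context;
    }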

If there were a general problem with finding CUDA-capable devices on your system, that should already have thrown errors in earlier CUDA runtime calls, before that cudaMalloc().

I have no experience with any of the environments you cited, but I would expect issues if that framework is doing any of the following:

  - distributing work to multiple devices where some of them are not NVIDIA GPUs, or not supported by OptiX 7;
  - running inside something like a Docker environment where not all the drivers necessary for the above mentioned initializations are present;
  - distributing work to multiple GPUs dynamically (not per whole process) while your OptiX 7 application is not multi-GPU aware and has not created CUDA and OptiX contexts on each active device;
  - changing the current CUDA device on its own; etc.

That means you would need to analyze what actually happens in these environments and focus on anything related to CUDA in the same process. If possible, step through it in a debugger and see which CUDA calls happened before that cudaMalloc() error.
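
One way to do that without a debugger is to log what CUDA actually sees from inside your library, right at the entry point your Python code calls via ctypes, and compare the output between the standalone run and the RLlib worker run. A minimal sketch; the helper name is made up:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    // Hypothetical helper: call it first thing inside the library entry point
    // invoked through ctypes, in both the standalone and the RLlib worker case.
    extern "C" void logCudaEnvironment()
    {
        // Frameworks can restrict per-process device visibility through this variable.
        const char* visible = std::getenv("CUDA_VISIBLE_DEVICES");
        std::printf("CUDA_VISIBLE_DEVICES = %s\n", visible ? visible : "(not set)");

        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        std::printf("cudaGetDeviceCount: %s, count = %d\n", cudaGetErrorString(err), count);

        int device = -1;
        if (err == cudaSuccess && count > 0 && cudaGetDevice(&device) == cudaSuccess)
        {
            cudaDeviceProp prop = {};
            cudaGetDeviceProperties(&prop, device);
            std::printf("current device %d: %s (SM %d.%d)\n", device, prop.name, prop.major, prop.minor);
        }
    }

If the device count is zero only inside the RLlib worker process, the GPU is being hidden from that process (for example through its environment), which would point to the Ray side rather than the OptiX side.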