CUDA 11.0, Windows 10: Error 801 when trying to execute a runtime API function

I have a really puzzling problem: I am currently developing an image processing pipeline with CUDA 11.0 for two architectures, SM_61 (GeForce 1050 Ti) and SM_75 (GeForce 2060 OC). The code compiles for both architectures, but when run on the SM_61 device, any call to the CUDA runtime API (e.g. cudaStreamCreate(), which is the first one called) fails with error 801 (operationNotSupported). The same binary runs without problems on the SM_75 device. I have inspected literally every possible compiler setting and I am now running out of ideas. I have tested on multiple machines, too. Any help is welcome…

Is PyTorch part of this image processing pipeline, by any chance? Can you run any simple CUDA-based program on the machine with the GeForce 1050 Ti? Do you have the latest available NVIDIA driver package installed on the machine with the GeForce 1050 Ti?
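For the "simple CUDA-based program" check, a minimal sketch along the following lines could be built once for both targets and run on each card. It uses only CUDA runtime calls (`cudaDriverGetVersion`, `cudaRuntimeGetVersion`, `cudaGetDeviceCount`, `cudaGetDeviceProperties`, `cudaSetDevice`, `cudaStreamCreate`). The file name `check801.cu`, the `report` helper, and the build command in the comment are illustrative assumptions, not taken from the project in question.

```cuda
// check801.cu (hypothetical file name): minimal runtime API sanity check.
// Assumed build command, adjust to your setup:
//   nvcc -gencode arch=compute_61,code=sm_61 -gencode arch=compute_75,code=sm_75 check801.cu -o check801
#include <cstdio>
#include <cuda_runtime.h>

// Print the outcome of one runtime API call; returns true on success.
static bool report(const char* what, cudaError_t err)
{
    std::printf("%-24s -> %d (%s): %s\n", what, static_cast<int>(err),
                cudaGetErrorName(err), cudaGetErrorString(err));
    return err == cudaSuccess;
}

int main()
{
    bool ok = true;

    int driverVersion = 0, runtimeVersion = 0;
    ok &= report("cudaDriverGetVersion", cudaDriverGetVersion(&driverVersion));
    ok &= report("cudaRuntimeGetVersion", cudaRuntimeGetVersion(&runtimeVersion));
    // Versions are encoded as 1000 * major + 10 * minor, e.g. 11000 for 11.0.
    std::printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
                driverVersion / 1000, (driverVersion % 1000) / 10,
                runtimeVersion / 1000, (runtimeVersion % 1000) / 10);

    int deviceCount = 0;
    if (!report("cudaGetDeviceCount", cudaGetDeviceCount(&deviceCount)))
        return 1;

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        if (!report("cudaGetDeviceProperties", cudaGetDeviceProperties(&prop, dev))) {
            ok = false;
            continue;
        }
        std::printf("device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);

        if (!report("cudaSetDevice", cudaSetDevice(dev))) {
            ok = false;
            continue;
        }

        // The call that reportedly fails first in the pipeline.
        cudaStream_t stream = nullptr;
        if (report("cudaStreamCreate", cudaStreamCreate(&stream)))
            ok &= report("cudaStreamDestroy", cudaStreamDestroy(stream));
        else
            ok = false;
    }
    return ok ? 0 : 1;
}
```

If this already reports error 801 on the GeForce 1050 Ti, the driver or toolkit installation is the more likely suspect. If it passes, attention probably shifts to how the pipeline itself is built and linked.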

GPU architectures are not binary compatible. Are you building a fat binary with binary code for both sm_61 and sm_75 embedded in the executable?
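One way to see which embedded code the runtime actually selects on a given GPU is to query `cudaFuncGetAttributes` for a kernel and compare the result with the device's compute capability. The sketch below assumes a trivial kernel named `probeKernel` and a file name `fatbin_check.cu`, both of which are made up for illustration. Running the toolkit's `cuobjdump --list-elf` on the executable is another way to list the embedded cubins.

```cuda
// fatbin_check.cu (hypothetical file name): report which embedded code
// the runtime picked for the GPU in use. Build it with the same -gencode
// options as the real project so that the result is comparable.
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel; it exists only so that there is device code to inspect.
__global__ void probeKernel(int* out)
{
    if (out != nullptr)
        *out = 1;
}

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %d (%s)\n",
                    static_cast<int>(err), cudaGetErrorName(err));
        return 1;
    }
    std::printf("device 0: %s, compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);

    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        // A failure here usually suggests that the executable contains
        // no binary code or PTX that is usable on this GPU.
        std::printf("cudaFuncGetAttributes failed: %d (%s)\n",
                    static_cast<int>(err), cudaGetErrorName(err));
        return 1;
    }

    // Both fields are encoded as 10 * major + minor, e.g. 61 or 75.
    std::printf("binaryVersion = %d, ptxVersion = %d\n",
                attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

With a fat binary that embeds code for both sm_61 and sm_75, `binaryVersion` would be expected to match the compute capability of whichever card is installed.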

Hi njuffa,
Thanks for the reply.

  1. No, PyTorch is not part of the pipeline. Actually, the entire code is written in C++ and the only external library it uses is OpenCV 4.4.0. Sorry, I forgot to mention it.
  2. In general, I can run CUDA programs on exactly the same computer with the GPUs swapped (e.g. the CUDA examples run without problems on both SM_61 and SM_75).
  3. I use Visual Studio for compilation. The NVCC compiler flags I set are: -rdc (relocatable device code), NVCC compilation type --compile (“generate hybrid object type”), -gencode arch=compute_61,code=sm_61 and -gencode arch=compute_75,code=sm_75.
    Unfortunately, I cannot post even a small part of the source code here (it is proprietary).

I do not recall ever encountering error 801 (operationNotSupported) in my work. I asked about PyTorch because all of the search engine results on the first page for this error involve PyTorch. I have no additional ideas.