CUDA initialization in a CUDA Docker container

I am building a Docker image for portable ML development, using the official image nvidia/cuda:11.6.2-devel-ubuntu20.04 as the base.
I have many dependencies, so I’m creating a separate Docker image for each of the more complex ones to streamline my process. One of these dependencies fails to install during the Dockerfile build, but I can install it from inside a running container.

Specifically, this Dockerfile:

FROM pytorchbase-cuda11.6-ptorch11.6:latest
#Install dependencies

#WORKDIR /home/myuser
RUN mkdir detectron2
WORKDIR /home/myuser/detectron2
COPY ./ /home/myuser/detectron2/

## break symlinks in model_zoo
#RUN rm detectron2/model_zoo/configs  #this is copied over as a symlink to /detectron/configs, but the link is to the host machine.  
#RUN mkdir detectron2/model_zoo/configs
#RUN cp -r configs/* detectron2/model_zoo/configs/
RUN python3 setup.py install

fails during the installation with:

File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1694, in _get_cuda_arch_flags
          arch_list[-1] += '+PTX'
      IndexError: list index out of range
      [end of output]

From what I have found, this error occurs when the CUDA architecture list is not set up, so PyTorch ends up with an empty list of architectures to compile for.
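To illustrate what I think is happening, here is a simplified model of the arch selection. It is a sketch based on my reading of the traceback, not PyTorch's actual source: `resolve_arch_list` is a hypothetical helper, and I am assuming that the build step sees no GPU and that the `TORCH_CUDA_ARCH_LIST` environment variable is unset.

```python
def resolve_arch_list(env, device_capabilities):
    """Simplified, hypothetical model of how the CUDA arch list is chosen.

    env: mapping of environment variables.
    device_capabilities: list of (major, minor) tuples, one per visible GPU.
    """
    requested = env.get("TORCH_CUDA_ARCH_LIST")
    if requested:
        # An explicit list wins; no GPU has to be visible in this case.
        return [a for a in requested.replace(" ", ";").split(";") if a]

    # Otherwise the list is derived from the GPUs that are visible right now.
    arch_list = sorted({f"{major}.{minor}" for major, minor in device_capabilities})
    # With zero visible GPUs the list is empty, so indexing [-1] raises IndexError.
    arch_list[-1] += "+PTX"
    return arch_list


if __name__ == "__main__":
    # Build-time situation as I understand it: variable unset, no GPU visible.
    try:
        resolve_arch_list({}, [])
    except IndexError as exc:
        print("build-time model:", type(exc).__name__, exc)

    # Run-time situation: one GPU visible (8.6 is only an example value).
    print(resolve_arch_list({}, [(8, 6)]))

    # Variable set explicitly, still no GPU visible.
    print(resolve_arch_list({"TORCH_CUDA_ARCH_LIST": "8.6"}, []))
```

If this reading is right, setting `TORCH_CUDA_ARCH_LIST` to my card's compute capability before the install step might avoid the empty list, but I have not confirmed that this is the intended approach.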

If I build the image without the final RUN line, so that the source is only copied over, I can start a container from the image, exec into it, and run
python3 setup.py install, and it installs without errors.

I can also run nvidia-smi from inside the container, and it reports my graphics card and the expected CUDA version. PyTorch demos also work.

It seems that some form of CUDA initialization does not happen during the docker build phase but does happen when I run the container. Is there an initialization step I need to take?