I’m setting up Windows Server 2016 instances on Google Cloud as part of an elastically scaling continuous integration farm for games development.
Cooking Unreal Engine assets is much faster when a GPU is present, so I’m attempting to create instances that have NVIDIA Tesla P100 GPUs attached (the other option is the K80).
I am building these instances with Ansible, a configuration management tool.
So far I have been unable to install the display driver by following the instructions at "Create a VM with attached GPUs | Compute Engine Documentation | Google Cloud". I’m running commands remotely via Ansible to download the installer and then try to install the display driver.
My Ansible task looks like:
- name: install driver
  win_package:
    path: https://developer.nvidia.com/compute/cuda/10.0/Prod/network_installers/cuda_10.0.130_win10_network
    arguments:
      - "-s"
      - "Display.Driver"
    # product_id from HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\{B2FE1952-0186-46C3-BAEC-A80AA35AC5B8}_Display.Driver
    product_id: "B2FE1952-0186-46C3-BAEC-A80AA35AC5B8"
though I’ve also tried:
- name: fetch installer
  win_get_url:
    url: https://developer.nvidia.com/compute/cuda/10.0/Prod/network_installers/cuda_10.0.130_win10_network
    dest: c:/windows/temp/cuda_10.0.130_win10_network.exe
# Only need the display driver - basically, what a consumer would have.
# Install docs: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html
- name: install nvidia
  win_shell: >-
    start-process c:/windows/temp/cuda_10.0.130_win10_network.exe
    -ArgumentList @("-s", "Display.Driver")
    -wait
    -PassThru
  args:
    creates: "C:/Program Files/NVIDIA Corporation"
The former gets me an error:
googlecompute: fatal: [10.49.0.148]: FAILED! => changed=false
googlecompute: msg: 'unexpected rc from install opvf1a5r.duz: see rc, stdout and stderr for more details'
googlecompute: rc: 3825205504
googlecompute: reboot_required: false
googlecompute: stderr: ''
googlecompute: stderr_lines: <omitted>
googlecompute: stdout: ''
googlecompute: stdout_lines: <omitted>
whereas the latter doesn’t error, but also does not install the driver successfully.
If I run the start-process call from PowerShell while connected to the instance over Remote Desktop, a ‘trust this driver’ popup appears; if I click through it manually, the driver installs successfully.
So - is there a way to automate, hands-off, the installation of the display driver? Is there some file or registry key I need to create in advance to allow Windows to pre-emptively trust the installation, something arcane like that…?
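To make the question concrete, the kind of pre-trust step I have in mind would look roughly like the tasks below. This is an untested sketch, not something I know to work: it assumes the popup is driven by the publisher certificate not being in the machine's Trusted Publishers store, and `nvidia_publisher.cer` is a hypothetical file that would have to be exported by hand from a machine where the driver was trusted manually.

```yaml
# Untested sketch. nvidia_publisher.cer is a hypothetical file: the NVIDIA
# publisher certificate exported by hand from a machine where the
# 'trust this driver' prompt was accepted manually.
- name: copy exported publisher certificate to the instance
  win_copy:
    src: files/nvidia_publisher.cer
    dest: c:/windows/temp/nvidia_publisher.cer

# Assumption: trusting the publisher machine-wide is what suppresses the popup.
- name: add certificate to the machine's Trusted Publishers store
  win_certificate_store:
    path: c:/windows/temp/nvidia_publisher.cer
    store_location: LocalMachine
    store_name: TrustedPublisher
    state: present
```

These tasks would need to run before either of the install tasks above.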
Any help greatly appreciated.