nvidia.ko can't be loaded with insmod after OS disk encryption (Tesla V100 + Nvidia-440.31.01+ Ubuntu-18.04 + 5....

After OS disk encryption, the nvidia-smi command gives the error below.

nvidia-smi
“NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

Then I found that the nvidia module is not listed when I run lsmod.

lsmod | grep nvidia
(no output)

If I load nvidia.ko manually with insmod, the output is:

cd /lib/modules/5.0.0-1027-azure/updates/dkms
insmod nvidia.ko
insmod: ERROR: could not insert module nvidia.ko: Invalid module format
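
"Invalid module format" is often reported when a module's vermagic does not match the running kernel, although other load-time rejections produce the same message. A minimal sketch of that check follows; the helper name vermagic_matches is hypothetical, and the commented usage lines assume modinfo from kmod is available. In the modinfo output further down, the vermagic already names 5.0.0-1027-azure, so this mainly helps rule that cause out.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the first word of a vermagic string
# equals the given kernel release.
vermagic_matches() {
    vermagic=$1
    release=$2
    # The kernel release is the first space-separated field of vermagic.
    [ "${vermagic%% *}" = "$release" ]
}

# Usage on the affected machine:
#   vermagic_matches "$(modinfo -F vermagic ./nvidia.ko)" "$(uname -r)" \
#       && echo "vermagic matches" || echo "vermagic mismatch"
#   dmesg | tail -n 20    # the kernel log usually states why the load was rejected
```

Keeping the comparison in a function that takes plain strings means it can be tried against values copied from another machine's modinfo output.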

I unpacked initrd.img from before and after OS disk encryption and compared the two. Apart from the files added or modified by the encryption process, nvidia.ko is the only file that changed.

diff -Nrq before/ after/
Files before/.random-seed and after/.random-seed differ
Files before/boot/luks/osluksheader and after/boot/luks/osluksheader differ
Files before/conf/conf.d/cryptheader/osluksheader and after/conf/conf.d/cryptheader/osluksheader differ
Files before/conf/conf.d/cryptroot and after/conf/conf.d/cryptroot differ
Files before/etc/mdadm/mdadm.conf and after/etc/mdadm/mdadm.conf differ
Files before/etc/mdadm/mdadm.conf.tmp and after/etc/mdadm/mdadm.conf.tmp differ
Files before/initrd.img-5.0.0-1027-azure and after/initrd.img-5.0.0-1027-azure differ
Files before/initrd.img-5.0.0-1027-azure-after-encrypt and after/initrd.img-5.0.0-1027-azure-after-encrypt differ
Files before/lib/cryptsetup/scripts/azure_crypt_key.sh and after/lib/cryptsetup/scripts/azure_crypt_key.sh differ
Files before/lib/modules/5.0.0-1027-azure/updates/dkms/nvidia.ko and after/lib/modules/5.0.0-1027-azure/updates/dkms/nvidia.ko differ
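
To narrow down how the two nvidia.ko copies differ, a rough sketch like the one below prints each file's size and SHA-256 and reports whether the file ends with the kernel's module-signature marker string ("~Module signature appended~" followed by a newline). The helper names has_sig_marker and describe_module are hypothetical, and this only detects the marker; it does not validate the signature.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file ends with the kernel's
# module-signature marker string.
has_sig_marker() {
    # The marker is 27 characters plus a newline; $(...) drops the newline.
    [ "$(tail -c 28 "$1")" = "~Module signature appended~" ]
}

# Hypothetical helper: print size in bytes, SHA-256, and marker status
# for each file given on the command line.
describe_module() {
    for f in "$@"; do
        size=$(wc -c < "$f" | tr -d ' ')
        hash=$(sha256sum "$f" | cut -d' ' -f1)
        printf '%s  size=%s  sha256=%s\n' "$f" "$size" "$hash"
        if has_sig_marker "$f"; then
            echo "  signature marker: present"
        else
            echo "  signature marker: absent"
        fi
    done
}

# Usage:
#   describe_module before/lib/modules/5.0.0-1027-azure/updates/dkms/nvidia.ko \
#                   after/lib/modules/5.0.0-1027-azure/updates/dkms/nvidia.ko
```

If only one of the two copies carries the marker, that would suggest the file was signed or otherwise rewritten between the two snapshots.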

We use the dm-crypt feature of Linux to encrypt the OS disk.

Here is some additional information:

lspci | grep NVIDIA
09d3:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
7f11:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
a891:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)
e783:00:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 16GB] (rev a1)

modinfo nvidia
filename: /lib/modules/5.0.0-1027-azure/updates/dkms/nvidia.ko
alias: char-major-195-*
version: 440.33.01
supported: external
license: NVIDIA
srcversion: A5E9226CB2A7B16B12DA2CA
alias: pci:v000010DEdsvsdbc03sc02i00
alias: pci:v000010DEdsvsdbc03sc00i00
depends: ipmi_msghandler,i2c-core
retpoline: Y
name: nvidia
vermagic: 5.0.0-1027-azure SMP mod_unload
signat: PKCS#7
signer:
sig_key:
sig_hashalgo: md4
parm: NvSwitchRegDwords:NvSwitch regkey (charp)
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_TCEBypassMode:int
parm: NVreg_EnableStreamMemOPs:int
parm: NVreg_EnableBacklightHandler:int
parm: NVreg_RestrictProfilingToAdminUsers:int
parm: NVreg_PreserveVideoMemoryAllocations:int
parm: NVreg_DynamicPowerManagement:int
parm: NVreg_EnableUserNUMAManagement:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_KMallocHeapMaxSize:int
parm: NVreg_VMallocHeapMaxSize:int
parm: NVreg_IgnoreMMIOCheck:int
parm: NVreg_NvLinkDisable:int
parm: NVreg_RegisterPCIDriver:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RegistryDwordsPerDevice:charp
parm: NVreg_RmMsg:charp
parm: NVreg_GpuBlacklist:charp
parm: NVreg_TemporaryFilePath:charp
parm: NVreg_AssignGpus:charp

dmesg
[ 44.941102] IPMI message handler: version 39.2
[ 44.942310] ipmi device interface
[ 45.106981] PKCS#7 signature not signed with a trusted key
[ 45.108645] PKCS#7 signature not signed with a trusted key
[ 45.116908] PKCS#7 signature not signed with a trusted key
[ 45.116916] nvidia: loading out-of-tree module taints kernel.
[ 45.116922] nvidia: module license ‘NVIDIA’ taints kernel.
[ 45.116923] Disabling lock debugging due to kernel taint
[ 45.120266] PKCS#7 signature not signed with a trusted key
[ 45.127798] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 45.130327] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000b79ed1c9, val ffffffffc0cc4940
[ 45.723118] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000b857c1d5, val ffffffffc1fbe940
[ 45.829042] PKCS#7 signature not signed with a trusted key
[ 46.019110] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000232d6187, val ffffffffc3584940
[ 46.124604] PKCS#7 signature not signed with a trusted key
[ 46.367280] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000001b25ca96, val ffffffffc49e4940
[ 46.472970] PKCS#7 signature not signed with a trusted key
[ 46.646990] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000001f9cba4f, val ffffffffc5e44940
[ 46.754560] PKCS#7 signature not signed with a trusted key
[ 46.975067] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 0000000008f48b3b, val ffffffffc8704940
[ 47.247120] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000500df2d6, val ffffffffc0bfd940
[ 47.535029] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000f8012059, val ffffffffc72a4940
[ 47.867221] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 50.900475] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[ 50.934482] new mount options do not match the existing superblock, will be ignored
[ 50.968115] hv_utils: VSS: userspace daemon ver. 129 connected
[ 51.151374] PKCS#7 signature not signed with a trusted key
[ 51.255512] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 0000000003696d71, val ffffffffc0c19940
[ 51.638302] bpfilter: Loaded bpfilter_umh pid 1816
[ 51.804470] PKCS#7 signature not signed with a trusted key
[ 51.914605] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000006dd495a1, val ffffffffc0c75940
[ 52.589710] PKCS#7 signature not signed with a trusted key
[ 52.755651] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 0000000082c17d9b, val ffffffffc21e6940
[ 53.508507] PKCS#7 signature not signed with a trusted key
[ 53.522509] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000006dd495a1, val ffffffffc0c75940
[ 53.836219] PKCS#7 signature not signed with a trusted key
[ 53.850895] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 0000000082c17d9b, val ffffffffc21e6940
[ 54.183070] PKCS#7 signature not signed with a trusted key
[ 54.196668] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 00000000da39ad54, val ffffffffc3646940
[ 54.602530] PKCS#7 signature not signed with a trusted key
[ 54.616431] module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 1, loc 000000001c7a7a1b, val ffffffffc4aa6940
[ 71.221387] hv_balloon: Max. dynamic memory size: 458752 MB
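
The log above is dominated by two repeating messages. A small sketch like the following counts them so that logs from before and after encryption can be compared at a glance; the helper name summarize_module_log is hypothetical and it simply matches the message text shown above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and count the two
# message types that repeat in the log above.
summarize_module_log() {
    awk '
        /PKCS#7 signature not signed with a trusted key/ { sig++ }
        /Skipping invalid relocation target/             { rel++ }
        END {
            printf "untrusted-signature messages: %d\n", sig
            printf "invalid-relocation messages: %d\n", rel
        }
    '
}

# Usage:
#   dmesg | summarize_module_log
```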

Attachments:
nvidia-bug-report_after_disk_encryption.log (591 KB)
nvidia-bug-report_before_disk_encryption.log (2.16 MB)
nvidia.ko_before_disk_encryption.gz (12.6 MB)
nvidia.ko_after_disk_encryption.gz (12.9 MB)
nvidia-nvml-tmp_diff.png