I tried two approaches to running CUDA on my GPU-enabled Windows VM in Azure. In the first, I wrote a kernel, exported a function, and called it from a C# Windows Forms app, but I got an access violation on every CUDA function call (see this). Then I installed the ManagedCuda package and tried again. It turned out that my driver supports a different CUDA version (12.2) than my CUDA Toolkit (12.5). I couldn't update the driver (I can't find a suitable one; every installer I tried said the hardware was wrong), so I downgraded the CUDA Toolkit.
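To make the mismatch concrete, here is a minimal host-only C++ sketch of the comparison I have in mind. It assumes CUDA's usual integer encoding of versions (1000 * major + 10 * minor, so 12.2 becomes 12020). The helpers encodeCudaVersion, parseCudaVersion and driverCoversToolkit are hypothetical names, not part of CUDA or ManagedCuda, and the rule "the driver's CUDA version should be at least the toolkit's" is my understanding of what loading PTX needs, not something I have confirmed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// CUDA reports versions as 1000 * major + 10 * minor, e.g. 12.2 -> 12020.
int encodeCudaVersion(int majorPart, int minorPart) {
    assert(majorPart >= 0 && minorPart >= 0);
    return 1000 * majorPart + 10 * minorPart;
}

// Parses text of the exact form "major.minor" (as printed by nvidia-smi and
// nvcc) into the encoded integer. Returns false on any other input.
bool parseCudaVersion(const std::string& text, int& encoded) {
    int majorPart = 0, minorPart = 0;
    char trailing = 0;
    const int fields =
        std::sscanf(text.c_str(), "%d.%d%c", &majorPart, &minorPart, &trailing);
    if (fields != 2 || majorPart < 0 || minorPart < 0) return false;
    encoded = encodeCudaVersion(majorPart, minorPart);
    return true;
}

// Hypothetical check: is the driver's CUDA version at least the toolkit's?
bool driverCoversToolkit(const std::string& driverCuda, const std::string& toolkit) {
    int driverEncoded = 0, toolkitEncoded = 0;
    return parseCudaVersion(driverCuda, driverEncoded) &&
           parseCudaVersion(toolkit, toolkitEncoded) &&
           driverEncoded >= toolkitEncoded;
}
```

With the values from my machine, driverCoversToolkit("12.2", "12.5") returns false.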
When I run in Debug mode in VS:
the provided PTX was compiled with an unsupported toolchain.
(Apparently a different compiler was used?)
From the command line (dotnet build, dotnet run):
Error calling CUDA function: ErrorInvalidImage: This indicates that the device kernel image is invalid. This can also indicate an invalid CUDA module.
My test code:
using System;
using ManagedCuda;
using ManagedCuda.VectorTypes;

const int count = 1024;
float[] a = new float[count];
float[] b = new float[count];
float[] result = new float[count];

// Host inputs: a[i] = i, b[i] = 2 * i.
for (int i = 0; i < count; i++)
{
    a[i] = i;
    b[i] = i * 2;
}

using (CudaContext context = new CudaContext())
{
    // Device buffers; the using declarations free them before the context is disposed.
    using CudaDeviceVariable<float> d_a = new(count);
    using CudaDeviceVariable<float> d_b = new(count);
    using CudaDeviceVariable<float> d_result = new(count);
    d_a.CopyToDevice(a);
    d_b.CopyToDevice(b);

    // This is the call that fails with the errors quoted above.
    CudaKernel kernel = context.LoadKernelPTX("object_localization.ptx", "AddArrays");

    // Round up so every element is covered by a thread.
    int threadsPerBlock = 256;
    int blocksPerGrid = (count + threadsPerBlock - 1) / threadsPerBlock;
    kernel.BlockDimensions = new dim3(threadsPerBlock, 1, 1);
    kernel.GridDimensions = new dim3(blocksPerGrid, 1, 1);

    kernel.Run(d_a.DevicePointer, d_b.DevicePointer, d_result.DevicePointer, count);
    d_result.CopyToHost(result);

    for (int i = 0; i < 10; i++)
    {
        Console.WriteLine($"result[{i}] = {result[i]}");
    }
}
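The kernel (object_localization.cu):
extern "C" {
    __declspec(dllexport) __global__ void AddArrays(float* a, float* b, float* result, int n) {
        // One thread per element.
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        // Guard against threads past the end of the arrays.
        if (idx < n) {
            result[idx] = a[idx] + b[idx];
        }
    }
}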
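As a sanity check on what the kernel should produce, here is a small host-only C++ sketch that walks the same block/thread index space on the CPU. The names addArraysReference and expectedResult are ones I made up for illustration. This is not how the GPU schedules work, only a way to compute reference values to compare against once the kernel actually runs.

```cpp
#include <cassert>
#include <vector>

// CPU stand-in for the AddArrays kernel: walks the same grid/block/thread
// index space on the host and applies the same bounds check.
std::vector<float> addArraysReference(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      int threadsPerBlock) {
    assert(a.size() == b.size() && threadsPerBlock > 0);
    const int n = static_cast<int>(a.size());
    // Ceiling division, same as blocksPerGrid in the C# code.
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    std::vector<float> result(a.size(), 0.0f);
    for (int block = 0; block < blocksPerGrid; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            // Mirrors blockIdx.x * blockDim.x + threadIdx.x.
            const int idx = block * threadsPerBlock + thread;
            if (idx < n) {
                result[idx] = a[idx] + b[idx];
            }
        }
    }
    return result;
}

// Builds the same inputs as the C# test code: a[i] = i, b[i] = 2 * i.
std::vector<float> expectedResult(int count) {
    std::vector<float> a(count), b(count);
    for (int i = 0; i < count; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = static_cast<float>(i * 2);
    }
    return addArraysReference(a, b, 256);
}
```

For my inputs, expectedResult(1024)[i] equals 3 * i, so the first ten values printed by the C# code should be 0, 3, 6, and so on up to 27.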
The output of the deviceQuery sample is available at the provided link. Here is the output from the command prompt:
nvidia-smi
Sun Jun 9 22:46:42 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 538.46                 Driver Version: 538.46       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A10-4Q                WDDM  | 00000002:00:00.0 Off |                    0 |
| N/A    0C    P0              N/A /  N/A |    625MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1276    C+G   C:\Windows\explorer.exe                   N/A      |
|    0   N/A  N/A      2212    C+G   …ekyb3d8bbwe\PhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A      8096    C+G   …2txyewy\StartMenuExperienceHost.exe      N/A      |
|    0   N/A  N/A      8120    C+G   …nt.CBS_cw5n1h2txyewy\SearchHost.exe      N/A      |
|    0   N/A  N/A     10216    C+G   …5n1h2txyewy\ShellExperienceHost.exe      N/A      |
|    0   N/A  N/A     10408    C+G   …Professional\Common7\IDE\devenv.exe      N/A      |
|    0   N/A  N/A     10536    C+G   …les\microsoft shared\ink\TabTip.exe      N/A      |
|    0   N/A  N/A     10976    C+G   …CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|    0   N/A  N/A     11544    C+G   …crosoft\Edge\Application\msedge.exe      N/A      |
|    0   N/A  N/A     11820    C+G   …cal\Microsoft\OneDrive\OneDrive.exe      N/A      |
|    0   N/A  N/A     13136    C+G   …480_x64__8wekyb3d8bbwe\ms-teams.exe      N/A      |
|    0   N/A  N/A     13644    C+G   …on\125.0.2535.92\msedgewebview2.exe      N/A      |
|    0   N/A  N/A     15060    C+G   …__8wekyb3d8bbwe\WindowsTerminal.exe      N/A      |
+---------------------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:36:51_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
The compile commands:
nvcc -ptx -arch=sm_86 object_localization.cu -o object_localization.ptx
nvcc -ptx object_localization.cu -o object_localization.ptx
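One thing that might help narrow this down is checking which PTX ISA version the generated file declares. The host-only C++ sketch below reads the .version and .target directives from PTX text. PtxHeader, parsePtxHeader and ptxFitsDriver are hypothetical names, and the maximum version passed to ptxFitsDriver is an assumption the caller has to supply. As far as I can tell from the PTX ISA release notes, nvcc 12.5 emits .version 8.5 while a driver that reports CUDA 12.2 accepts PTX up to 8.2, but I have not verified that on this VM.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct PtxHeader {
    int isaMajor = 0;    // PTX ISA major version from the .version directive
    int isaMinor = 0;    // PTX ISA minor version from the .version directive
    std::string target;  // first value of the .target directive, e.g. "sm_86"
};

// Scans PTX text line by line for the ".version" and ".target" directives.
// Returns false when no parsable ".version major.minor" line is found.
bool parsePtxHeader(const std::string& ptxText, PtxHeader& header) {
    bool haveVersion = false;
    std::istringstream lines(ptxText);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream words(line);
        std::string directive;
        words >> directive;
        if (directive == ".version") {
            int verMajor = 0, verMinor = 0;
            char dot = 0;
            if ((words >> verMajor >> dot >> verMinor) && dot == '.') {
                header.isaMajor = verMajor;
                header.isaMinor = verMinor;
                haveVersion = true;
            }
        } else if (directive == ".target") {
            words >> header.target;
            // Drop a trailing comma left over from a comma-separated target list.
            while (!header.target.empty() && header.target.back() == ',') {
                header.target.pop_back();
            }
        }
    }
    return haveVersion;
}

// True when the PTX ISA version in the file does not exceed the newest
// version the driver is assumed to accept (caller-supplied assumption).
bool ptxFitsDriver(const PtxHeader& header, int maxMajor, int maxMinor) {
    assert(maxMajor >= 0 && maxMinor >= 0);
    if (header.isaMajor != maxMajor) return header.isaMajor < maxMajor;
    return header.isaMinor <= maxMinor;
}
```

If object_localization.ptx starts with .version 8.5, ptxFitsDriver(header, 8, 2) returns false, which would be consistent with the "unsupported toolchain" error.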
This sounds like a setup problem. Why am I getting these errors?