Hello,
I am trying to run the basic CUDA example from here, https://developer.nvidia.com/blog/even-easier-introduction-cuda/, on an Ubuntu subsystem with the following:
Windows 10 Home build 20270.fe_release
GeForce GTX 1060 6GB
NVIDIA driver version 465.12
WSL kernel version 5.4.72
Ubuntu 20.04.1 LTS
Output from deviceQuery is:
CUDA Driver Version / Runtime Version 11.2 / 11.0
CUDA Capability Major/Minor version number: 6.1
Output from wsl cat /proc/version:
Linux version 5.4.72-microsoft-standard-WSL2 (oe-user@oe-host) (gcc version 8.2.0 (GCC)) #1 SMP Wed Oct 28 23:40:43 UTC 2020
However, the tutorial program hits a segmentation fault after cudaDeviceSynchronize() returns, at the point where the host reads y back to compute the maximum error.
For completeness I copy the tutorial code below:
#include <iostream>
#include <math.h>

// Kernel: add the elements of two arrays (single thread, as in the tutorial)
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Have checked for errors here
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();
  // Program runs up to here fine

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++) {
    maxError = fmax(maxError, fabs(y[i]-3.0f)); // SEGMENTATION FAULT HERE
  }
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);
  return 0;
}
I have done extensive error checking throughout the script, in particular wrapping checkCudaErrors around the cudaMallocManaged calls, with no issues found. I have also reproduced it with totally different examples which use the cudaMallocManaged function, and find the same issue where computation runs fine, but accessing the memory back on the host is not possible.
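A minimal sketch of this kind of checking, reduced to a single managed float. CHECK_CUDA is a hypothetical macro written for illustration (checkCudaErrors from the samples' helper_cuda.h plays a similar role), and the touch kernel is not part of the tutorial:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the tutorial): print the error string and
// exit if a CUDA runtime call returns anything other than cudaSuccess.
#define CHECK_CUDA(call)                                        \
  do {                                                          \
    cudaError_t err_ = (call);                                  \
    if (err_ != cudaSuccess) {                                  \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                   cudaGetErrorString(err_));                   \
      std::exit(EXIT_FAILURE);                                  \
    }                                                           \
  } while (0)

// Illustrative kernel: increment one managed float on the device.
__global__ void touch(float *p) { p[0] += 1.0f; }

int main() {
  float *p = nullptr;
  CHECK_CUDA(cudaMallocManaged(&p, sizeof(float)));

  p[0] = 1.0f;                          // host write before the launch

  touch<<<1, 1>>>(p);
  CHECK_CUDA(cudaGetLastError());       // errors reported at launch time
  CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran

  std::printf("p[0] = %f\n", p[0]);     // host read after synchronization

  CHECK_CUDA(cudaFree(p));
  return 0;
}
```

With checks like these in place, every runtime call reports success, and the fault only appears at the host read after synchronization.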
I am also able to compile and run the UnifiedMemory sample program from the CUDA Toolkit successfully. However, on inspection, that program doesn't appear to access the memory back on the host after the computation, so its success is consistent with what I am seeing.
It's also worth noting that I can run this example fine on Windows through Visual Studio; it's only within the Ubuntu subsystem that it fails.
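One thing that could be compared between the two environments is what the runtime reports about managed-memory support. The following is a minimal sketch, assuming the GTX 1060 is device 0; it uses cudaDeviceGetAttribute with the cudaDevAttrManagedMemory and cudaDevAttrConcurrentManagedAccess attributes, and is only a diagnostic, not a fix:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  const int device = 0;  // assumption: the GPU of interest is device 0
  int managed = 0;       // 1 if the device can allocate managed memory
  int concurrent = 0;    // 1 if host and device can access managed memory concurrently

  cudaError_t err =
      cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, device);
  if (err == cudaSuccess) {
    err = cudaDeviceGetAttribute(&concurrent,
                                 cudaDevAttrConcurrentManagedAccess, device);
  }
  if (err != cudaSuccess) {
    std::fprintf(stderr, "attribute query failed: %s\n",
                 cudaGetErrorString(err));
    return 1;
  }

  std::printf("managedMemory: %d\n", managed);
  std::printf("concurrentManagedAccess: %d\n", concurrent);
  return 0;
}
```

Running the same program under Windows and under WSL would show whether the two report different values for either attribute.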
Please let me know if there are other options I can explore to find the cause of this issue.
Thanks,
James