I’ve just installed two GTX 1080 Ti cards on a Threadripper 1950X. However, when I run the P2P benchmarks from the CUDA samples (such as simpleP2P and p2pBandwidthLatencyTest), they crash.
The crash appears to be caused by the following function call:
cudaMemcpy(g1, g0, buf_size, cudaMemcpyDefault);
where g0 and g1 are defined as:
float *g0;
checkCudaErrors(cudaMalloc(&g0, buf_size));
float *g1;
checkCudaErrors(cudaMalloc(&g1, buf_size));
I’ve also enabled AMD-Vi and the IOMMU, but it still does not work. Does this mean that CUDA’s UVA only works on Intel platforms?
Sorry, I made a mistake. The code that caused the problem is:
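For anyone debugging the same setup, a first step could be to ask the runtime what it reports about unified addressing and peer access. The following is only a minimal sketch using standard CUDA runtime calls; it is not taken from the samples, and a "yes" here only means the driver advertises peer access, not that transfers will work with a given IOMMU setting.

```cuda
// Sketch: print UVA support and the peer-access matrix for all visible GPUs.
// Build (assumed toolchain): nvcc -o p2p_query p2p_query.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
            return 1;
        }
        // unifiedAddressing is 1 when the device shares one address space with the host (UVA).
        printf("GPU%d: %s, unifiedAddressing=%d\n", i, prop.name, prop.unifiedAddressing);

        for (int j = 0; j < count; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            err = cudaDeviceCanAccessPeer(&canAccess, i, j);
            if (err != cudaSuccess) {
                fprintf(stderr, "cudaDeviceCanAccessPeer failed: %s\n", cudaGetErrorString(err));
                return 1;
            }
            printf("  GPU%d -> GPU%d peer access: %s\n", i, j, canAccess ? "yes" : "no");
        }
    }
    return 0;
}
```

If peer access is reported as supported but the samples still hang, the problem is more likely in the platform configuration than in UVA itself.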
printf("Run kernel on GPU%d, taking source data from GPU%d and writing to GPU%d...\n",
gpuid[0], gpuid[1], gpuid[0]);
checkCudaErrors(cudaSetDevice(gpuid[0]));
SimpleKernel<<<blocks, threads>>>(g1, g0);
checkCudaErrors(cudaDeviceSynchronize());
By "crash" I mean that the system stopped responding entirely. Sometimes the kernel also reported errors such as (NMI watchdog: Bug: soft lockup …).
It turned out that the problem came from the IOMMU setting on my motherboard. It was originally set to Auto, and I then switched it to Enabled; CUDA P2P failed in both modes. Finally I switched the IOMMU to Disabled, and the problem was solved.
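After changing the IOMMU setting, it may be worth confirming that data actually arrives intact on the other GPU rather than only checking that nothing hangs. Below is a rough sketch of such a check, assuming the two cards are devices 0 and 1; the CHECK macro and the buffer size are illustrative choices, not part of the CUDA samples. Note that in the failure mode described above the program may hang instead of reporting an error.

```cuda
// Sketch: copy a buffer from GPU0 to GPU1 and verify it on the host.
// Assumes the two GPUs are devices 0 and 1. Build: nvcc -o p2p_verify p2p_verify.cu
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper (not from the samples): print the error and exit main.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t e_ = (call);                                           \
        if (e_ != cudaSuccess) {                                           \
            fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main() {
    const size_t n = 1 << 20;                 // 1M floats, arbitrary test size
    const size_t bytes = n * sizeof(float);

    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("Need at least two GPUs for this test.\n");
        return 0;
    }

    // Known pattern on the host so corruption is detectable.
    std::vector<float> src(n), dst(n, 0.0f);
    for (size_t i = 0; i < n; ++i) src[i] = static_cast<float>(i % 1024);

    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));

    float *g0 = nullptr, *g1 = nullptr;
    CHECK(cudaSetDevice(0));
    if (can01) CHECK(cudaDeviceEnablePeerAccess(1, 0));
    CHECK(cudaMalloc(&g0, bytes));
    CHECK(cudaMemcpy(g0, src.data(), bytes, cudaMemcpyHostToDevice));

    CHECK(cudaSetDevice(1));
    if (can10) CHECK(cudaDeviceEnablePeerAccess(0, 0));
    CHECK(cudaMalloc(&g1, bytes));

    // With peer access enabled this can go directly over PCIe;
    // otherwise the runtime stages the copy through host memory.
    CHECK(cudaMemcpyPeer(g1, 1, g0, 0, bytes));
    CHECK(cudaMemcpy(dst.data(), g1, bytes, cudaMemcpyDeviceToHost));

    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        if (src[i] != dst[i]) ++mismatches;
    }
    printf("peer access 0->1: %d, 1->0: %d, mismatches: %zu\n", can01, can10, mismatches);

    CHECK(cudaFree(g1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(g0));
    return mismatches == 0 ? 0 : 1;
}
```

A nonzero mismatch count, or a hang at cudaMemcpyPeer, would point at the transfer path rather than at the kernel that consumes the data.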
Thank you! You are a life saver! I couldn’t figure out why my Titan RTXs kept crashing while running both CUDA workloads and things like basic 3D applications (games or Unreal Engine 4). Disabling IOMMU on my Threadripper 3970X solved the issue completely.
For some reason, with IOMMU enabled I get constant NVIDIA driver crashes and occasional system lockups while running any CUDA workload, working inside Unreal Engine 4, or playing games. I’m not sure what the issue is, but NVIDIA might want to look into it. I’m going to submit a bug report.
I also noticed other strange behavior with IOMMU enabled, such as a Code 43 error appearing on one or both GPUs, seemingly at random, after a cold boot. I am not running Windows inside a VM; it’s running natively. I had to remove the drivers with DDU to make the Code 43 error go away. Since disabling IOMMU, all of these issues have disappeared.