I am running these code snippets on WSL using CUDA 12.1.
snip1.txt (1.5 KB)
snip2.txt (1.5 KB)
The only difference between snip1 and snip2 is the position of the cudaMemPrefetchAsync calls. Running snip1 results in CUDA error 101 ("invalid device ordinal"), while snip2 runs without any issues.
What could be the mechanism behind this behavior?
// snip1
cudaMemPrefetchAsync(noisy, bytes, deviceId);
cudaMemPrefetchAsync(ising1, bytes, deviceId);
cudaMemPrefetchAsync(ising2, bytes, deviceId);
init_curand<<<numBlocks, threadsPerBlock>>>(states, time(NULL));
checkError();
// snip2
init_curand<<<numBlocks, threadsPerBlock>>>(states, time(NULL));
checkError();
cudaMemPrefetchAsync(noisy, bytes, deviceId);
cudaMemPrefetchAsync(ising1, bytes, deviceId);
cudaMemPrefetchAsync(ising2, bytes, deviceId);
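
One way to narrow this down is to check the return value of each cudaMemPrefetchAsync call directly, rather than relying on a later check. The following is a minimal sketch, not the code from the attachments. It assumes that checkError() (not shown in the post) wraps cudaGetLastError(), that the buffers come from cudaMallocManaged, and that the CUDA 12.x signature of cudaMemPrefetchAsync is in use. The helper report() and the buffer size are hypothetical. If those assumptions hold, a failed prefetch in snip1 would still be pending when checkError() runs after the kernel launch, whereas in snip2 the check runs before the prefetches and their status is never inspected. The sketch also queries cudaDevAttrConcurrentManagedAccess, which the CUDA documentation says must be non-zero when prefetching to a GPU; it is likely to be 0 under WSL.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the status of one runtime call at its call site.
// Returns true when the call succeeded.
static bool report(const char* what, cudaError_t err) {
    if (err != cudaSuccess) {
        std::printf("%s failed: %d (%s)\n", what,
                    static_cast<int>(err), cudaGetErrorString(err));
        return false;
    }
    std::printf("%s succeeded\n", what);
    return true;
}

int main() {
    int deviceId = 0;
    if (!report("cudaGetDevice", cudaGetDevice(&deviceId))) return 1;

    // Non-zero means the GPU can access managed memory concurrently with the
    // CPU. Prefetching managed memory to a GPU requires this to be non-zero.
    int concurrent = 0;
    report("cudaDeviceGetAttribute",
           cudaDeviceGetAttribute(&concurrent,
                                  cudaDevAttrConcurrentManagedAccess,
                                  deviceId));
    std::printf("concurrentManagedAccess = %d\n", concurrent);

    // Assumption: the original buffers are managed allocations like this one.
    const size_t bytes = 1 << 20;  // illustrative size only
    float* noisy = nullptr;
    if (!report("cudaMallocManaged", cudaMallocManaged(&noisy, bytes))) return 1;

    // Inspect the prefetch status directly instead of deferring the check.
    cudaError_t prefetchStatus = cudaMemPrefetchAsync(noisy, bytes, deviceId);
    report("cudaMemPrefetchAsync", prefetchStatus);

    // cudaGetLastError() returns the last recorded error and resets it, so a
    // failed prefetch above would also show up here if nothing cleared it.
    report("cudaGetLastError", cudaGetLastError());

    cudaFree(noisy);
    return 0;
}
```

If the prefetch call itself reports error 101 and concurrentManagedAccess prints 0, guarding the prefetches with that attribute (or dropping them on this platform) would be one option to try.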