Hello!
I’m investigating methods to transfer data between the CPU and GPU.
I executed the following program, compiled with nvcc (release 10.2, V10.2.89), on a Jetson Xavier NX (compute capability 7.2).
On the GPU, it increments an int variable in pinned memory.
On the CPU, after launching the CUDA kernel, it polls the variable and writes its value to stdout.
#include <iostream>
#include <cuda_runtime.h>
#include <helper_functions.h>  // from the CUDA samples
#include <helper_cuda.h>       // for checkCudaErrors()

// GPU side: repeatedly write an increasing counter to pinned memory.
__global__
void kernel(volatile int* ptr) {
    for (volatile int i = 0; i < 1e8; ++i) {
        *ptr = i;
    }
}

int main() {
    int *ptr, *dptr;
    // Allocate mapped (zero-copy) pinned host memory for one int,
    // and get the device-side pointer to the same allocation.
    const auto flags = cudaHostAllocMapped;
    checkCudaErrors(cudaHostAlloc(&ptr, sizeof(int), flags));
    checkCudaErrors(cudaHostGetDevicePointer((void**)&dptr, ptr, 0));

    dim3 d = {1, 1};
    kernel<<<d, d>>>(dptr);

    // CPU side: poll the pinned memory and print what we observe.
    volatile int v = 0;
    while (v != 1e8 - 1) {
        v = *ptr;
        std::cout << v << '\n';
    }
    std::cout << std::endl;
}
And the output looks like this:
0
1350
1459
1553
1656
1758
1859
1960
2062
2316
2421
2523
2626
2728
2829
2930
3046
3147
3541
3652
3753
3855
3956
...
My question is: why can the CPU read values that are updated so frequently by the GPU? Does the CUDA driver flush the CPU cache when the GPU writes to the memory, or is this achieved with hardware coherency?
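For context, my current understanding (which may be wrong) is that if explicit flushing were required, the kernel would need a system-scope fence such as __threadfence_system() after each write. A variant of my kernel with that fence would look like this (kernel_fenced is just my name for the variant):

```cuda
__global__
void kernel_fenced(volatile int* ptr) {
    for (volatile int i = 0; i < 1e8; ++i) {
        *ptr = i;
        // Order this write so it becomes visible to observers outside
        // the GPU, including CPU threads polling the pinned memory.
        __threadfence_system();
    }
}
```

In my test the values were visible even without this fence, which is what prompted the question.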
According to this document, on Tegra devices with I/O coherency, pinned memory is cached on the CPU but not on the GPU.
With I/O coherency, the GPU can access the latest updates in the CPU caches. However, there is no description of why
the CPU can see the latest updates from the GPU.
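To narrow this down, I was planning to check what the runtime reports for my device. A small host-side sketch (the attribute names are from the CUDA Runtime API; I'm assuming they are the relevant ones for coherency behaviour on Tegra):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    int canMap = 0, hostAtomic = 0, pageable = 0;
    // Whether the device can map host memory (required for cudaHostAllocMapped).
    cudaDeviceGetAttribute(&canMap, cudaDevAttrCanMapHostMemory, dev);
    // Whether atomics between host and device are natively supported.
    cudaDeviceGetAttribute(&hostAtomic, cudaDevAttrHostNativeAtomicSupported, dev);
    // Whether the device can coherently access pageable host memory.
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);
    std::printf("canMapHostMemory=%d hostNativeAtomicSupported=%d "
                "pageableMemoryAccess=%d\n", canMap, hostAtomic, pageable);
}
```

If someone can explain how these attributes relate to the CPU seeing GPU writes, that would help too.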
Thanks.