Program misbehaves when a cudaMallocManaged array is not initialized with cudaMemcpy

My C++ CUDA sorting program allocates two arrays with cudaMallocManaged but calls cudaMemcpy on only one of them, because the other array does not need to be initialized from the host (it is initialized in the kernel). The program compiles successfully, but it does not work correctly at run time. If I call cudaMemcpy on both arrays, it works correctly. It also works correctly if I printf the contents of the other array. This seems very strange. How should I understand this behavior? The GPU is a V100 32 GB.

This code fails:

cudaMallocManaged(&arrGpu, size * sizeof(int));

cudaMallocManaged(&sum, 1000 * sizeof(int));

cudaMemcpy(arrGpu, arrCpu, size * sizeof(int), cudaMemcpyHostToDevice);

If I add cudaMemcpy(sum, arrCpu, 1000 * sizeof(int), cudaMemcpyHostToDevice); it works correctly.

Alternatively, if I add the following loop, it also works correctly:

for (int i = 0; i < binarySize; i++)
    printf("%d,", arrGpu[i]);
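
To help narrow this down, here is a minimal sketch of the same allocation pattern with every CUDA call checked and an explicit synchronization before the host reads the results. It is not the real sorting program: the CUDA_CHECK macro, the two kernels (initSum, countKernel), and the NUM_BINS constant are hypothetical stand-ins, and the sketch assumes that the second array is meant to start at zero. It initializes that array in its own kernel launch, since launches on the same stream run in order, while blocks inside a single launch have no guaranteed ordering.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original program): abort if a CUDA call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

const int NUM_BINS = 1000;  // assumption: matches the 1000-element sum array

// Stand-in kernel: initializes sum on the device instead of copying from the host.
__global__ void initSum(int *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) sum[i] = 0;
}

// Stand-in kernel: counts how often each value occurs (values are in [0, NUM_BINS)).
__global__ void countKernel(const int *arr, int *sum, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&sum[arr[i] % NUM_BINS], 1);
}

int main() {
    const int size = 1 << 20;
    int *arrCpu = (int *)malloc(size * sizeof(int));
    if (arrCpu == nullptr) return EXIT_FAILURE;
    for (int i = 0; i < size; i++) arrCpu[i] = rand() % NUM_BINS;

    int *arrGpu = nullptr;
    int *sum = nullptr;
    CUDA_CHECK(cudaMallocManaged(&arrGpu, size * sizeof(int)));
    CUDA_CHECK(cudaMallocManaged(&sum, NUM_BINS * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(arrGpu, arrCpu, size * sizeof(int), cudaMemcpyHostToDevice));

    // Initialize sum in a separate launch so it is complete before countKernel starts.
    initSum<<<(NUM_BINS + 255) / 256, 256>>>(sum, NUM_BINS);
    CUDA_CHECK(cudaGetLastError());

    countKernel<<<(size + 255) / 256, 256>>>(arrGpu, sum, size);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports execution errors; wait before host access

    long total = 0;
    for (int i = 0; i < NUM_BINS; i++) total += sum[i];
    printf("counted %ld of %d elements\n", total, size);

    CUDA_CHECK(cudaFree(arrGpu));
    CUDA_CHECK(cudaFree(sum));
    free(arrCpu);
    return 0;
}
```

If a version like this behaves consistently while the real program does not, the difference is probably in how the kernel initializes or reads the second array, not in the allocation calls themselves.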