In my case, copying data takes 40% of the total time. If I comment out the lines that do the copying, I get a significant speedup. Is this OK?
before = GetTick();
CHECK(cudaMemcpyAsync(buffers[inputIndex], input, INPSIZE, cudaMemcpyHostToDevice, stream));  // <<< here
context.enqueue(batchSize, buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(output, buffers[outputIndex], OUTSIZE, cudaMemcpyDeviceToHost, stream));  // <<< and here
cudaStreamSynchronize(stream);
after = GetTick();
printTimeInterval(before, after);
NVES_R
Hi,
In general, copying memory between the host and the device can be a bottleneck in GPU-accelerated computation, so it is worth keeping data transfers to a minimum and, where possible, overlapping them with computation. See the memory optimizations section of the CUDA best practices guide: https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#memory-optimizations.
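As a rough illustration of one such optimization, here is a minimal sketch of overlapping transfers with computation by double-buffering across two CUDA streams. It is not taken from your code: `scaleKernel`, `runPipeline`, `countMismatches`, and the buffer names are hypothetical, and the kernel is only a stand-in for the real work (in your case the `context.enqueue(...)` call). It assumes the host buffers can be allocated as pinned memory with `cudaMallocHost`, since `cudaMemcpyAsync` generally only overlaps with other work when the host side is pinned, and whether copies actually overlap with kernels also depends on the device.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",           \
                         cudaGetErrorString(err_), __FILE__, __LINE__);\
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Hypothetical stand-in for the real GPU work (e.g. inference).
__global__ void scaleKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

// Counts output elements that differ from the expected result for a batch.
static int countMismatches(const float* out, int n, int batch) {
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (out[i] != 2.0f * float(batch + i % 16)) ++bad;
    return bad;
}

// Processes numBatches batches of n floats using two streams.
// Returns the total number of mismatched output elements (0 means success).
int runPipeline(int numBatches, int n) {
    const int kStreams = 2;
    const size_t bytes = size_t(n) * sizeof(float);
    cudaStream_t streams[kStreams];
    float *hIn[kStreams], *hOut[kStreams], *dIn[kStreams], *dOut[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMallocHost((void**)&hIn[s], bytes));   // pinned host memory
        CHECK(cudaMallocHost((void**)&hOut[s], bytes));
        CHECK(cudaMalloc((void**)&dIn[s], bytes));
        CHECK(cudaMalloc((void**)&dOut[s], bytes));
    }

    int mismatches = 0;
    // The extra kStreams iterations only drain the batches still in flight.
    for (int b = 0; b < numBatches + kStreams; ++b) {
        int s = b % kStreams;
        // Wait until the batch that previously used this slot has finished.
        CHECK(cudaStreamSynchronize(streams[s]));
        if (b >= kStreams)
            mismatches += countMismatches(hOut[s], n, b - kStreams);
        if (b >= numBatches) continue;

        // Prepare the next input while the other stream may still be busy.
        for (int i = 0; i < n; ++i) hIn[s][i] = float(b + i % 16);

        // Copy in, compute, copy out: all queued on this slot's stream.
        CHECK(cudaMemcpyAsync(dIn[s], hIn[s], bytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scaleKernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(dIn[s], dOut[s], n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hOut[s], dOut[s], bytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaFreeHost(hIn[s]));
        CHECK(cudaFreeHost(hOut[s]));
        CHECK(cudaFree(dIn[s]));
        CHECK(cudaFree(dOut[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    return mismatches;
}

int main() {
    int bad = runPipeline(8, 1 << 20);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Because each batch has its own stream and its own set of buffers, the copy for one batch can be in flight while the kernel for the other batch is running. With a single stream, as in the snippet above, the copy in, the compute, and the copy out always run one after another.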
Can the GPU keep computing while another process is copying its data to the device? If so, then some kind of pipeline could be set up, right?
NVES_R