Synchronization for NPP GetBufferHostSize functions

Hi,

A question about the GetBufferHostSize functions in NPP:

NPP has a large number of routines for calculating buffer sizes, with names like nppiDotProdGetBufferHostSize_32s64f_C3R_Ctx. Our understanding is that:

  • The word Host in the name indicates that the buffer size is stored in a memory location on the host (CPU), pointed to by hpBufferSize. The Host in the function name corresponds to the hp prefix of that parameter.
  • The suffix _Ctx indicates that the function runs on the GPU, and therefore that the buffer size is stored asynchronously, by a mechanism like cudaMemcpyAsync.
  • Therefore, some kind of synchronization is needed after calling the routine and before reading the buffer size at hpBufferSize.
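
To make the concern concrete, here is a toy C++ model of the behaviour we suspect. It uses no CUDA or NPP calls: FakeStream, fakeGetBufferHostSize and the size formula are all hypothetical stand-ins, so this only illustrates the hazard we are asking about, not what NPP actually does.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a CUDA stream (not a CUDA or NPP type):
// queued work runs only when synchronize() is called.
struct FakeStream {
    std::vector<std::function<void()>> pending;

    void synchronize() {
        for (auto& op : pending) op();
        pending.clear();
    }
};

// Hypothetical stand-in for a GetBufferHostSize ..._Ctx call (not an NPP function).
// It returns immediately and only enqueues the write to *hpBufferSize.
// The size formula is made up for the example.
void fakeGetBufferHostSize(int width, int height, int* hpBufferSize, FakeStream& stream) {
    stream.pending.push_back([=] { *hpBufferSize = width * height * 3; });
}

// Holds the value seen at hpBufferSize before and after synchronizing.
struct Observed {
    int beforeSync;
    int afterSync;
};

Observed queryBufferSize(int width, int height) {
    FakeStream stream;
    int bufferSize = -1;  // sentinel meaning "not written yet"
    fakeGetBufferHostSize(width, height, &bufferSize, stream);

    Observed result;
    result.beforeSync = bufferSize;  // still the sentinel: the write is only queued
    stream.synchronize();            // plays the role of cudaStreamSynchronize
    result.afterSync = bufferSize;   // now holds the computed size
    return result;
}
```

In this model, reading hpBufferSize right after the call returns the sentinel, and only the read after synchronize() sees the real size. If the real _Ctx functions behave the same way, we would need the equivalent of a cudaStreamSynchronize on the stream in the context before using the value.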

Is this the right understanding?

Thanks,
Arnoud.