According to Visual Profiler, nvjpegEncodeImage() appears to block on a cudaMemcpyAsync Device-to-Pageable transfer (of only 4 bytes!) near the end of the function call.
The attached image is a screen capture from Visual Profiler, reproduced on a GT710 using CUDA Samples\v11.2\7_CUDALibraries\nvJPEG_encoder.
This is a serious limitation for processing large-resolution images.
Isn't nvjpegEncodeImage() designed to be asynchronous?
I suspect this is a bug in nvJPEG: why isn't the 4-byte buffer allocated as pinned memory, so that the memcpy can actually be asynchronous?
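
To illustrate what I mean, here is a minimal sketch of the pattern I suspect. It is not nvJPEG's actual code: the kernel `write_flag` and the helper `copy_flag_async` are hypothetical names I made up, and the 4-byte value just stands in for whatever the encoder reads back. As far as I understand the CUDA runtime documentation, a device-to-pageable cudaMemcpyAsync returns only once the copy has completed, whereas with a pinned destination (cudaMallocHost) the call is only enqueued on the stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the encoder's work: writes a 4-byte result.
__global__ void write_flag(int *d_flag) { *d_flag = 42; }

// Copies a 4-byte value from device to host with cudaMemcpyAsync.
// use_pinned == false: pageable destination, the call waits for the copy.
// use_pinned == true:  pinned destination, the call is enqueued on the stream.
int copy_flag_async(bool use_pinned) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    int *d_flag = nullptr;
    cudaMalloc(&d_flag, sizeof(int));

    int *h_flag = nullptr;
    if (use_pinned) {
        cudaMallocHost(&h_flag, sizeof(int));             // page-locked host memory
    } else {
        h_flag = static_cast<int *>(malloc(sizeof(int))); // pageable host memory
    }
    *h_flag = 0;

    write_flag<<<1, 1, 0, stream>>>(d_flag);
    cudaMemcpyAsync(h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost, stream);

    // Needed in both cases before reading h_flag on the host.
    cudaStreamSynchronize(stream);
    int result = *h_flag;

    if (use_pinned) cudaFreeHost(h_flag); else free(h_flag);
    cudaFree(d_flag);
    cudaStreamDestroy(stream);
    return result;
}

int main() {
    printf("pageable: %d\n", copy_flag_async(false));
    printf("pinned:   %d\n", copy_flag_async(true));
    return 0;
}
```

In the profiler, I would expect the pageable variant to show the host thread stalled inside cudaMemcpyAsync, which is what the nvjpegEncodeImage() timeline looks like to me.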
Or is there any workaround to make nvjpegEncodeImage() run asynchronously?

[Image: nvjpegencodeimage_is_blocked_by_memcpyasync_devicetopageable]