Why does nppiEncodeHuffmanScan_JPEG copy 4 bytes from device to host memory?

I am using nppiEncodeHuffmanScan_JPEG_8u16s_P3R to implement a JPEG encoder, but I see two 4-byte memcpys from device memory to host memory. When I pass a device memory pointer for the output parameter nLength, one of the memcpys becomes device-to-device, but the other is still device-to-host. Can anyone tell me how to reduce the device-to-host memcpys in this API?