I am using the NPP API nppiEncodeHuffmanScan_JPEG_8u16s_P3R to implement a JPEG encoder, and I see that each call performs two 4-byte memcpys from device memory to host memory. When I replaced the output parameter nLength with a device memory pointer, one of those copies became device-to-device, but the other is still device-to-host. Can anyone tell me how to avoid the remaining device-to-host memcpy in this API?