Get the tensor output as a NumPy array

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
RTX 3090
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

Hi All,

I am developing a custom model and want to deploy it with DeepStream. I am using Python as the primary development language because it lets me use NumPy. To parse the model output and attach it to the metadata structure, I am following the example and trying to convert the final tensor output into a NumPy array for further processing.

After some searching, I followed this thread and was able to convert the result when the output is in FP32 format. When the output is in FP16 format, I get a segmentation fault. The segmentation fault still happens if I use pyds.get_detections().

import ctypes
import time

import numpy as np
import pyds

# Inside a pad probe, where tensor_meta is a pyds.NvDsInferTensorMeta.
start = time.time()
output_layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
# Reinterpret the raw buffer as 32-bit floats (works for FP32 output only).
ptr = ctypes.cast(pyds.get_ptr(output_layer.buffer), ctypes.POINTER(ctypes.c_float))
output = np.ctypeslib.as_array(ptr, shape=(25200, 85))
print(f"Transform takes {(time.time() - start) * 1e3} ms")
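For FP16 output, one common workaround (not an official pyds API, so treat this as a sketch) is to cast the buffer pointer to `ctypes.c_uint16`, since ctypes has no half-precision type, and then reinterpret the raw bits as `np.float16` with NumPy's `.view()`. The snippet below demonstrates the reinterpretation on a plain in-process buffer; in a DeepStream probe the address would instead come from `pyds.get_ptr(output_layer.buffer)`, and `fp16_buffer_to_array` is a hypothetical helper name:

```python
import ctypes

import numpy as np


def fp16_buffer_to_array(addr, shape):
    # ctypes has no half-precision type, so read the raw bytes as
    # uint16 and reinterpret the bit pattern as float16 via .view().
    ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint16))
    raw = np.ctypeslib.as_array(ptr, shape=shape)
    return raw.view(np.float16)


# Stand-in for an FP16 output buffer; in a real probe the address
# would come from pyds.get_ptr(output_layer.buffer).
src = np.arange(6, dtype=np.float16).reshape(2, 3)
out = fp16_buffer_to_array(src.ctypes.data, (2, 3))
assert np.array_equal(out, src)
```

Note that casting an FP16 buffer to `POINTER(ctypes.c_float)` reads 4 bytes per element, so a (25200, 85) view runs past the end of the allocation, which would be consistent with the segmentation fault described above.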

May I ask

  1. Is there any way to get the FP16 tensor output as a NumPy array? Using a half-precision model is important for the best inference speed, and it would be much easier for developers to get the tensor output and do the post-processing in Python for faster development cycles (developers with an ML background are familiar with Python).

  2. About parsing speed: compared to C++/CUDA, does developing the parsing function (for the model output) in Python create a lot of overhead, such as copying memory into NumPy? If so, is it recommended to develop DeepStream applications in C++ instead of Python?

Many Thanks!

The “get_detections” binding is implemented in bindfunctions.cpp in deepstream_python_apps v1.1.4 (NVIDIA-AI-IOT/deepstream_python_apps); you can modify it to output FP16 floats instead of 32-bit floats.

The output layer is not large, so the copy from GPU memory to system memory is acceptable. We recommend using the C++ APIs for more flexibility in customization.
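As a rough check of the claim that the copy is acceptable, the 25200 × 85 output tensor from the snippet above is only a few MiB per frame (this arithmetic is mine, not from the thread):

```python
# Size of the 25200 x 85 output tensor from the snippet above.
elems = 25200 * 85
fp32_mib = elems * 4 / 2**20  # 4 bytes per FP32 element
fp16_mib = elems * 2 / 2**20  # 2 bytes per FP16 element
print(f"FP32: {fp32_mib:.2f} MiB, FP16: {fp16_mib:.2f} MiB")
# -> FP32: 8.17 MiB, FP16: 4.09 MiB
```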

Thanks for the prompt reply. Is there a way to get an FP16 NumPy array from the output tensor in Python without modifying the source code? It would be more user-friendly if developers did not need knowledge of two languages.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

No. Currently, pyds is a pybind11 implementation.
