I am testing out a simple two-fragment Holoscan app that sends a std::vector<uint8_t> from a TX fragment to an RX fragment. The app sends larger and larger vectors with each call to compute, and my goal is to measure the throughput between the fragments to see how close it comes to the NIC line rate.
The app works fine until the vector reaches about 6 kB in size, at which point it crashes with the following error:
```
[error] [codec_registry.hpp:331] Error happens in serializing data of type 'St6vectorIlSaIlEE'
[error] [ucx_transmitter.cpp:227] Serialization failed
[error] [ucx_transmitter.cpp:306] Failed to send entity
[error] [entity_executor.cpp:613] Failed to sync outbox for entity: tx_frag__tx code: GXF_FAILURE
```
Both fragments are running in Docker containers on different hosts. The container is nvcr.io/nvidia/clara-holoscan/holoscan with tag v2.8.0-dgpu. When I run the container, I use the following flags:
This is currently a known limitation for sending non-tensor data between fragments of a distributed application. You can try setting the HOLOSCAN_UCX_SERIALIZATION_BUFFER_SIZE environment variable to a larger value (documented here), but I think it cannot be raised very much due to the way the data is sent.
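For concreteness, the variable is set in the environment of each fragment's process before launch. The value below (8192 bytes) is only an illustrative guess at the usable ceiling, not an official limit:

```shell
# Set before starting each fragment (e.g., inside each container).
# The value is in bytes; 8192 here is an illustrative guess, not a documented limit.
export HOLOSCAN_UCX_SERIALIZATION_BUFFER_SIZE=8192
echo "$HOLOSCAN_UCX_SERIALIZATION_BUFFER_SIZE"
```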
Specifically, the underlying GXF UCX extension used for sending data is designed primarily for transmitting holoscan::Tensor (or nvidia::gxf::Tensor or nvidia::gxf::VideoBuffer) data, for which there is no such size limit. Other types (scalars, std::vector, std::string, etc.) are all copied into the header sent by the underlying call to UCX’s ucp_am_send_nbx. I am not sure whether the maximum header size is documented, but when I tried it in the past it only worked up to something like 8 kB.
The available workaround for your issue would be to send the data as a Tensor of uint8_t. In that case only a small number of bytes describing the shape, strides, and data type are sent in the header, and the actual data is sent via a pointer to the buffer passed to the ucp_am_send_nbx API I linked above.
If you need guidance on how to wrap existing data into a Tensor, please let me know and I can link to some examples.
For C++ it is currently a bit complicated due to the need to use the underlying nvidia::gxf::Tensor APIs to either allocate a new tensor or wrap existing memory as a tensor.
It is probably safest to create a new nvidia::gxf::Tensor and use its reshape method to allocate the memory. This method requires that the operator have an Allocator parameter available for memory allocation. Once the tensor has been allocated, you can copy your data into it. There is a concrete example of copying a std::array into a Tensor this way in this example:
where in that example you can see that an allocator_ private member variable exists and is initialized during the operator's initialize method.
To create and emit the holoscan::Tensor from the nvidia::gxf::Tensor, you can follow this code
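To sketch the overall shape of this approach: the snippet below is an untested outline based on the patterns in the Holoscan SDK examples, not the linked code itself. The operator name, port name, tensor name, and payload are all made up for illustration, and the exact reshape/emit calls should be checked against the linked example for your SDK version:

```cpp
// Sketch only (untested): allocate a uint8_t nvidia::gxf::Tensor via reshape,
// copy a std::vector into it, and emit the entity so the downstream fragment
// receives it as a tensor rather than a serialized std::vector.
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

#include "holoscan/holoscan.hpp"

class BytesTxOp : public holoscan::Operator {  // hypothetical operator name
 public:
  HOLOSCAN_OPERATOR_FORWARD_ARGS(BytesTxOp)
  BytesTxOp() = default;

  void setup(holoscan::OperatorSpec& spec) override {
    spec.output<holoscan::gxf::Entity>("out");
    // The allocator parameter used by reshape for the memory allocation.
    spec.param(allocator_, "allocator", "Allocator", "Allocator for tensor memory");
  }

  void compute(holoscan::InputContext&, holoscan::OutputContext& op_output,
               holoscan::ExecutionContext& context) override {
    std::vector<uint8_t> data(1 << 20);  // example payload (1 MB)

    // Create a GXF entity holding a single 1-D uint8_t tensor.
    auto out_message = nvidia::gxf::Entity::New(context.context());
    auto gxf_tensor = out_message.value().add<nvidia::gxf::Tensor>("data");
    auto allocator = nvidia::gxf::Handle<nvidia::gxf::Allocator>::Create(
        context.context(), allocator_->gxf_cid());
    gxf_tensor.value()->reshape<uint8_t>(
        nvidia::gxf::Shape{static_cast<int32_t>(data.size())},
        nvidia::gxf::MemoryStorageType::kHost, allocator.value());

    // Copy the vector's bytes into the tensor's buffer.
    std::memcpy(gxf_tensor.value()->pointer(), data.data(), data.size());

    // Emit the entity; only the small shape/strides/dtype metadata travels in
    // the UCX header, so the ~8 kB limit no longer applies.
    op_output.emit(out_message.value(), "out");
  }

 private:
  holoscan::Parameter<std::shared_ptr<holoscan::Allocator>> allocator_;
};
```

On the receiving side the data would arrive as a holoscan::Tensor, so the RX operator can read the byte count from the tensor's shape rather than a vector's size.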
You can instead wrap existing memory using the wrapMemory method, as shown in this operator
but then managing the lifetime of that memory can be tricky: you have to make sure it isn't released when the compute method call ends, before the downstream operator has consumed it.
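A common way to handle that lifetime problem is to hold the buffer in a std::shared_ptr captured by the release callback passed to wrapMemory. The fragment below is an untested sketch (the entity/tensor setup is assumed to match the allocation approach above, and the wrapMemory argument order should be verified against the nvidia::gxf::Tensor header for your SDK version):

```cpp
// Sketch only (untested): wrap an existing buffer instead of copying it.
// The shared_ptr captured by the release callback keeps the vector alive
// after compute() returns, until GXF invokes the callback.
auto buffer = std::make_shared<std::vector<uint8_t>>(1 << 20);
auto shape = nvidia::gxf::Shape{static_cast<int32_t>(buffer->size())};

auto gxf_tensor = out_message.value().add<nvidia::gxf::Tensor>("data");
gxf_tensor.value()->wrapMemory(
    shape,
    nvidia::gxf::PrimitiveType::kUnsigned8,
    sizeof(uint8_t),
    nvidia::gxf::ComputeTrivialStrides(shape, sizeof(uint8_t)),
    nvidia::gxf::MemoryStorageType::kHost,
    buffer->data(),
    [buffer](void*) mutable {
      buffer.reset();  // drop our reference only when GXF releases the tensor
      return nvidia::gxf::Success;
    });
```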
A third option using the MatX C++ library to create tensors is shown in this tutorial:
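At the MatX level the tensor creation itself is brief; the snippet below shows only that piece (the variable names are made up, and the hand-off of the MatX tensor into a Holoscan message is covered in the linked tutorial, not here):

```cpp
// Sketch only (untested): create a 1-D uint8_t tensor with MatX.
#include <matx.h>

// make_tensor allocates the backing memory (CUDA managed by default);
// Data() exposes the raw pointer for copies or wrapping.
auto mtx_tensor = matx::make_tensor<uint8_t>({1 << 20});
uint8_t* raw_ptr = mtx_tensor.Data();
```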