UDP packet sending takes too long (sendto method)

I have a camera connected to a TX2 through USB 3.0. I need to send the RAW frame data as UDP packets via multicast. The camera frame is grayscale, 8 bits per pixel, 2048x2048 pixels. I'm trying to send it with the C socket library using sendto. I need around 64 sendto calls, because 2048*2048 / 65536 = 64. My test was to raise a GPIO line before starting the send and lower it after the frame was sent, and I measured around 41 ms. Why? Why so long? Is it possible to optimize Ethernet on the Jetson TX2?

qint8 UdpSocketFilinVideo::SendFrame(quint8* frame, quint32 width, quint32 height, quint64 blockID)
{
    Frame_offset = height;
    nFrame.Header.FrameNumber = blockID;
    nFrame.Header.FrameWidth = width;
    nFrame.Header.FrameHeight = height;
    nFrame.Header.FrameChannels = 1;
    nFrame.Header.Version = 0x00;
    nFrame.Header.Rezerved1 = 0x00;
    nFrame.Header.Timestamp = 0;

    // Shrink the per-packet row count until one packet's payload fits.
    int i = 2;
    while (nFrame.Header.FrameWidth * Frame_offset * nFrame.Header.FrameChannels > MaximumPacketSize)
    {
        Frame_offset = (nFrame.Header.FrameHeight / i) + 1;
        i++;
    }

    for (unsigned int number = 0; number < nFrame.Header.FrameHeight; number += Frame_offset) {
        nFrame.Header.StringOffset = number;

        if ((nFrame.Header.FrameHeight - nFrame.Header.StringOffset) >= Frame_offset) {
            move_size = (nFrame.Header.FrameWidth * Frame_offset) * nFrame.Header.FrameChannels;
            nFrame.Header.StringsCount = Frame_offset;
        }
        else {  // last, possibly partial, block of rows
            nFrame.Header.StringsCount = nFrame.Header.FrameHeight - nFrame.Header.StringOffset;
            move_size = (nFrame.Header.StringsCount * nFrame.Header.FrameWidth) * nFrame.Header.FrameChannels;
        }
        // Source and destination never overlap here, so memcpy would also do.
        memmove(nFrame.Data, frame + (number * nFrame.Header.FrameWidth * nFrame.Header.FrameChannels), move_size);

        size = ProtocolTools::FramePacketSize(nFrame);

        ssize_t err = sendto(sockfd, (const char*)&nFrame, size,
                             MSG_CONFIRM, (const struct sockaddr *)&servaddr, sizeof(servaddr));
        if (err < 0)
            return -1;  // sendto failed
    }
    return 0;
}

Just some thoughts, not really an answer. (Everything below assumes IPv4; it is the most common case, and I don't know enough about IPv6 to comment.)

If using IPv4, then you might consider enabling jumbo frames (both sides of the connection would need to support jumbo frames, or all endpoints in the multicast case; do you know the MTU/MRU for all relevant hops?). For example, see:
https://linuxconfig.org/how-to-enable-jumbo-frames-in-linux

You would have to take the header size into consideration (https://en.wikipedia.org/wiki/IPv4#Header), but then increase the amount of data so that a single packet fills the MTU less the header overhead. Part of that overhead is the IPv4 header; another part is the UDP header.

In most cases there is a send buffer, and if the buffer is not filled, the send may wait on a timer. Sending too little data implies either waiting on the timer or waiting for more data; sending too much data implies fragmentation and reassembly (strictly speaking the timer behavior applies under TCP, but the point is to illustrate that matching the buffer size triggers an immediate send, while other sizes result in delays).

In some cases, appending NULL bytes to the end of your data to exactly fill the send buffer would actually reduce latency (a case where you have less than the MTU's worth of data after overhead, and still want to fill a buffer you cannot change).

The GPIO itself may have a lot of latency. I would not expect low latency, nor predictable latency from GPIO. Make sure that your latency measurements exclude the actual GPIO change time.

You will also want to make sure you are running at max clock speed with jetson_clocks.