Assistance Required for Offloading CRC Calculation to Multiple DPA Cores on BlueField-3

I hope this message finds you well. I am currently working on a research project involving the BlueField-3 device, and I have encountered some difficulties that I would greatly appreciate your help with.

My objective is to offload the CRC (Cyclic Redundancy Check) validation and calculation steps to the DPA (Data Processing Accelerator). While reviewing the official examples, I came across the L2 reflection example. However, during my testing, I observed that the network bandwidth was only 300M. I suspect that this example might be utilizing only one DPA core.

Could you please advise me on the following:

  1. How can I expand the CRC calculation to multiple DPA cores?
  2. Once the CRC calculation is successfully offloaded, how can I transmit the data packets to an application running on the DPU ARM system?

Any guidance or examples on these issues would be extremely helpful.

Thank you very much for your time and assistance.

Best regards

I hope the DPDK CRC test helps you.

Thank you so much!
I am currently attempting to accelerate the CRC calculation using DPDK (23.11.1). I have confirmed that the ARM cores on the DPU do provide instructions that accelerate CRC32.

However, during my testing, the efficiency was quite low, and DPDK issued warnings such as:

avx512_vpclmulqdq_get_handlers(): Requirements not met, can't use AVX512
sse42_pclmulqdq_get_handlers(): Requirements not met, can't use SSE
neon_pmull_get_handlers(): Requirements not met, can't use NEON

Here is my DPDK test code. Could you please advise if there is anything I am overlooking?

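#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>
#include <rte_net_crc.h>

/* Append the Ethernet CRC32 to a single-segment mbuf. */
static void add_crc32_to_mbuf(struct rte_mbuf *mbuf) {
    uint32_t crc;
    char *crc_pos;

    /* rte_net_crc_calc() reads one contiguous buffer, so pass the first segment's data_len. */
    crc = rte_net_crc_calc(rte_pktmbuf_mtod(mbuf, uint8_t *), rte_pktmbuf_data_len(mbuf), RTE_NET_CRC32_ETH);
    crc_pos = rte_pktmbuf_append(mbuf, sizeof(uint32_t));
    if (crc_pos == NULL) {
        fprintf(stderr, "Failed to append CRC to mbuf\n");
        return;
    }
    rte_memcpy(crc_pos, &crc, sizeof(uint32_t));
    /* rte_pktmbuf_append() has already updated pkt_len and data_len, so no manual adjustment is needed. */
}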
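
To cross-check the values produced by the function above outside DPDK, a minimal reference sketch of the Ethernet CRC-32 may help. It assumes that RTE_NET_CRC32_ETH yields the standard CRC-32 (reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF, final inversion). The names crc32_eth_ref, append_fcs and fcs_ok are hypothetical helpers for illustration and are not part of DPDK. When the compiler targets an ARMv8 core with the CRC extension (for example with -march=armv8-a+crc), the sketch uses the CRC32 instructions through the ACLE intrinsics, assuming a little-endian target; otherwise it falls back to a portable bitwise loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* Reference CRC-32 as used for the Ethernet FCS (hypothetical helper). */
static uint32_t crc32_eth_ref(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
#if defined(__ARM_FEATURE_CRC32)
    /* Hardware path: ARMv8 CRC32 instructions, 8 bytes per step.
     * Assumes a little-endian target. */
    for (; len >= 8; data += 8, len -= 8) {
        uint64_t chunk;
        memcpy(&chunk, data, sizeof(chunk)); /* safe unaligned load */
        crc = __crc32d(crc, chunk);
    }
    while (len--)
        crc = __crc32b(crc, *data++);
#else
    /* Portable fallback: one bit at a time, reflected polynomial. */
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
#endif
    return ~crc;
}

/* Append the 4-byte FCS, least significant byte first.
 * Returns the new frame length, or 0 if the buffer has no room. */
static size_t append_fcs(uint8_t *frame, size_t len, size_t cap)
{
    if (cap < len || cap - len < 4)
        return 0;
    uint32_t crc = crc32_eth_ref(frame, len);
    for (int i = 0; i < 4; i++)
        frame[len + i] = (uint8_t)(crc >> (8 * i));
    return len + 4;
}

/* Validate a frame whose last 4 bytes hold the FCS. Returns 1 if it matches. */
static int fcs_ok(const uint8_t *frame, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t stored = 0;
    for (int i = 0; i < 4; i++)
        stored |= (uint32_t)frame[len - 4 + i] << (8 * i);
    return crc32_eth_ref(frame, len - 4) == stored;
}
```

As a quick sanity check, crc32_eth_ref applied to the 9 ASCII bytes "123456789" should return 0xCBF43926, the standard CRC-32 check value. Comparing that against rte_net_crc_calc on the same bytes shows whether the two agree on a given build.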

Thank you very much for your time and assistance.

Best regards.