I encounter an error CQE when launching a put_signal kernel with doca_dpa_kernel_launch_update_set. All my code runs in a single thread.
The error message I receive is:
/ 6/Received CQE of type ERR_CQE with the following attributes: srqn_or_user_index 0x0, vendor_err_synd 0x88, syndrome 0x13, s_wqe_opcode_qpn 0xa56b0008, wqe_counter 0x200, signature 0x72, op_own 0xd2
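For reference, the syndrome field (0x13 here) can be looked up against the MLX5_CQE_SYNDROME_* values defined in rdma-core's mlx5dv.h, where 0x13 corresponds to a remote access error. Below is a minimal sketch of such a lookup. The helper name cqe_syndrome_name is hypothetical and not part of DOCA, and the vendor_err_synd field (0x88) is vendor-specific and is not decoded here.

```c
#include <stdint.h>

/* Hypothetical helper, not a DOCA API: maps the CQE `syndrome` byte to the
 * name of the matching MLX5_CQE_SYNDROME_* value from rdma-core's mlx5dv.h. */
static const char *cqe_syndrome_name(uint8_t syndrome)
{
    switch (syndrome) {
    case 0x01: return "LOCAL_LENGTH_ERR";
    case 0x02: return "LOCAL_QP_OP_ERR";
    case 0x04: return "LOCAL_PROT_ERR";
    case 0x05: return "WR_FLUSH_ERR";
    case 0x06: return "MW_BIND_ERR";
    case 0x10: return "BAD_RESP_ERR";
    case 0x11: return "LOCAL_ACCESS_ERR";
    case 0x12: return "REMOTE_INVAL_REQ_ERR";
    case 0x13: return "REMOTE_ACCESS_ERR";
    case 0x14: return "REMOTE_OP_ERR";
    case 0x15: return "TRANSPORT_RETRY_EXC_ERR";
    case 0x16: return "RNR_RETRY_EXC_ERR";
    case 0x22: return "REMOTE_ABORTED_ERR";
    default:   return "UNKNOWN"; /* value not in the list above */
    }
}
```

If this reading is right, the responder rejected one of the remote accesses, which may point at the remote key or address range used by either the data write or the signal update.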
The kernel logic is as follows:
__dpa_global__ void put_signal(doca_dpa_dev_ep_t ep,
                               doca_dpa_dev_mem_t src_mem,
                               uint64_t src_addr,
                               uint32_t src_rkey,
                               uint64_t dst_addr,
                               uint32_t dst_rkey,
                               doca_dpa_dev_sync_event_remote_t event)
{
    int rank = doca_dpa_dev_thread_rank();
    doca_dpa_dev_printf("Thread %d starts!\n", rank);
    int length = 64;
    doca_dpa_dev_put_signal_set_nb(
        ep, src_addr + rank * length, src_mem, length,
        dst_addr + rank * length, dst_rkey,
        event, 1);
    doca_dpa_dev_printf("Thread %d posts a request!\n", rank);
    doca_dpa_dev_ep_synchronize(ep);
    doca_dpa_dev_printf("Thread %d finishes sending data!\n", rank);
}
My code works when I call doca_dpa_dev_put_nb in the kernel, but it fails in the above case. I wonder what vendor_err_synd 0x88 means and what I should do to fix it. Thanks.