Error from cudaMemset()

Graphics card: RTX 3060 Ti, CUDA 11.6. I get a cudaErrorIllegalAddress error when I change the batch size from 1 to 50.

typedef bool boxStatus_t;
boxStatus_t *boxStatus{nullptr};  // class member, points to device memory

// part of initialization
PostprocessingCudaYolo::init(uint8_t batch)
{
    cudaError_t rs = cudaMalloc((void **) &(this->boxStatus), YOLO_OUT_SIZE * batch);
    if (rs != cudaSuccess) {
        return INIT_ERROR;
    }
    // ...
}

// part of running pipeline
PostprocessingCudaYolo::clearVectors(uint8_t batch)
{
    cudaMemset(this->boxStatus, true, YOLO_OUT_SIZE * batch);
    cudaError_t r = cudaDeviceSynchronize();
    if (r != cudaSuccess) {
        std::cout << "cudaDeviceSynchronize() returned " << r << std::endl;
        return CUDA_ERROR;
    }
    // ...
}

// batch == 1:  OK
// batch == 50: cudaDeviceSynchronize() returns cudaErrorIllegalAddress
// Why?

One of the kernels launched for the processing associated with the batches is running into an execution error. That error shows up asynchronously (later) when you call cudaDeviceSynchronize(). Presumably, the kernel does not hit an error in the batch=1 case, and does hit an error in the batch=50 case.
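One way to find out which kernel is responsible is to check for errors immediately after each launch instead of only at a later synchronization point. The following is a minimal sketch of that pattern, not the code from the question: `fillKernel`, `checkCuda`, and `launchAndCheck` are hypothetical names introduced for illustration, and the extra synchronization is meant for debugging only because it serializes the pipeline.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a readable message and returns false on error.
inline bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical kernel standing in for one stage of the pipeline.
// The bounds check keeps every thread inside the n-element buffer.
__global__ void fillKernel(bool *data, size_t n) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        data[i] = true;
    }
}

// Launches the kernel and synchronizes right away, so that an execution
// error is reported here and not at some later, unrelated API call.
bool launchAndCheck(bool *data, size_t n) {
    if (n == 0) {
        return true;  // nothing to do; a zero-block launch would be invalid
    }
    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    fillKernel<<<blocks, threads>>>(data, n);

    // Reports launch-time problems such as an invalid configuration.
    if (!checkCuda(cudaGetLastError(), "fillKernel launch")) {
        return false;
    }
    // Reports errors raised while the kernel was running,
    // e.g. cudaErrorIllegalAddress from an out-of-bounds access.
    return checkCuda(cudaDeviceSynchronize(), "fillKernel execution");
}
```

Applying the same check after every kernel or library call in the batch pipeline should narrow the failure down to a single call. Running the application under compute-sanitizer is another way to locate the faulting access.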

cudaMemset can launch kernels to do its work. If the error is coming from cudaMemset, then either the pointer passed is invalid, or the size passed is too large for the pointer (allocation) passed.
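To rule cudaMemset in or out, its return value can be checked directly and the requested size compared with the size that was actually allocated. The sketch below assumes a hypothetical `DeviceBuffer` wrapper that remembers the allocation size; `allocBuffer` and `checkedMemset` are illustrative names, not part of the CUDA API or of the code in the question.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical wrapper: keeps the device pointer together with its size.
struct DeviceBuffer {
    void *ptr = nullptr;
    size_t bytes = 0;
};

// Allocates device memory and records how many bytes were requested.
cudaError_t allocBuffer(DeviceBuffer &buf, size_t bytes) {
    cudaError_t err = cudaMalloc(&buf.ptr, bytes);
    if (err != cudaSuccess) {
        buf.ptr = nullptr;
        buf.bytes = 0;
        return err;
    }
    buf.bytes = bytes;
    return cudaSuccess;
}

// Refuses to memset past the end of the allocation, then checks both the
// call itself and the device state after the operation has completed.
cudaError_t checkedMemset(const DeviceBuffer &buf, int value, size_t count) {
    if (buf.ptr == nullptr || count > buf.bytes) {
        return cudaErrorInvalidValue;  // caught on the host, before the GPU is touched
    }
    cudaError_t err = cudaMemset(buf.ptr, value, count);
    if (err != cudaSuccess) {
        return err;
    }
    return cudaDeviceSynchronize();
}
```

Note that cudaMemset may also return error codes from previous asynchronous launches, so an error reported here does not by itself prove that the memset is at fault. If the size check passes and the error persists, an earlier kernel or library call is the more likely cause.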

The pointer passed is the same one that was returned when the memory was allocated, and the sizes are equal. I checked it.

The problem is that nppiCopy_32f_C3P3R() causes undefined behavior.