CUDA Error: an illegal instruction was encountered when using cudaHostAlloc

Hi,
Sometimes the program starts without any error, and sometimes it reports: CUDA Error: an illegal instruction was encountered. My code is as follows:

        if (bgr_cpu_raw_buffer_ == nullptr || bgr_gpu_raw_buffer_ == nullptr) {
            cudaHostAlloc(reinterpret_cast<void **>(&bgr_cpu_raw_buffer_),
                          width_ * height_ * 3 / 2, cudaHostAllocMapped);
            cudaHostGetDevicePointer(
                reinterpret_cast<void **>(&bgr_gpu_raw_buffer_),
                reinterpret_cast<void *>(bgr_cpu_raw_buffer_), 0);
        }
        memcpy(bgr_cpu_raw_buffer_, av_frame_decode_->data[0], size0);
        memcpy(bgr_cpu_raw_buffer_ + size0, av_frame_decode_->data[1], size1);
        memcpy(bgr_cpu_raw_buffer_ + size0 + size1, av_frame_decode_->data[2],
               size2);
        cuda_error = senseAD::adHal::cudaYUV420PToBGR(
                bgr_gpu_raw_buffer_, bgr_gpu_out, width_, height_,
                av_frame_decode_->linesize);
        if (cuda_error != cudaSuccess) {
            AD_LERROR(FFmpegH264Decoder)
                << "CUDA Error: " << cudaGetErrorString(cuda_error);
        }

The error occurs randomly, but quite frequently. The conversion function is:

cudaError_t cudaYUV420PToBGR(unsigned char *inputYUV,
                             unsigned char *output,
                             size_t width,
                             size_t height,
                             int linesize[3],
                             cudaStream_t stream) {
    if (!inputYUV || !output) {
        return cudaErrorInvalidDevicePointer;
    }
    if (width == 0 || height == 0) {
        return cudaErrorInvalidValue;
    }
    unsigned int blockSize = 1024;
    unsigned int numBlocks = (width / 2 + blockSize - 1) / blockSize;
    YUV420PToBGR<<<numBlocks, blockSize>>>(
        inputYUV, output, width, height, linesize[0], linesize[1], linesize[2]);
    return cudaGetLastError();
}

Hi,

Could you share runnable source code with us?
We would like to reproduce this internally to gather more information first.

Also, which JetPack version are you using?

Thanks.

Hi,
I can’t share runnable source code, but I can describe the situation. I have seven video channels that are decoded through ffmpeg-nvidia. The problematic code copies the decoded data and converts it to BGR. When the buffer is allocated with cudaHostAlloc, this error occasionally appears while decoding seven channels, and I can confirm that it does not appear when decoding six channels.
My JetPack version is:

jetson_release 
 - NVIDIA Jetson Xavier NX (Developer Kit Version)
   * Jetpack 4.4 [L4T 32.4.3]
   * NV Power Mode: MODE_15W_6CORE - Type: 2
   * jetson_stats.service: active
 - Libraries:
   * CUDA: 10.2.89
   * cuDNN: 8.0.0.180
   * TensorRT: 7.1.3.0
   * Visionworks: 1.6.0.501
   * OpenCV: NOT_INSTALLED compiled CUDA: NO
   * VPI: 0.3.7
   * Vulkan: 1.2.70

Hi,

Is it possible that more than one processor is accessing the buffer?
Thanks.

Hi,
I can confirm that only one processor is accessing the buffer.

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks.

Hi,

We need a reproducible sample to get more information about the failure.
Could you write a simple program that hits the same error?
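
In the meantime, one thing that may be worth ruling out is a size mismatch between the allocation and the copies. The snippet allocates width_ * height_ * 3 / 2 bytes, but the definitions of size0, size1 and size2 are not shown. If they are derived from av_frame_decode_->linesize, which FFmpeg may pad beyond the visible width, the three memcpy calls could write past the end of the pinned buffer. The following is only a host-side C++ sketch of that check. Yuv420pLayout, PaddedLayout and TightSize are hypothetical names that do not appear in the original code, and the assumption that each plane is copied as linesize times the number of rows is a guess about how size0, size1 and size2 are computed.

    ```cpp
    #include <cstddef>
    #include <cstdio>

    // Hypothetical helper type: bytes per plane of a YUV420P frame when each
    // plane is copied as linesize[i] * rows, i.e. with row padding included.
    struct Yuv420pLayout {
        std::size_t size0;  // Y plane
        std::size_t size1;  // U plane
        std::size_t size2;  // V plane
        std::size_t total() const { return size0 + size1 + size2; }
    };

    // Assumption: the planes are copied with their padding, as one block each.
    inline Yuv420pLayout PaddedLayout(std::size_t height, const int linesize[3]) {
        // In 4:2:0 the chroma planes have half as many rows (rounded up).
        const std::size_t chroma_rows = (height + 1) / 2;
        return Yuv420pLayout{
            static_cast<std::size_t>(linesize[0]) * height,
            static_cast<std::size_t>(linesize[1]) * chroma_rows,
            static_cast<std::size_t>(linesize[2]) * chroma_rows};
    }

    // The size the posted snippet passes to cudaHostAlloc.
    inline std::size_t TightSize(std::size_t width, std::size_t height) {
        return width * height * 3 / 2;
    }

    int main() {
        // Example values only: a 1920x1080 frame with rows padded to 2048/1024.
        const int linesize[3] = {2048, 1024, 1024};
        const Yuv420pLayout padded = PaddedLayout(1080, linesize);
        std::printf("allocated=%zu copied=%zu\n",
                    TightSize(1920, 1080), padded.total());
        return 0;
    }
    ```

If the copied total turns out to be larger than the allocated size for your streams, allocating the padded total (or copying row by row using the visible width) would be one way to test whether the error goes away.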

Thanks.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.